predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0
2.06k stars 138 forks

Support loading `.pt` weights #420

Open shripadk opened 4 months ago

shripadk commented 4 months ago

Feature request

Support is needed for loading models whose checkpoints ship only as `.pt` weight files.

Motivation

I quantized the Mixtral 8x7B model using HQQ (which produces a `qmodel.pt` file), but I am unable to load the weights in LoRAX because it expects either `.safetensors` or `.bin` weights.

Your contribution

I haven't studied the source enough to submit a PR, but from a cursory reading of the code, the changes would need to be made in the `hub.py` file, specifically: https://github.com/predibase/lorax/blob/cc2e0a90380c1342ea39cc483f3db8230cbf8d05/server/lorax_server/utils/sources/hub.py#L68-L78
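A minimal sketch of what such a change could look like. This is a hypothetical helper, not the actual LoRAX code: the real discovery logic in `hub.py` differs, but the idea is simply to try `.pt` as one more fallback extension after `.safetensors` and `.bin`.

```python
# Hypothetical sketch (not the actual lorax code): discover local weight
# files, trying each extension in turn so ".pt" checkpoints are accepted
# alongside ".safetensors" and ".bin" ones.
from pathlib import Path

WEIGHT_EXTENSIONS = (".safetensors", ".bin", ".pt")

def find_weight_files(model_dir: str) -> list:
    """Return weight files for the first extension that matches anything."""
    model_path = Path(model_dir)
    for ext in WEIGHT_EXTENSIONS:
        # Prefer safetensors, then fall back to the pickle-based formats.
        files = sorted(model_path.glob(f"*{ext}"))
        if files:
            return files
    raise FileNotFoundError(
        f"No weight files with extensions {WEIGHT_EXTENSIONS} in {model_dir}"
    )
```

Ordering matters here: `.safetensors` stays first so existing models are unaffected, and `.pt` only kicks in when nothing else is present.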

I would also like to be able to load the base model from local disk rather than from the hub (as explained in this issue: https://github.com/predibase/lorax/issues/347).

magdyksaleh commented 4 months ago

I will work on a fix for this alongside #347

tgaddair commented 3 months ago

Looks like we just need to support the `.pt` extension as an alternative to `.bin` (it should be the same underlying serialized format).

As a workaround @shripadk, can you try renaming the file to `qmodel.bin`?
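The suggested workaround can be scripted for a directory of checkpoints. This is a sketch with a hypothetical helper name; it copies rather than renames so the original `.pt` file is preserved. It works because PyTorch's `torch.load` ignores the file extension entirely, so the bytes need no conversion.

```python
# Sketch of the suggested workaround: expose every "*.pt" checkpoint in a
# model directory as a "*.bin" sibling so the existing .bin discovery
# logic picks it up. The file contents are untouched.
from pathlib import Path
import shutil

def expose_pt_as_bin(model_dir: str) -> list:
    """Copy every *.pt file in model_dir to a *.bin file with the same stem."""
    copied = []
    for pt_file in Path(model_dir).glob("*.pt"):
        bin_file = pt_file.with_suffix(".bin")
        if not bin_file.exists():
            shutil.copy2(pt_file, bin_file)  # keep the original .pt intact
        copied.append(bin_file)
    return copied
```

On filesystems where disk space matters, a symlink (`bin_file.symlink_to(pt_file)`) would avoid duplicating a multi-gigabyte checkpoint, at the cost of confusing tools that don't follow links.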