huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

save_pretrained 4-bit models with bitsandbytes #23904

Closed. westn closed this issue 1 year ago

westn commented 1 year ago

With the latest version of the bitsandbytes library (0.39.0), is it now possible to serialize 4-bit models?

If so, this section should be updated to allow users to save these models: https://github.com/huggingface/transformers/blob/68d53bc7178866821282f45732c1e465f5160fa6/src/transformers/modeling_utils.py#LL1704C36-L1704C36

I'm mainly looking at this article to see what got produced lately in the area: https://huggingface.co/blog/4bit-transformers-bitsandbytes

I aim to quantize a model to 4-bit so that it can be used in, e.g., GPT4All and other CPU platforms.
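
For context, here is roughly how I'm loading the model in 4-bit, following the blog post above (the model name is just a placeholder):

```python
# Rough sketch of my 4-bit loading flow, following the blog post above.
# "facebook/opt-350m" is just a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# This is the part that currently fails for 4-bit models:
# model.save_pretrained("opt-350m-4bit")
```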

sgugger commented 1 year ago

cc @younesbelkada

younesbelkada commented 1 year ago

Hi @westn, thanks for the issue. I don't think 4-bit models are serializable yet. Let me double-check with the author of bitsandbytes and get back to you.

younesbelkada commented 1 year ago

Also note that 4-bit / 8-bit quantization with bitsandbytes is only applicable to GPU / CUDA devices; you cannot run quantized models with bitsandbytes on a CPU device.
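
For example, this kind of up-front check avoids a confusing failure later (just an illustrative guard, not an official API):

```python
import torch

# bitsandbytes 4-bit / 8-bit kernels require a CUDA GPU (illustrative check only).
if not torch.cuda.is_available():
    raise RuntimeError(
        "bitsandbytes-quantized models need a CUDA device; CPU-only inference is not supported."
    )
```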

younesbelkada commented 1 year ago

Hi @westn, currently it is not possible to save 4-bit models, but this is on the bitsandbytes roadmap for the next releases. We will keep you posted!

jameswu2014 commented 1 year ago

Hi @younesbelkada, can 4-bit / 8-bit models be saved now?

younesbelkada commented 1 year ago

Hi @jameswu2014, thanks for the heads-up. Currently 4-bit saving is not possible; however, 8-bit saving is possible, see: https://huggingface.co/docs/transformers/main_classes/quantization#push-quantized-models-on-the-hub
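
A minimal sketch of that 8-bit flow (placeholder model name; see the docs link above for the complete example):

```python
from transformers import AutoModelForCausalLM

# Load a model in 8-bit (placeholder model name) on GPU.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_8bit=True,
    device_map="auto",
)

# Saving the 8-bit weights locally works ...
model.save_pretrained("opt-350m-8bit")
# ... and so does pushing them to the Hub (after `huggingface-cli login`):
# model.push_to_hub("your-username/opt-350m-8bit")
```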

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

younesbelkada commented 1 year ago

See linked bnb issue: https://github.com/TimDettmers/bitsandbytes/issues/695

nathan-az commented 1 year ago

Bumping this since it is a top Google result for this topic and I haven't found an answer elsewhere: is there any way to "de-quantize" a model that was trained in 4-bit or 8-bit (without a PEFT adapter) so it can be saved and loaded in bfloat16 or some other dtype? (This is currently a blocker for training checkpointing, which also means things like early stopping are not possible.)

I tried basic things like model.to(torch.bfloat16), but it throws an error stating that bitsandbytes models cannot be cast with .to().
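
The closest workaround I can think of is manually dequantizing the bitsandbytes layers back into regular nn.Linear modules, roughly like the untested sketch below. Note that bnb.functional.dequantize_4bit and the weight.quant_state attribute are my assumptions about the bitsandbytes internals, so this may not match the actual API:

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

def dequantize_4bit_model(model, dtype=torch.bfloat16):
    """Untested sketch: replace bnb Linear4bit layers with plain nn.Linear.

    Assumes each Linear4bit keeps its packed weight in `.weight` alongside a
    `.quant_state`, and that bnb.functional.dequantize_4bit reconstructs the
    original matrix from the two.
    """
    # Collect replacements first so the module tree isn't mutated while iterating.
    replacements = []
    for parent in model.modules():
        for child_name, child in parent.named_children():
            if isinstance(child, bnb.nn.Linear4bit):
                replacements.append((parent, child_name, child))

    for parent, child_name, child in replacements:
        weight = bnb.functional.dequantize_4bit(
            child.weight.data, child.weight.quant_state
        ).to(dtype)
        new_linear = nn.Linear(
            child.in_features, child.out_features, bias=child.bias is not None
        )
        new_linear.weight = nn.Parameter(weight, requires_grad=False)
        if child.bias is not None:
            new_linear.bias = nn.Parameter(child.bias.data.to(dtype), requires_grad=False)
        setattr(parent, child_name, new_linear)
    return model

# Afterwards the model should be save-able as a regular bfloat16 checkpoint:
# dequantize_4bit_model(model).save_pretrained("dequantized-checkpoint")
```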

kbulutozler commented 11 months ago

Is there any update on this? @nathan-az @younesbelkada

younesbelkada commented 11 months ago

Hi @kbulutozler, yes, @poedator is working on it in https://github.com/huggingface/transformers/pull/26037, and the corresponding bnb PR is here: https://github.com/TimDettmers/bitsandbytes/pull/753
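
Once those PRs land, the expected workflow (my assumption, mirroring the existing 8-bit flow; the final API may differ) would be roughly:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed future flow once 4-bit serialization is merged (mirrors the 8-bit
# flow; the final API may differ). The model name is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model.save_pretrained("opt-350m-4bit")

# Reloading the serialized 4-bit checkpoint:
model = AutoModelForCausalLM.from_pretrained("opt-350m-4bit", device_map="auto")
```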