pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/

Support loading of pre-quantized models #1041

Open rohan-varma opened 4 months ago

rohan-varma commented 4 months ago

For workloads such as QLoRA, we could save and upload pre-quantized model weights (or use existing ones), which would have a couple of benefits:

This would, of course, come with the downside of reduced interoperability for these particular checkpoints (since offramp paths typically consume bf16 checkpoints), but the user would still have the option to save the final weights in bf16 after training, mitigating this concern.
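
As a rough sketch of what this could look like with torchao's NF4 tensor subclass (which torchtune's QLoRA path builds on); the file name, key, and shapes below are illustrative only, not torchtune's actual checkpointing code:

```python
import torch
from torchao.dtypes.nf4tensor import NF4Tensor, to_nf4

# Quantize a bf16 weight to NF4 once and save the quantized tensor,
# so later runs can skip re-quantization entirely.
weight_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
weight_nf4 = to_nf4(weight_bf16)
torch.save({"layer.weight": weight_nf4}, "qlora_nf4.pt")

# Later: load the pre-quantized checkpoint directly.
state_dict = torch.load("qlora_nf4.pt", weights_only=False)
assert isinstance(state_dict["layer.weight"], NF4Tensor)

# Offramp after training: dequantize back to bf16 for interop.
weight_bf16_again = state_dict["layer.weight"].get_original_weight()
```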

rohan-varma commented 4 months ago

This would also reduce peak memory during model initialization, since we would not need to allocate any bf16 tensors for the quantized portions of the model.
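
To make the memory argument concrete, here is a minimal sketch (again, not torchtune's actual init path) contrasting quantize-on-load with adopting pre-quantized tensors via meta-device initialization and `load_state_dict(..., assign=True)`:

```python
import torch
import torch.nn as nn
from torchao.dtypes.nf4tensor import to_nf4

# Quantize-on-load: the full bf16 weight is allocated first, so peak
# memory briefly holds both the bf16 tensor and its NF4 copy.
linear = nn.Linear(4096, 4096, bias=False, dtype=torch.bfloat16)
linear.weight = nn.Parameter(to_nf4(linear.weight.detach()), requires_grad=False)

# Pre-quantized load: initialize on the meta device (no real allocation),
# then adopt the checkpoint's NF4 tensors as-is with assign=True. No bf16
# tensor is ever materialized for the quantized portions of the model.
with torch.device("meta"):
    lazy = nn.Linear(4096, 4096, bias=False, dtype=torch.bfloat16)
# Stand-in for an NF4 checkpoint loaded from disk:
nf4_sd = {"weight": to_nf4(torch.randn(4096, 4096, dtype=torch.bfloat16))}
lazy.load_state_dict(nf4_sd, assign=True)
```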