Open jeff52415 opened 6 months ago
Hi @jeff52415, thanks for opening this issue; this is a really good question. One possible source of discrepancy is the different implementations of NF4 quantization used by torchtune and Hugging Face. To be more explicit, torchtune's QLoRA relies on the NF4Tensor class from torchao rather than the bitsandbytes version used by Hugging Face. I need to verify that quantizing a torchtune checkpoint with bitsandbytes yields the same result as quantizing with torchao. Let me look into it and get back to you. Also cc @rohan-varma, who may have some insights here.
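In the meantime, here is a rough sketch of the comparison I have in mind: quantize the same bf16 weight with torchao's NF4 path and with bitsandbytes, dequantize both, and look at the largest difference. The function names, block sizes, and defaults below are assumptions based on recent torchao and bitsandbytes releases (bitsandbytes' 4-bit kernels also need a CUDA device), so treat this as untested.

```python
# Untested sketch: compare torchao's NF4 quantization (used by torchtune QLoRA)
# against bitsandbytes' NF4 (used by Hugging Face). Requires a CUDA device.
import torch
import bitsandbytes.functional as bnb_F
from torchao.dtypes.nf4tensor import to_nf4

# A dummy bf16 weight standing in for one linear layer of the checkpoint.
weight = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

# torchao path: quantize to NF4 (block_size / scaler_block_size are the torchao
# defaults), then dequantize back to bf16.
ao_nf4 = to_nf4(weight, block_size=64, scaler_block_size=256)
ao_dequant = ao_nf4.get_original_weight()

# bitsandbytes path: quantize to NF4 with the same block size, then dequantize.
bnb_packed, quant_state = bnb_F.quantize_nf4(weight, blocksize=64)
bnb_dequant = bnb_F.dequantize_nf4(bnb_packed, quant_state).to(torch.bfloat16)

# If the two implementations match, this should be (near) zero.
print((ao_dequant - bnb_dequant).abs().max())
```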
Can I load QLoRA fine-tuned weights into a Hugging Face model as shown below?
I have changed the checkpointer to FullModelHFCheckpointer. Essentially, the checkpoint is loadable and runnable, but I am curious whether it reflects the same structure as qlora_llama3_8b. Thanks.
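A minimal sketch of what such a load might look like, assuming the fine-tuned checkpoint was saved in Hugging Face format by FullModelHFCheckpointer; the directory path below is a placeholder, and this is not necessarily the exact snippet from the issue.

```python
# Sketch: load a torchtune-produced Hugging Face-format checkpoint with
# bitsandbytes NF4 quantization applied at load time. The directory path is a
# placeholder and is assumed to hold the usual HF config and weight files.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint_dir = "output/qlora_llama3_8b"  # placeholder: output dir from FullModelHFCheckpointer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Note that loading this way re-quantizes the saved weights with bitsandbytes' NF4, which is where the implementation difference mentioned above could show up relative to the torchao quantization used during fine-tuning.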