huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

fix: Zephyr LoRA fine-tuning fixed #139

Closed Serega6678 closed 3 months ago

Serega6678 commented 3 months ago

I tried using LoRA fine-tuning instead of QLoRA fine-tuning and it didn't work: using exactly your training config, the LoRA training loss would collapse to 0 when bf16 was not specified in the config.

[image: training loss curve collapsing to 0 without bf16]

With bf16 the issue is resolved (I trained the LoRA model on 50% of the SFT data and got the expected results). Furthermore, I reused this bf16 config for QLoRA and the results match what you report (~0.95 SFT loss).

I also added the Flash Attention 2 flag, since it speeds up training and allows doubling the per-GPU batch size for QLoRA (4 -> 8) without changing the results at all (just to be safe, I tested this too and the curves are identical).

[image: loss curves with and without Flash Attention 2]

P.S. In my screenshots, "1%" means I trained for 1% of the SFT steps, just to verify that the losses are identical and that changing the flag doesn't break anything.

In total, this PR fixes LoRA fine-tuning, which previously failed (loss = 0), and speeds up both the QLoRA and LoRA configurations via the Flash Attention flag.
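The changes described above can be sketched as a config excerpt. This is a minimal illustration, not the repo's exact schema: the key names (`bf16`, `use_flash_attention_2`, `per_device_train_batch_size`) are assumed based on the PR description, and the batch size shown reflects the QLoRA doubling mentioned above.

```yaml
# Illustrative SFT config excerpt (key names assumed, not verified against the repo):
bf16: true                      # train in bfloat16; without this, LoRA loss collapsed to 0
use_flash_attention_2: true     # enable Flash Attention 2 to speed up training
per_device_train_batch_size: 8  # doubled from 4, feasible for QLoRA with Flash Attention 2
```

With `bf16: true` the LoRA and QLoRA runs reportedly reproduce the expected ~0.95 SFT loss, and the Flash Attention flag changes throughput only, not the loss curves.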

HuggingFaceDocBuilderDev commented 3 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.