tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0
18.68k stars 2.22k forks

Benchmarking and optimization tips #11

Open 0xbitches opened 1 year ago

0xbitches commented 1 year ago

Not exactly an issue, but I've just been trying to run one epoch of finetuning with llama-13b. On a 4090 it looks like it will take roughly 4 hours with the setting `MICRO_BATCH_SIZE = 2`.

However, it looks like the loss has already converged to ~1 by epoch 0.12 (roughly 30 minutes into training), so it doesn't really make sense to use 3 epochs, and a larger micro batch size might be worth trying.

I could be wrong here. Happy to hear some feedback on how to better tune the parameters.
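For context on how `MICRO_BATCH_SIZE` relates to training length, here is a back-of-envelope sketch. The `BATCH_SIZE = 128` effective batch size and the ~52k-example Alpaca dataset size are assumptions based on the repo's `finetune.py`-style defaults, not numbers stated in this thread:

```python
# Sketch of how MICRO_BATCH_SIZE interacts with gradient accumulation in a
# finetune.py-style training loop. BATCH_SIZE and DATASET_SIZE below are
# assumed defaults, not values confirmed in this thread.
BATCH_SIZE = 128          # assumed effective batch size per optimizer step
MICRO_BATCH_SIZE = 2      # what actually fits on the 4090 in the run above
DATASET_SIZE = 52_000     # approximate size of the Alpaca dataset

# Lowering MICRO_BATCH_SIZE does not change the effective batch size; it
# only increases the number of forward/backward passes per optimizer step.
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 64

# Optimizer steps per epoch depend on the effective batch size, so
# convergence around "epoch 0.12" corresponds to roughly this many steps:
steps_per_epoch = DATASET_SIZE // BATCH_SIZE   # 406
steps_to_converge = int(0.12 * steps_per_epoch)  # ~48

print(GRADIENT_ACCUMULATION_STEPS, steps_per_epoch, steps_to_converge)
```

This is why a smaller micro batch size makes wall-clock time longer without changing the optimization trajectory much: the same number of optimizer steps just takes more accumulation passes.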

0xbitches commented 1 year ago

Also, it would be great if we could get 4-bit support by incorporating GPTQ (#2).
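A rough weight-memory estimate shows why 4-bit quantization is attractive for a 13B model on consumer cards; the parameter count is approximate and the calculation ignores activations, optimizer state, and quantization metadata:

```python
# Back-of-envelope weight-memory estimate for llama-13b at different
# precisions. The 13e9 parameter count is an approximation.
PARAMS = 13e9

def weight_gib(bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB (ignores activations,
    optimizer state, and quantization overhead)."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)  # ~24.2 GiB: barely fits a 24 GB 4090 with nothing else
int8 = weight_gib(8)   # ~12.1 GiB
int4 = weight_gib(4)   # ~6.1 GiB: the appeal of GPTQ-style 4-bit support
print(round(fp16, 1), round(int8, 1), round(int4, 1))
```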

tloen commented 1 year ago

With 256 tokens the loss slowly pulls further down to somewhere slightly above 0.8. You could maybe get away with using 2 epochs instead of 3, though.

0xbitches commented 1 year ago

> With 256 tokens the loss slowly pulls further down to somewhere slightly above 0.8. You could maybe get away with using 2 epochs instead of 3, though.

Yeah, I definitely saw it drop below 0.75 somewhere between epochs 1 and 2, but you could still achieve a pretty good loss with just one epoch. I was testing this in a hurry, so just sharing the information here.

kesar commented 1 year ago

Did you get below 0.75 with the current hyperparams? I wasn't able to get under 0.8. Wondering what others are getting (I'm using an A100 40GB).

tloen commented 1 year ago

I probably wouldn't anchor too much on the specific loss numbers until we've refactored the training code to use validation sets.
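The held-out validation set tloen mentions can be sketched in plain Python. In the actual repo this would go through the `datasets` library's splitting utilities, but the essential idea is just a seeded, disjoint train/validation partition made before training so that reported losses are comparable across runs:

```python
# Minimal sketch of a reproducible train/validation split. This is an
# illustration of the idea, not the repo's actual implementation.
import random

def train_val_split(examples, val_fraction=0.05, seed=42):
    """Return (train, val) using a fixed seed so the split is identical
    across runs; otherwise validation loss isn't comparable between
    experiments."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_val = max(1, int(len(examples) * val_fraction))
    val_idx = set(idx[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val

train, val = train_val_split(list(range(1000)))
print(len(train), len(val))  # 950 50
```

Evaluating on such a held-out set would give loss numbers worth anchoring on, since training loss alone can keep falling through memorization.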