tatsu-lab / stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.
https://crfm.stanford.edu/2023/03/13/alpaca.html
Apache License 2.0

Finetune with A100 40G #280

Open · jianchaoji opened this issue 1 year ago

jianchaoji commented 1 year ago

Can we use A100 40G GPUs to finetune llama-7B? Has anyone tried that?

GasolSun36 commented 1 year ago

I tried 8 A100 40G to finetune llama-7B with FSDP offload, and it works fine for me.
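
For reference, a launch along these lines should work. This is a sketch adapted from the training command in this repo's README, with `offload` added to the `--fsdp` flag to enable FSDP CPU offloading; the checkpoint path, output directory, and port are placeholders, and the decoder-layer class name depends on your transformers version:

```bash
# Sketch: 8x A100 40G with FSDP full-shard + CPU offload, adapted from the
# README's training command. Paths and port below are placeholders.
torchrun --nproc_per_node=8 --master_port=<your_random_port> train.py \
    --model_name_or_path <path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
```

With 8 GPUs, a per-device batch size of 4, and 4 accumulation steps, the effective batch size stays at the README's 128 (8 x 4 x 4).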

jianchaoji commented 1 year ago

Thank you so much for the response! Did you try 4 A100 40G as well?

ffohturk commented 1 year ago

I tried 4 A100 40GB with FSDP offload, but had to reduce the eval and train batch size from 3 to 2 in order to avoid OOM. Took 58 hours.
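
One way to apply that batch-size reduction on 4 GPUs while keeping the effective batch size at 128 is sketched below. These are hypothetical settings, not ffohturk's confirmed configuration:

```bash
# Hypothetical 4x A100 40G variant of the sketch above: drop the per-device
# batch size to 2 and raise accumulation to 16 so the effective batch size
# stays at 4 GPUs * 2 per device * 16 steps = 128.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
    # remaining hyperparameters (epochs, lr, scheduler, save/log steps) as in
    # the 8-GPU sketch above
```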

hychaochao commented 8 months ago

> I tried 4 A100 40GB with FSDP offload, but had to reduce the eval and train batch size from 3 to 2 in order to avoid OOM. Took 58 hours.

I tried the same configuration (4 A100 40G), but it still OOMs. Could you share your parameter settings? Thanks! @ffohturk