jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Training Time #3

Open thisisisheanesu opened 7 months ago

thisisisheanesu commented 7 months ago

Would it be possible for you to add how long each training run takes to the README? I think a lot of people who have heard about GaLore would be interested in that.

Explorergt92 commented 6 months ago

I had it running for a couple of hours last night on the allenai/c4 English-only dataset. The time-to-complete estimate was showing ~2650 hours, so ~110 days if the estimate holds.
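The day count follows directly from the reported ETA; a quick sanity check of the arithmetic (the 2650-hour figure is the progress-bar estimate quoted above, not a measured value):

```python
# Convert the reported ~2650-hour ETA into days.
eta_hours = 2650            # time-to-complete estimate from the training run
days = eta_hours / 24       # hours -> days
print(round(days, 1))       # ~110.4, i.e. roughly 110 days
```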

jiaweizzhao commented 6 months ago

Thanks for the suggestion; I will update the training time. @Explorergt92, for training 7B on a single 4090, I think "around 110 days" is correct.