Closed by wjun0830 1 year ago
Hi @wjun0830 ,
For pretraining, we run on 8 GPUs, using less than 24 GB per GPU with a batch size of 32; 10 epochs typically take 3-4 days. You can reduce the batch size or the transformer projection dimension for lower memory usage and higher efficiency.
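As a rough back-of-the-envelope sketch (not this repo's actual code; the sequence length, hidden dimension, and layer count below are hypothetical placeholders), activation memory scales roughly linearly in both the batch size and the projection dimension, which is why shrinking either one cuts memory:

```python
def estimate_activation_mb(batch_size: int, seq_len: int, hidden_dim: int,
                           num_layers: int, bytes_per_elem: int = 4) -> float:
    """Very rough activation-memory estimate in MB.

    Assumes activations per layer scale as batch_size * seq_len * hidden_dim;
    ignores attention maps, optimizer state, and parameter memory.
    """
    elems = batch_size * seq_len * hidden_dim * num_layers
    return elems * bytes_per_elem / (1024 ** 2)

# Halving the batch size (or the hidden dim) roughly halves activation memory.
base = estimate_activation_mb(batch_size=32, seq_len=75, hidden_dim=256, num_layers=4)
half_bsz = estimate_activation_mb(batch_size=16, seq_len=75, hidden_dim=256, num_layers=4)
```

This is only a scaling intuition; actual per-GPU usage also depends on the optimizer, mixed precision, and attention memory, so measure with your own config.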
Thank you!
Hello Kevin!
Could you share how much GPU memory and how much time pretraining (PT) requires?
Thanks