Open gudiandian opened 4 years ago
I have pulled the code from branch train. Is there a way to train or fine tune the GPT-2 model with data parallelism on multiple GPUs? Thanks for your help.
I have pulled the code from branch train. Is there a way to train or fine tune the GPT-2 model with data parallelism on multiple GPUs? Thanks for your help.