Hi
I have started experimenting with training this model, and it seems able to make use of a large batch size, but even then the training times are quite long.
Are there any plans to make this project multi-GPU, e.g. with `DataParallel` or `DistributedDataParallel`?
Sam