shuangqinbuaa opened this issue 6 years ago
Hey, the time per batch looks high to me. What kind of GPU are you using?
@miyyer how long did it take you to train the SCPN model on the PARANMT-50M dataset (15 GB)?
For me it has taken two days, and only about half of the batches in epoch 0 have been trained.
done with batch 402000 / 439586 in epoch 0, loss: 1.020283, time:308
-- stdout after running for 3 days
Below is my device setup.
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 000075E1:00:00.0 Off | 0 |
| N/A 67C P0 121W / 149W | 10586MiB / 11441MiB | 78% Default |
+-------------------------------+----------------------+----------------------+
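(For rough scale, if the 402,000 batches in the log above took the full 3 days of wall-clock time, that is about 259,200 s / 402,000 batches, i.e. roughly 0.64 s per batch on this K80, so a complete epoch of 439,586 batches would take about 3.3 days.)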
Thanks.
I am trying to train the SCPN model, but the training data is very large. I am using one GPU with a batch size of 64, and each batch takes about 1.6 seconds to train, but there are 439586 batches. I tried to train with two GPUs but failed. Could you tell me how you sped up the training process? Thank you so much. @miyyer @jwieting
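For reference: at 1.6 s per batch, 439586 batches is about 703,000 s, i.e. roughly 8 days for a single epoch on one GPU. Below is a minimal sketch of how multi-GPU data parallelism is usually enabled in PyTorch (the SCPN code is PyTorch-based). The ToyModel, tensor shapes, and optimizer settings here are placeholders and not the repo's actual classes, so this only illustrates the general wrapping pattern, not the project's supported setup.

import torch
import torch.nn as nn

# Toy stand-in model; the real SCPN network is an attention-based
# seq2seq model, so this only illustrates the wrapping pattern.
class ToyModel(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

model = ToyModel()
if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:
        # DataParallel replicates the module on every visible GPU and splits
        # each input batch along dim 0, e.g. a batch of 64 becomes 32 + 32
        # on two GPUs; gradients are gathered back on the default device.
        model = nn.DataParallel(model)
    model = model.cuda()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One dummy step: the wrapped model is called exactly like the plain one.
x = torch.randn(64, 512)
y = torch.randn(64, 512)
if torch.cuda.is_available():
    x, y = x.cuda(), y.cuda()
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Note that on recent PyTorch versions DistributedDataParallel is generally preferred over DataParallel, and if the bottleneck is data loading or preprocessing rather than the GPU, adding a second GPU will not help much.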