salesforce / simpletod

Official repository for "SimpleTOD: A Simple Language Model for Task-Oriented Dialogue"
https://arxiv.org/abs/2005.00796
BSD 3-Clause "New" or "Revised" License

Distributed Training #15

Open yuanzhaoz opened 3 years ago

yuanzhaoz commented 3 years ago

Hi, I found that using DataParallel is really slow, so I'm looking at the DistributedDataParallel part of the code. However, I'm not clear on what the default configuration should be in order to use DistributedDataParallel. Can someone help me with this? Thanks!
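For reference, the basic DistributedDataParallel setup in PyTorch looks like the sketch below. This is not SimpleTOD's actual code, just a minimal, hedged example of how a DDP-wrapped model is initialized and trained; it uses a single process with the CPU-only `gloo` backend so it runs anywhere, whereas a real multi-GPU launch (e.g. via `torch.distributed.launch`) would start one process per GPU with `backend="nccl"` and each process's own rank.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process, CPU-only initialization so the sketch is runnable anywhere.
# With a real launcher, MASTER_ADDR/MASTER_PORT/rank/world_size come from the
# launch utility, and the backend would be "nccl" for GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)   # stand-in for the actual language model
ddp_model = DDP(model)          # wraps the model; syncs gradients across ranks

opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
x = torch.randn(8, 4)           # per-process batch (not the global batch)
loss = ddp_model(x).sum()
loss.backward()                 # gradient all-reduce happens during backward
opt.step()

dist.destroy_process_group()
```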

ShaneTian commented 3 years ago

Maybe this can help you, if you understand Chinese: https://suixinblog.cn/2020/08/pytorch-distributeddataparallel.html. But I have to tell you that I am not the only one who has found that the results in the paper are not reproducible ( #5 ), and the author has not given any explanation for this.

yuanzhaoz commented 3 years ago

> Maybe this can help you, if you understand Chinese: https://suixinblog.cn/2020/08/pytorch-distributeddataparallel.html. But I have to tell you that I am not the only one who has found that the results in the paper are not reproducible ( #5 ), and the author has not given any explanation for this.

Hey, thanks Shane,

I did try to get DDP working; however, it is still slower than single-node, single-GPU training. May I know what batch size you are using?