styxjedi opened this issue 4 years ago
I don't think those legacy datasets are compatible with torch.nn.parallel.DistributedDataParallel. The new datasets in torchtext should be.
Yes, I think you're right.
But if I create a new corpus using torchtext.data, how can I make it compatible with torch.nn.parallel.DistributedDataParallel or torch.utils.data.DataLoader?
I need it too, shame
You need to store the dataset as a list (see the self.data attribute in the new datasets in torchtext.experimental). Then it should be compatible with torch.utils.data.DataLoader; see the sketch below.
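For reference, here is a minimal sketch of that list-backed, map-style pattern; the class name ListDataset and the (text_tensor, label) example layout are illustrative, not part of torchtext:

```python
from torch.utils.data import Dataset


class ListDataset(Dataset):
    """Map-style dataset backed by a plain Python list, mirroring the
    self.data pattern used by the datasets in torchtext.experimental."""

    def __init__(self, examples):
        # examples: a list of already-processed samples,
        # e.g. (text_tensor, label) pairs.
        self.data = examples

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
```

Because it implements __len__ and __getitem__, DistributedSampler can index it directly, which is what DataLoader needs for distributed training.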
❓ Questions and Help
Description
I built a dataset from my corpus, using each line as an Example. It worked fine until I tried to use it for distributed training.
It seems that torch.nn.parallel.DistributedDataParallel has to use DistributedSampler, but that isn't compatible with torchtext datasets.
Is there any way to use torchtext datasets for distributed training? Thanks!
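For context, a rough sketch of the DDP wiring this would plug into, assuming a map-style dataset (such as the ListDataset sketch above) whose examples are fixed-length text tensors and a model that takes a single input tensor; the train function and its hyperparameters are illustrative only:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def train(dataset, model, num_epochs=3, batch_size=32):
    # One process per GPU, launched e.g. with torchrun, which sets LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # DistributedSampler shards a map-style dataset across ranks,
    # which is what DistributedDataParallel training expects.
    sampler = DistributedSampler(dataset)
    # The default collate_fn assumes fixed-length text tensors; pass a
    # custom collate_fn (e.g. one that pads) for variable-length examples.
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    ddp_model = DistributedDataParallel(model.cuda(local_rank),
                                        device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for texts, labels in loader:
            optimizer.zero_grad()
            logits = ddp_model(texts.cuda(local_rank))
            loss = torch.nn.functional.cross_entropy(
                logits, labels.cuda(local_rank))
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()
```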