Closed: wlhgtc closed this issue 1 year ago
Hmm the error doesn't seem related to data loading.
Regarding `split_dataset_by_node`: it's generally used to split an iterable dataset (e.g. when streaming) in PyTorch DDP. It's not needed if you use a regular dataset, since the PyTorch DataLoader already assigns a subset of the dataset indices to each node.
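To illustrate the idea of splitting a stream across nodes, here is a minimal pure-Python sketch. It is not the library's implementation: the helper name `shard_stream_by_rank` and the toy `examples` stream are hypothetical, and it only shows the simple case where each process keeps one example out of every `world_size`.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_stream_by_rank(stream: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Keep one example out of every `world_size`, starting at index `rank`.

    Hypothetical helper for illustration only; it is not part of `datasets`.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # islice(stream, start, stop, step) walks the stream lazily,
    # so nothing has to be loaded into memory up front.
    return islice(stream, rank, None, world_size)


# Toy stand-in for a streamed dataset with 10 examples and 2 processes.
examples = range(10)
rank0 = list(shard_stream_by_rank(examples, rank=0, world_size=2))
rank1 = list(shard_stream_by_rank(examples, rank=1, world_size=2))

print(rank0)  # [0, 2, 4, 6, 8]
print(rank1)  # [1, 3, 5, 7, 9]
```

With the real library, the equivalent call would be `split_dataset_by_node(dataset, rank=rank, world_size=world_size)`, imported from `datasets.distributed`. The `rank` and `world_size` values are typically read from the `RANK` and `WORLD_SIZE` environment variables that `torchrun` sets for each process.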
Hi guys, recently I tried to use `datasets` to train a dual encoder. I wrote my own dataset following the nice tutorial. Here is my code:
It works well on a single GPU, but I got the following errors when using DDP:
Here is my training script, run on a machine with two A100 GPUs:
I am not sure whether the error is related to my dataset code when using DDP. I also noticed the PR (#5369), but I don't know when and where I should use the function (`split_dataset_by_node`). @lhoestq, I hope you can help me.