Every stage except the first uses `SyntheticDataset((3, 224, 224), 1000000)`, so the dataset size differs between stages. To make all stages use the same dataset, you can comment out these two lines in `main_with_runtime.py`:

```python
if not is_first_stage():
    args.synthetic_data = True
```

Alternatively, pass `-s` on the command line so that every stage uses synthetic data when testing whether PipeDream works.
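For reference, here is a self-contained sketch of that selection logic; only the two `is_first_stage` lines and the `SyntheticDataset((3, 224, 224), 1000000)` call come from the code above, while the stubs, the real-dataset branch, and its size are illustrative assumptions:

```python
class SyntheticDataset:
    """Stub matching the call above: `length` samples of shape `shape`."""
    def __init__(self, shape, length):
        self.shape, self.length = shape, length
    def __len__(self):
        return self.length

def is_first_stage():
    return False  # stub: pretend this rank runs a non-first stage

class Args:                 # stand-in for the parsed command-line arguments
    synthetic_data = False  # set to True by the `-s` flag

args = Args()

# The two lines to comment out so that every stage builds the same dataset:
if not is_first_stage():
    args.synthetic_data = True

if args.synthetic_data:
    # Non-first stages take this branch: 1,000,000 synthetic samples.
    train_dataset = SyntheticDataset((3, 224, 224), 1000000)
else:
    # Only the first stage loads the real dataset, whose length generally
    # differs from 1,000,000 (the size used here is a hypothetical stand-in).
    train_dataset = SyntheticDataset((3, 224, 224), 1281167)

print(len(train_dataset))  # 1000000 on this rank; differs on the first stage
```

With `-s`, `args.synthetic_data` is already true on every rank, so all stages take the synthetic branch and build identically sized datasets without any code changes.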
Hi @deepakn94, I was running `alexnet.gpus=4_straight` with `mp_conf.json` and printed `len(train_loader)` on each rank. `train_loader` is defined in `main_with_runtime.py`, and `batch_size` is the same in each rank, so by the definition of `train_loader` I would expect every rank to have the same `len(train_loader)`; in practice the values differ. Do you know why this happens?

Also, training with these different `len(train_loader)` values, I got an error once all 250000 iterations were finished. Perhaps this is caused by the reasons mentioned above? I would really appreciate your help!
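For context on why the values differ even though `batch_size` matches: `len()` of a PyTorch `DataLoader` with `drop_last=False` is `ceil(len(dataset) / batch_size)`, so it depends only on the underlying dataset size. A minimal illustration, where both sizes are assumptions (1000000 matches the `SyntheticDataset` used on non-first stages, 1281167 is a hypothetical real-dataset size):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 64  # identical on every rank

real = TensorDataset(torch.empty(1281167, 1))       # first stage (assumed size)
synthetic = TensorDataset(torch.empty(1000000, 1))  # every other stage

# len(DataLoader) == ceil(len(dataset) / batch_size) with drop_last=False.
print(len(DataLoader(real, batch_size=batch_size)))       # 20019
print(len(DataLoader(synthetic, batch_size=batch_size)))  # 15625
```

If ranks disagree on the number of batches per epoch, one stage can exhaust its loader while its neighbours still expect to exchange activations, which would plausibly produce an error at the very end of training, consistent with the explanation above.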