djstrong opened this issue 6 years ago
I am experiencing the same problem. @djstrong, were you able to solve it?
Found the fix: pass `dim=1` to the `nn.DataParallel` constructor. The data passed through your network is already in the form `[steps, batch_size, dims]`, so `DataParallel` needs to know which dimension you want to split on. `nn.DataParallel` will also use the same `dim` parameter to merge the final result. Note that that line flattens the tensor from `[steps, batch, dims]` into `[steps * batch, dims]`, so if you define `dim=1`, the results will be merged along the `dims` dimension instead of the `steps * batch` dimension.

Let me know if it doesn't work for you!
Thanks! I will try.
I am trying to run the model on multiple GPUs. `SplitCrossEntropyLoss` probably causes some trouble; any hints?