Closed hwchong closed 4 years ago
I think the problem lies in the `scatter_kwargs` function in `data_parallel`. It seems that `scatter_kwargs` only splits variables along the first dimension, but the hidden-state input to RNNs has the shape `(num_layers * num_directions, batch, hidden_size)`, with the batch in the second dimension, which contradicts the splitting rule of `scatter_kwargs`. One temporary workaround is to swap the dimensions of the hidden state by wrapping `GRU`.
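A minimal sketch of that workaround (the wrapper name `BatchFirstGRU` is hypothetical): the hidden state crosses the `DataParallel` boundary with batch in dimension 0, so `scatter_kwargs` can split it correctly, and the wrapper transposes it back before calling the underlying `nn.GRU`:

```python
import torch
import torch.nn as nn

class BatchFirstGRU(nn.Module):
    """Hypothetical wrapper: accepts hidden state as (batch, layers, hidden)."""

    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True)

    def forward(self, x, hidden):
        # hidden arrives as (batch, num_layers * num_directions, hidden_size),
        # so DataParallel's scatter splits it along the batch dimension.
        hidden = hidden.transpose(0, 1).contiguous()
        out, hidden = self.gru(x, hidden)
        # Transpose back so gather concatenates along the batch dimension.
        return out, hidden.transpose(0, 1).contiguous()

model = BatchFirstGRU(input_size=8, hidden_size=16, num_layers=2)
# Wrap with nn.DataParallel(model) on a multi-GPU machine; the shapes below
# demonstrate the single-device case.
x = torch.randn(4, 5, 8)    # (batch, seq, feature)
h0 = torch.zeros(4, 2, 16)  # (batch, layers, hidden)
out, hn = model(x, h0)
```

Inside `forward` the hidden state is restored to the `(num_layers * num_directions, batch, hidden_size)` layout that cuDNN expects, so each replica sees a consistent shape.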
should be good by now
I've been trying to train a recurrent neural network using nn.GRU on a multi-GPU setup and am randomly getting crashes caused by cuDNN.
I'm using PyTorch 0.2 on Linux running in an nvidia-docker container with the latest nvidia/cuda image.
This is the error message that comes up:
CuDNNError Traceback (most recent call last)