msr-fiddle / pipedream


"no kernel image is available for execution on the device" when doing translation profiling #11

Closed gth828r closed 5 years ago

gth828r commented 5 years ago

I am running the translation profiler as follows:

# python train.py --dataset-dir /mnt/wmt16/ --target-bleu 21.8 --epochs 20 --math fp32 --print-freq 10 --arch gnmt --batch-size 64 --test-batch-size 128 --model-config "{'num_layers': 4, 'hidden_size': 1024, 'dropout':0.2, 'share_embedding': False}" --optimization-config "{'optimizer': 'FusedAdam', 'lr': 1.75e-3}" --scheduler-config "{'lr_method':'mlperf', 'warmup_iters':1000, 'remain_steps':1450, 'decay_steps':40}"

When running, I encounter the following error when the first training epoch starts:

0: Starting epoch 0
:::MLPv0.5.0 gnmt 1569879709.258431435 (train.py:452) train_epoch: 0
THCudaCheck FAIL file=/opt/pytorch/pytorch/aten/src/THC/generic/THCTensorMath.cu line=35 error=209 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "train.py", line 474, in <module>
    main()
  File "train.py", line 458, in main
    train_loss, train_perf = trainer.optimize(train_loader)
  File "/workspace/pipedream/profiler/translation/seq2seq/train/trainer.py", line 372, in optimize
    self.preallocate(data_loader, training=True)
  File "/workspace/pipedream/profiler/translation/seq2seq/train/trainer.py", line 360, in preallocate
    self.iterate(src, tgt, update=False, training=training)
  File "/workspace/pipedream/profiler/translation/seq2seq/train/trainer.py", line 160, in iterate
    output = self.model(src, src_length, tgt[:-1])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pipedream/profiler/translation/seq2seq/models/gnmt.py", line 62, in forward
    context = self.encode(input_encoder, input_enc_len)
  File "/workspace/pipedream/profiler/translation/seq2seq/models/seq2seq_base.py", line 34, in encode
    return self.encoder(inputs, lengths)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pipedream/profiler/translation/seq2seq/models/encoder.py", line 127, in forward
    x = self.rnn_layers[0](x, lengths)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pipedream/profiler/translation/seq2seq/models/encoder.py", line 64, in forward
    return self.emu_bidir_lstm(self.layer1, self.layer2, input, lengths)
  File "/workspace/pipedream/profiler/translation/seq2seq/models/encoder.py", line 53, in emu_bidir_lstm
    out1 = model1(inputl1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 556, in forward
    return self.forward_tensor(input, hx)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 536, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 509, in forward_impl
    dtype=input.dtype, device=input.device)
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /opt/pytorch/pytorch/aten/src/THC/generic/THCTensorMath.cu:35

Do you have any idea what might be causing this? It isn't clear to me whether this is a software/environment issue or an issue with my specific hardware.

deepakn94 commented 5 years ago

Are you using PyTorch with CUDA enabled?

gth828r commented 5 years ago

As far as I know, yes. I am running this inside the Docker container, and I don't think I've changed PyTorch from whatever was already in it. When I ran the image_classification experiments, they definitely ran on the GPU, which I assume confirms that the PyTorch-with-CUDA build is at least installed and available.
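For what it's worth, a check along these lines (a minimal sketch, not part of the PipeDream scripts) should confirm whether the container's PyTorch build can actually use the GPUs:

import torch

# Sanity checks that the container's PyTorch build can talk to the GPUs.
print(torch.__version__)              # PyTorch version inside the container
print(torch.version.cuda)             # CUDA version this build was compiled against
print(torch.cuda.is_available())      # True if a usable GPU is visible
print(torch.cuda.device_count())      # number of visible GPUs
print(torch.cuda.get_device_name(0))  # name of GPU 0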

deepakn94 commented 5 years ago

Interesting -- I will dig a bit more tomorrow to try to reproduce this.

deepakn94 commented 5 years ago

One more thing to try: can you install seq2seq in the runtime directory and try running an already-partitioned GNMT model?

gth828r commented 5 years ago

Good suggestion! I just tried it, but I unfortunately got the same error:

# python main_with_runtime.py --data_dir /mnt/wmt16 --distributed_backend gloo -m models.gnmt.gpus=4 --epochs 1 -b 128 -v 1 --master_addr 127.0.0.1 --config_path models/gnmt/gpus=4/hybrid_conf.json --rank 0 --local_rank 0
THCudaCheck FAIL file=/opt/pytorch/pytorch/aten/src/THC/generic/THCTensorMath.cu line=35 error=209 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "main_with_runtime.py", line 580, in <module>
    main()
  File "main_with_runtime.py", line 177, in main
    output_tensors = stage(*tuple(input_tensors))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pipedream/runtime/translation/models/gnmt/gpus=4/stage0.py", line 20, in forward
    out5 = self.layer5(out4, out1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pipedream/runtime/translation/seq2seq/models/encoder.py", line 64, in forward
    return self.emu_bidir_lstm(self.layer1, self.layer2, input, lengths)
  File "/workspace/pipedream/runtime/translation/seq2seq/models/encoder.py", line 53, in emu_bidir_lstm
    out1 = model1(inputl1)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 507, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 556, in forward
    return self.forward_tensor(input, hx)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 536, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 509, in forward_impl
    dtype=input.dtype, device=input.device)
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /opt/pytorch/pytorch/aten/src/THC/generic/THCTensorMath.cu:35

I'll spend this afternoon trying some things out and digging further, and I'll report back if I find anything.

gth828r commented 5 years ago

My best guess is that the problem is hardware specific. We are running on a 4-GPU machine that only has GeForce GTX 1080 Ti GPUs, and after doing some research I am fairly sure that the GPU feature list targeted in the setup.py script (which seems to require Volta support) is beyond our GPUs. Presumably we just cannot run the GNMT experiment on the hardware we have. If this doesn't sound right to you, feel free to keep poking at it; otherwise, I think we can close this one.
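For illustration only (I haven't checked the exact flags in that setup.py), a CUDA extension built only for Volta would pass nvcc flags like the first list below; covering our cards would mean also emitting code for compute capability 6.1:

# Hypothetical nvcc flags for a CUDA extension built only for Volta (sm_70):
volta_only = ['-gencode', 'arch=compute_70,code=sm_70']

# Covering a Pascal card such as the GTX 1080 Ti (sm_61) as well would
# require something like:
pascal_and_volta = [
    '-gencode', 'arch=compute_61,code=sm_61',
    '-gencode', 'arch=compute_70,code=sm_70',
]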

deepakn94 commented 5 years ago

What's the highest compute capability that your GPU supports?

gth828r commented 5 years ago

I believe it supports up to sm_61; as I understand it, the main piece missing compared to sm_70 is Tensor Core support.
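For reference, PyTorch can report this directly; a minimal check (I'd expect (6, 1), i.e. sm_61, on a GTX 1080 Ti):

import torch

# Report the compute capability of GPU 0 as a (major, minor) tuple.
major, minor = torch.cuda.get_device_capability(0)
print('sm_%d%d' % (major, minor))  # should print sm_61 on a GTX 1080 Ti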

deepakn94 commented 5 years ago

Hmm, it seems like this won't work then unfortunately.

gth828r commented 5 years ago

Sorry for the delay, and thanks for confirming. I'll close this out.