xhluca / dl-translate

Library for translating between 200 languages. Built on 🤗 transformers.
https://xhluca.github.io/dl-translate/
MIT License

Error when using torch 1.8.0+cu111 #40

Closed: hongyinjie closed this issue 2 years ago

hongyinjie commented 2 years ago
```
Traceback (most recent call last):
  File "translate_test.py", line 66, in <module>
    translate_test()
  File "translate_test.py", line 30, in translate_test
    rest = mt.predict(texts, _from = 'en',batch_size = size)
  File "/mnt/eclipse-glority/receipt/deploy/branches/dev/ms_deploy/util/translate_util.py", line 29, in predict
    rest = self.mt.translate(texts, source=_from, target=_to, batch_size = batch_size)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/dl_translate/_translation_model.py", line 197, in translate
    **encoded, **generation_options
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/transformers/generation_utils.py", line 927, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(input_ids, model_kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/transformers/generation_utils.py", line 412, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(input_ids, return_dict=True, **encoder_kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 780, in forward
    output_attentions=output_attentions,
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 388, in forward
    hidden_states = self.activation_fn(self.fc1(hidden_states))
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/hyj/anaconda3/envs/tf25/lib/python3.7/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```

```
torch          1.8.0+cu111
torchvision    0.9.0+cu111
```

It works fine with torch 1.7.1+cu101.

How can I fix this?
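
For reference, a minimal script that exercises the same code path as the traceback above (a sketch assuming the dl-translate API from its README; the texts, languages, and batch size here are illustrative, not the original script):

```python
# Minimal sketch of the failing call path, assuming the dl-translate README API.
# The texts, language names, and batch size are illustrative only.
import dl_translate as dlt

mt = dlt.TranslationModel(device="auto")  # m2m100 is the default model

texts = ["Hello world!", "How are you?"]
# translate() tokenizes the batch and calls model.generate(), which is where
# the cuBLAS error above is raised.
print(mt.translate(texts, source=dlt.lang.ENGLISH, target=dlt.lang.FRENCH, batch_size=2))
```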

xhluca commented 2 years ago

I just tried installing and running dl-translate on Colab (torch 1.10 + cu111) and it works fine: https://colab.research.google.com/drive/14LzNXeGx4eFO0YyjGgRrVj7E3tVSQPUP?usp=sharing

Could you try creating a fresh virtualenv and upgrading torch? I'd also suggest testing the Hugging Face model directly (see the M2M100 docs), for example with something like the snippet below; if the error still happens there, you could open an issue in their repo.
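
A quick direct test, adapted from the M2M100 documentation (the `facebook/m2m100_418M` checkpoint is assumed here for illustration):

```python
# Direct Hugging Face test, adapted from the M2M100 docs.
# The checkpoint facebook/m2m100_418M is assumed for illustration.
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M").to(device)

tokenizer.src_lang = "en"
encoded = tokenizer("Hello world!", return_tensors="pt").to(device)

# The original traceback dies inside generate(), in a cuBLAS matmul,
# so this call should reproduce the failure if the torch/CUDA build is the culprit.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

If this fails the same way on GPU but works with `device = "cpu"`, the problem is in the torch build rather than in dl-translate.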

hongyinjie commented 2 years ago

Done! It works now, thanks!