salesforce / decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP
BSD 3-Clause "New" or "Revised" License

Error during training #54

Open · 5y opened this issue 4 years ago

5y commented 4 years ago

Do you by any chance have any idea why I received the following error during training? I'm running the Dockerfile on RTX 2080 Ti GPUs with the latest version of CUDA.

Thank you.

process_0 - Initializing MultitaskQuestionAnsweringNetwork
process_0 - MultitaskQuestionAnsweringNetwork has 14,469,902 trainable parameters
Traceback (most recent call last):
  File "/decaNLP/train.py", line 374, in <module>
    main()
  File "/decaNLP/train.py", line 370, in main
    run(args, run_args, world_size=args.world_size)
  File "/decaNLP/train.py", line 299, in run
    model = init_model(args, field, logger, world_size, device)
  File "/decaNLP/train.py", line 327, in init_model
    model.to(device)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 379, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
    self.flatten_parameters()
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
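A minimal diagnostic sketch, assuming PyTorch is installed inside the same container. The helpers `parse_sm`, `is_arch_supported` and `diagnose` are hypothetical names for illustration, not part of decaNLP. The script prints the PyTorch, CUDA and cuDNN versions that PyTorch itself was built with (these can differ from the CUDA installed on the host), then repeats the step that fails in `init_model`: moving an RNN to the GPU, which calls `flatten_parameters()` and therefore cuDNN. One plausible cause, assumed here and not confirmed in this thread, is that the PyTorch build in the Docker image predates Turing support: the RTX 2080 Ti reports compute capability 7.5, which first became available in CUDA 10.0.

```python
import importlib.util


def parse_sm(arch):
    """Turn a tag such as 'sm_75' into a (major, minor) compute capability."""
    # Keep only the digits after 'sm_' (tolerates suffixes like 'sm_90a').
    digits = "".join(ch for ch in arch.split("_", 1)[1] if ch.isdigit())
    return int(digits[:-1]), int(digits[-1])


def is_arch_supported(capability, arch_list):
    """True if arch_list holds a native sm_XX binary for this capability.

    Simplification: PTX entries (compute_XX) that could be JIT-compiled are ignored.
    """
    return capability in {parse_sm(a) for a in arch_list if a.startswith("sm_")}


def diagnose():
    """Report versions, then repeat the failing step: moving an LSTM to the GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"status": "skipped: PyTorch not installed"}
    import torch

    info = {
        "torch": torch.__version__,
        "cuda (build)": torch.version.cuda,  # CUDA PyTorch was compiled with, not the host's
        "cudnn": torch.backends.cudnn.version(),
    }
    if not torch.cuda.is_available():
        info["status"] = "skipped: no CUDA device visible"
        return info

    info["device"] = torch.cuda.get_device_name(0)
    info["capability"] = tuple(torch.cuda.get_device_capability(0))
    if hasattr(torch.cuda, "get_arch_list"):  # absent in older PyTorch releases
        info["native binary"] = is_arch_supported(
            info["capability"], torch.cuda.get_arch_list())

    # Same path as the traceback: Module.to() -> _apply() -> flatten_parameters()
    lstm = torch.nn.LSTM(input_size=8, hidden_size=16,
                         batch_first=True, bidirectional=True)
    try:
        lstm.to(torch.device("cuda"))
        info["status"] = "ok"
    except RuntimeError as err:
        info["status"] = f"failed: {err}"
    return info


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

Run it inside the container that produced the traceback. If `status` shows the same CuDNN error and `cuda (build)` is older than 10.0, rebuilding the image on a PyTorch release compiled for CUDA 10 or newer would be a reasonable next step to try. The `native binary` key only appears on PyTorch versions that provide `torch.cuda.get_arch_list()`.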