Internal Error when training

ShivamSharma1997 commented 4 years ago

I am trying to train GeDi on my own model and got the following error on both my dataset and your dataset.

Traceback (most recent call last):
  File "../train_GeDi.py", line 1103, in <module>
    main()
  File "../train_GeDi.py", line 1052, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "../train_GeDi.py", line 356, in train
    loss_b*=loss_mask
RuntimeError: diff_view_meta->outputnr == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1603729062494/work/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.

I suspected the cause was that I am training the model on multiple GPUs, but even after setting the default GPU to 0 I still get the same error. Please help!

akhileshgotmare commented 3 years ago

Noting for reference that using PyTorch 1.6 resolves this: https://github.com/salesforce/GeDi/issues/6#issuecomment-738605759. Are you using the PyTorch Docker image mentioned in the README?

yugaljain1999 commented 3 years ago

@akhileshgotmare Running pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html resolved this, so you should update the README.
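Since the fix reported in this thread is pinning PyTorch to 1.6, it can help to check the installed release before starting a long training run. Below is a minimal sketch, assuming you want a setup step to report whether the installed torch matches 1.6.x. The helper names parse_release, is_known_good and check_installed_torch are hypothetical and are not part of GeDi; the script only reads package metadata, so it runs even when torch is not installed.

```python
from importlib import metadata
from typing import Tuple

# Hypothetical helpers (not part of GeDi): compare the installed torch
# release against the 1.6.x line that this thread reports as working.


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn e.g. '1.6.0+cu101' into (1, 6, 0), dropping the '+cu101' local tag."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        # Keep only the leading digits of each piece ('0a0' -> 0).
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at pieces like 'dev20200601'
        parts.append(int(digits))
    return tuple(parts)


def is_known_good(version: str, required: Tuple[int, ...] = (1, 6)) -> bool:
    """True if the leading release numbers equal `required` (1.6 by default)."""
    return parse_release(version)[: len(required)] == tuple(required)


def check_installed_torch(required: Tuple[int, ...] = (1, 6)) -> str:
    """Describe the installed torch version relative to the pinned release."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed"
    if is_known_good(installed, required):
        return f"torch {installed} matches the release reported to work"
    return f"torch {installed} is installed; this thread reports that 1.6 avoids the assert"


if __name__ == "__main__":
    print(check_installed_torch())
```

Matching only major.minor is a deliberate choice: the pip command above installs a CUDA-tagged build (1.6.0+cu101), so an exact string comparison against "1.6.0" would reject a working install.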

salesforce / GeDi, issue #10