Error training model

kolaSamuel commented 5 years ago

I'm trying to train the model with the command below after preprocessing. The same command works for me on the default OpenNMT-py:

!python -u ./train.py \
    -data preprocessed \
    -save_model CQR_valid_100 \
    -train_steps 10000 \
    -rnn_size 128 \
    -word_vec_size 128 \
    -encoder_type rnn \
    -decoder_type rnn \
    -optim adagrad \
    -learning_rate 0.15 \
    -share_embeddings \
    -valid_steps 100 \
    -save_checkpoint_steps 1000 \
    -log_file log.txt

But I keep getting this error:

Traceback (most recent call last):
  File "./train.py", line 109, in <module>
    main(opt)
  File "./train.py", line 41, in main
    single_main(opt, -1)
  File "/content/OpenNMT-py-with-BERT/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "/content/OpenNMT-py-with-BERT/onmt/trainer.py", line 192, in train
    self._accum_batches(train_iter)):
  File "/content/OpenNMT-py-with-BERT/onmt/trainer.py", line 127, in _accum_batches
    for batch in iterator:
  File "/content/OpenNMT-py-with-BERT/onmt/inputters/inputter.py", line 598, in __iter__
    for batch in self._iter_dataset(path):
  File "/content/OpenNMT-py-with-BERT/onmt/inputters/inputter.py", line 583, in _iter_dataset
    for batch in cur_iter:
  File "/usr/local/lib/python3.6/dist-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/usr/local/lib/python3.6/dist-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/content/OpenNMT-py-with-BERT/onmt/inputters/text_dataset.py", line 297, in process
    bert_embeddings = self.bertify(tensor)
  File "/content/OpenNMT-py-with-BERT/onmt/inputters/text_dataset.py", line 235, in bertify
    bert_embeddings = bert_model(tensor, output_all_encoded_layers = False)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py", line 730, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py", line 267, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py", line 117, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1506, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

I can work around the error by forcing the device in text_dataset.py to always be 'CPU', but that makes training very slow, and it already takes hours to train even on the GPU. Please help.
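The RuntimeError says the embedding weights are on CUDA while the index tensor passed to them is on the CPU. One possible alternative to forcing everything onto the CPU is to move the input onto whatever device the model already uses. The sketch below is an assumption, not something confirmed in this thread: `to_model_device` is a hypothetical helper name, and it only relies on the standard `parameters()` and `.to()` methods that PyTorch modules and tensors provide.

```python
def to_model_device(model, tensor):
    """Return `tensor` moved to the device that holds the model's parameters.

    Hypothetical helper: `model` is expected to behave like a torch.nn.Module
    and `tensor` like a torch.Tensor.
    """
    # The first parameter tells us where the model's weights live.
    device = next(model.parameters()).device
    # .to() is a no-op if the tensor is already on that device.
    return tensor.to(device)
```

Applied to the call shown in the traceback, this would look like `bert_model(to_model_device(bert_model, tensor), output_all_encoded_layers=False)` inside `bertify`. It is untested against this fork.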

Ovis85 commented 4 years ago

Hi,

I'm having the same issue as above. Is there any documentation or tutorial on how to run this implementation of OpenNMT with BERT?

Thank you, Brian

kolaSamuel commented 4 years ago

Hello Brian,

Sadly, I haven't found any. Please let me know if you find anything helpful.

Best Wishes, Samuel

From: Ovis85 Sent: Monday, October 14, 2019 4:12 AM To: shakeel608/OpenNMT-py-with-BERT Cc: SamuelKola; Author Subject: Re: [shakeel608/OpenNMT-py-with-BERT] Error training model (#1)


Ovis85 commented 4 years ago

Hi kolaSamuel,

I was able to fix the "RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'" issue on Google Colab by setting the runtime type to GPU. I hope that helps.
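A quick way to confirm that the runtime actually exposes a GPU before starting training is to ask PyTorch directly. This is a minimal sketch; `describe_runtime` is a hypothetical name, and the check for an installed `torch` is only there so the snippet also runs where PyTorch is missing.

```python
import importlib.util


def describe_runtime():
    """Report whether PyTorch can see a CUDA device in this runtime."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        # Index 0 is the first visible GPU.
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available; tensors and models stay on the CPU"


print(describe_runtime())
```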

I've run into a new issue now:

  File "/content/gdrive/My Drive/OpenNMT-py-with-BERT/onmt/modules/utilclass.py", line 25, in forward
    assert len(self) == len(inputs)
AssertionError

You mentioned that you got the code to work, but only in CPU mode. Did you encounter the error that I'm seeing, and if so, how did you fix it?

Thanks
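The bare `assert len(self) == len(inputs)` fails without saying what the two lengths were. A small debugging aid can make the mismatch visible. The sketch below assumes the object in utilclass.py is a list-like container of sub-modules; `describe_length_mismatch` is a hypothetical helper, not part of OpenNMT-py.

```python
def describe_length_mismatch(modules, inputs):
    """Summarise the two lengths that the failing assert compares."""
    # Tensors expose .shape; plain Python lists do not.
    shape = getattr(inputs, "shape", None)
    message = (
        f"container holds {len(modules)} sub-module(s), "
        f"inputs has length {len(inputs)}"
    )
    if shape is not None:
        message += f" and shape {tuple(shape)}"
    return message


# Plain lists stand in for the module container and its inputs here.
print(describe_length_mismatch(["word_emb"], [[1, 2], [3, 4]]))
```

Calling this just before the assert would show whether the input carries more or fewer entries than the container has sub-modules.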

PunitShah1988 commented 4 years ago

Hi @Ovis85,

Have you been able to solve this issue? I am facing it now as well.

Thanks.
