nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.
https://parser.kitaev.io/
MIT License

RuntimeError when trying to train a new model #25

Closed nitedl closed 5 years ago

nitedl commented 5 years ago

First of all, thank you for sharing the code for this great work.

I'm trying to train a model in the simplest setup, using the following command:

```
python self-attentive-parser-master/src/main.py train --model-path-base . --train-path self-attentive-parser-master\data\02-21.10way.clean --use-words
```

I'm using Python 3.6 on Windows 10 with the latest PyTorch, Cython, etc. I get the following error after "Training...":

```
Traceback (most recent call last):
  File "self-attentive-parser-master/src/main.py", line 612, in <module>
    main()
  File "self-attentive-parser-master/src/main.py", line 608, in main
    args.callback(args)
  File "self-attentive-parser-master/src/main.py", line 564, in <lambda>
    subparser.set_defaults(callback=lambda args: run_train(args, hparams))
  File "self-attentive-parser-master/src/main.py", line 312, in run_train
    _, loss = parser.parse_batch(subbatch_sentences, subbatch_trees)
  File "self-attentive-parser-master\src\parse_nk.py", line 1010, in parse_batch
    annotations, _ = self.encoder(emb_idxs, batch_idxs, extra_content_annotations=extra_content_annotations)
  File "venv_parsing_36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "self-attentive-parser-master\src\parse_nk.py", line 607, in forward
    res, timing_signal, batch_idxs = emb(xs, batch_idxs, extra_content_annotations=extra_content_annotations)
  File "venv_parsing_36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "self-attentive-parser-master\src\parse_nk.py", line 486, in forward
    for x, emb, emb_dropout in zip(xs, self.embs, self.emb_dropouts)
  File "self-attentive-parser-master\src\parse_nk.py", line 486, in <listcomp>
    for x, emb, emb_dropout in zip(xs, self.embs, self.emb_dropouts)
  File "venv_parsing_36\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "venv_parsing_36\lib\site-packages\torch\nn\modules\sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "venv_parsing_36\lib\site-packages\torch\nn\functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)
```

Process finished with exit code 1

Trying to run with 'use_cuda = False' in parse_nk.py, I get the same error (with 'torch.IntTensor' instead of 'torch.cuda.IntTensor'), so it doesn't seem to be CUDA-related.

To make sure this is not a compatibility issue, I also tried running in another virtual environment with Python 3.6, Cython 0.25.2, and PyTorch 0.4.1 (the versions the code was originally tested with, according to the documentation), and I get the same error, with 'torch.cuda.IntTensor' replaced by 'CUDAIntTensor'.

I found some references to this error on the web, but nothing helpful. Have you encountered this error? Any idea what's causing it?

Thanks

nikitakit commented 5 years ago

It looks like this is a Windows-specific error. I don't run Windows myself, so I can't say for certain what the issue is.

After some searching I found the following fix for the same error message: https://github.com/ictnlp-wshugen/annotated-transformer_codes/commit/ffe3bcc2665fbe5a7f1d53ca8819b1a455903cb8

My guess is that on Mac/Linux, code such as torch.from_numpy(np.array([1, 2])) produces a tensor with dtype long, but on Windows it comes out as dtype int, presumably because NumPy's default integer type follows the C long, which is only 32 bits on Windows. I'm guessing the error will go away if you call .long() on the indices passed to the embedding.
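Here is a minimal sketch of the suspected mismatch and the workaround (the embedding sizes are made up for the example):

```python
import numpy as np
import torch

# On Windows, np.array([1, 2]) defaults to int32 (NumPy's default integer
# follows the C long, which is 32-bit there), so from_numpy yields an
# IntTensor instead of the LongTensor that nn.Embedding expects as indices.
indices = torch.from_numpy(np.array([1, 2]))
print(indices.dtype)  # torch.int64 on Mac/Linux, torch.int32 on Windows

emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
out = emb(indices.long())  # .long() casts the indices to int64 on any platform
print(out.shape)  # torch.Size([2, 4])
```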

nitedl commented 5 years ago

This was indeed the problem. As you suggested, I added .long() to the return value in the definitions of from_numpy (in parse_nk.py, lines 11 & 17), replacing torch.from_numpy(ndarray) with torch.from_numpy(ndarray).long().
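For reference, a sketch of the patched helper (given the two definitions at lines 11 and 17, parse_nk.py presumably has a CUDA and a CPU variant, so the exact surrounding code may differ):

```python
import torch

# Patched helper, per the fix above: the .long() cast gives the tensor
# dtype int64 ("Long") even on Windows, where torch.from_numpy would
# otherwise produce an int32 IntTensor.
def from_numpy(ndarray):
    return torch.from_numpy(ndarray).long()
```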

I also had to compile evalb for Windows and change the executable name to "evalb.exe" (in evaluate.py, line 27); a platform-aware version of that line is sketched below.
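A hypothetical way to avoid hardcoding either name (the evalb_dir value and variable names are assumptions for the sketch, not the repo's actual code):

```python
import os.path
import platform

# Hypothetical sketch: choose the binary name by platform instead of
# hardcoding "evalb" or "evalb.exe" in evaluate.py.
evalb_dir = "EVALB"  # assumed directory of the compiled evalb binary
evalb_name = "evalb.exe" if platform.system() == "Windows" else "evalb"
evalb_program_path = os.path.join(evalb_dir, evalb_name)
```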

Now the parser trains smoothly on Windows. Thank you!