Closed celsofranssa closed 1 year ago
Sorry, I'm not familiar with Colab. I guess a mismatched version of numpy causes this error but I don't know why this happens.
For anyone hitting this in the future, removing the numpy version from the requirements solves the issue.
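For reference, here is a minimal sketch of what "removing the numpy version" could look like if done programmatically. The helper name `unpin_package` and the sample requirement lines are hypothetical (the thread does not show the actual contents of the requirements file); editing the file by hand works just as well.

```python
import re

# Matches "<name><optional version specifier>", e.g. "numpy==1.16.2" or "numpy >= 1.16".
_REQ_LINE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(?:(?:[=!~<>]=|[<>]).*)?$")


def unpin_package(requirements_text, package="numpy"):
    """Return the requirements text with the version specifier dropped for one package."""
    lines = []
    for line in requirements_text.splitlines():
        match = _REQ_LINE.match(line)
        if match and match.group(1).lower() == package.lower():
            lines.append(match.group(1))  # keep only the bare package name
        else:
            lines.append(line)  # leave every other line untouched
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = "click==7.0\nnumpy==1.16.2\ntorch==1.0.1"
    print(unpin_package(example))
```

With the pin removed, pip is free to pick a numpy build that matches the interpreter already installed in the environment (for example, the one Colab ships with).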
Also, is there an updated version with a more recent torch? It has been very hard to configure the environment with the old CUDA 10 to support torch 1.0.1.
For instance, I am facing multiple errors like:
[I 230221 09:56:33 main:37] Model Name: AttentionXML
[I 230221 09:56:33 main:40] Loading Training and Validation Set
[I 230221 09:56:33 main:52] Number of Labels: 29801
[I 230221 09:56:33 main:53] Size of Training Set: 14748
[I 230221 09:56:33 main:54] Size of Validation Set: 200
[I 230221 09:56:33 main:56] Training
Traceback (most recent call last):
  File "main.py", line 95, in <module>
    main()
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 64, in main
    model.train(train_loader, valid_loader, **model_cnf['train'])
  File "/home/celso/projects/AttentionXML/deepxml/models.py", line 67, in train
    loss = self.train_step(train_x, train_y.cuda())
  File "/home/celso/projects/AttentionXML/deepxml/models.py", line 42, in train_step
    scores = self.model(train_x)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/celso/projects/AttentionXML/deepxml/networks.py", line 42, in forward
    rnn_out = self.lstm(emb_out, lengths) # N, L, hidden_size * 2
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/celso/projects/AttentionXML/deepxml/modules.py", line 60, in forward
    self.lstm(packed_inputs, (hidden_init, cell_init))[0], batch_first=True)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/celso/projects/venvs/AttentionXML/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 561, in forward
    result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
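When reporting an error like this, it may help to attach the versions that are actually installed, since `CUDNN_STATUS_EXECUTION_FAILED` can show up when the PyTorch build, CUDA/cuDNN libraries, and GPU do not match (the thread does not confirm that this is the cause here). A minimal sketch, assuming it runs inside the same virtualenv as `main.py`; the function name `describe_environment` is made up for illustration:

```python
import importlib
import platform


def describe_environment():
    """Collect version details that are useful when debugging cuDNN errors."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Comparing this output with the versions the project expects (torch 1.0.1 with CUDA 10, as mentioned above) is a quick way to spot a mismatch before digging into the model code.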
Sorry. I will try to update the code to a more recent torch when I'm free.
No need to apologize ;-) It's an excellent project, and reproducing the published results is worth the effort.
For me, everything became simpler to execute when I started using the pytorch/pytorch 1.0.1-cuda10.0-cudnn7-devel Docker image. Maybe it's just a case of including a dockerized version.
Hello,
I am trying to run the preprocessing step on the provided Wiki10-31K dataset. However, I am facing the following error:
You can reproduce the error in this Colab Notebook.