mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

[DeepSpeech2] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #46

Closed · anandj91 closed this 1 year ago

anandj91 commented 6 years ago

The DS2 code throws the following error when run_and_time.sh is executed.

```
Traceback (most recent call last):
  File "train.py", line 289, in <module>
    main()
  File "train.py", line 214, in main
    loss.backward()
  File "/h/anandj/reference/speech_recognition/env/local/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/h/anandj/reference/speech_recognition/env/local/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```

The only thing I changed from the existing code is the PyTorch version. I ran into problems installing PyTorch 0.2.0, so I switched to 0.4.0.
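For context (this is a generic sketch, not taken from the DS2 code), the error comes from autograd's version check: a tensor that `backward()` needs was overwritten in place after being saved in the graph. A minimal reproduction of that failure mode, and the out-of-place rewrite that avoids it, assuming any PyTorch ≥ 0.4:

```python
import torch

x = torch.ones(3, requires_grad=True)

# sigmoid's backward pass re-uses its output, so modifying that output
# in place invalidates the saved tensor and backward() raises:
# "one of the variables needed for gradient computation has been
#  modified by an inplace operation"
y = torch.sigmoid(x)
y += 1               # in-place update (y.add_(1) behaves the same)
# y.sum().backward() # would raise the RuntimeError above

# Fix: use the out-of-place form so the saved output stays intact.
y = torch.sigmoid(x)
y = y + 1
y.sum().backward()   # works; x.grad is populated
```

In the DS2 case, some in-place update that was tolerated by PyTorch 0.2.0 is presumably being caught by the stricter checks in 0.4.0.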

bitfort commented 6 years ago

Can you try running it in the Docker container? That could help with this.

jmuffat commented 6 years ago

Same problem here (running in the Docker container).

mankeyboy commented 6 years ago

Facing the same issue here. Was anyone able to solve this?

mankeyboy commented 6 years ago

@bitfort , can you explain the issue that is occurring?

bitfort commented 6 years ago

@codyaustun Do you have thoughts on this?

codyaustun commented 6 years ago

@bitfort if the PyTorch version changed from v0.2.0 to v0.4.0, I am not surprised that something broke, because PyTorch releases in that range were not backward compatible. At a quick glance, I don't know what needs to be updated to support v0.4.0. Who is the owner of this benchmark?
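As an illustration of the kind of breaking change between those releases (a generic sketch, not the specific DS2 fix): 0.4.0 merged Variable into Tensor, removed `volatile`, and replaced the `loss.data[0]` idiom with `loss.item()`, so training loops written for 0.2.x typically need edits along these lines:

```python
import torch

# PyTorch 0.2.x idioms (common in code written for that release):
#   from torch.autograd import Variable
#   inputs = Variable(inputs, volatile=True)   # inference-only graph
#   total_loss += loss.data[0]                 # extract a Python float

# PyTorch 0.4.x equivalents: Variable and Tensor are merged,
# volatile is replaced by torch.no_grad(), and scalars are 0-dim tensors.
def validate(model, inputs):   # hypothetical helper for illustration
    with torch.no_grad():        # replaces volatile=True
        output = model(inputs)   # plain Tensor, no Variable wrapper needed
    return output

# total_loss += loss.item()      # replaces loss.data[0]
```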

bitfort commented 6 years ago

I think Baidu are the people to talk to about this.

peladodigital commented 1 year ago

In an effort to clean up the git repo so we can maintain it better going forward, the MLPerf Training working group is closing out issues older than 2 years, since much has changed in the benchmark suite. If you think this issue is still relevant, please feel free to reopen. Even better, please come to the working group meeting to discuss your issue.