Closed anandj91 closed 1 year ago
Can you try running it in the Docker container? That could help with this.
Same problem here (running in the docker)
Facing the same issue here. Was anyone able to solve this?
@bitfort, can you explain the issue that is occurring?
@codyaustun Do you have thoughts on this?
@bitfort if the PyTorch version changed from v0.2.0 to v0.4.0, I am not surprised that something broke because version changes aren't backward compatible in PyTorch. At a quick glance, I don't know what needs to be updated to support v0.4.0. Who is the owner for this benchmark?
I think Baidu are the people to talk to about it.
In an effort to clean up the git repo so we can maintain it better going forward, the MLPerf Training working group is closing out issues older than 2 years, since much has changed in the benchmark suite. If you think this issue is still relevant, please feel free to reopen. Even better, please come to the working group meeting to discuss your issue.
The DS2 code throws the following error when run_and_time.sh is executed.
Traceback (most recent call last):
  File "train.py", line 289, in <module>
    main()
  File "train.py", line 214, in main
    loss.backward()
  File "/h/anandj/reference/speech_recognition/env/local/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/h/anandj/reference/speech_recognition/env/local/lib/python2.7/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
The only thing I changed from the existing code is the PyTorch version. I ran into problems installing PyTorch 0.2.0, so I switched to 0.4.0.
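For anyone hitting the same RuntimeError, here is a minimal sketch of how this class of error can arise and one common way around it. It is not taken from the DS2 train.py, and the function names below are hypothetical. It assumes PyTorch 0.4.0 or later, where tensors created with requires_grad=True are tracked by autograd. The idea: some operations (exp is one) save their output for the backward pass, so modifying that output in place invalidates the saved value and backward() refuses to continue.

```python
import torch


def inplace_version():
    # Hypothetical reproduction, not code from the DS2 benchmark.
    x = torch.ones(3, requires_grad=True)
    y = x.exp()          # exp keeps its output around for the backward pass
    y.add_(1)            # in-place update overwrites that saved output
    y.sum().backward()   # autograd detects the modification and raises
    return x.grad


def out_of_place_version():
    # Same computation, but the addition allocates a new tensor,
    # so the value saved by exp is left untouched.
    x = torch.ones(3, requires_grad=True)
    y = x.exp()
    y = y + 1
    y.sum().backward()
    return x.grad


try:
    inplace_version()
    raised = False
except RuntimeError as err:
    raised = True
    print("in-place version failed: {}".format(err))

grad = out_of_place_version()
print(grad)
```

If the DS2 failure has the same cause, the usual approach is to look for in-place operations (methods ending in an underscore, `+=` on tensors, or `inplace=True` arguments) on tensors that feed into the loss and replace them with their out-of-place equivalents. Whether that is the actual cause here is an assumption until someone traces it in train.py.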