narendoraiswamy closed this issue 5 years ago.
Even though I cannot reproduce your problem, I can see that not having a proper requirements/environment file with version numbers can be problematic.
I will run some tests with the latest versions of PyTorch (and the other requirements) and then create a proper conda environment file that should work out of the box.
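The usual tools for capturing exact versions are `pip freeze` and `conda env export`. The minimal sketch below shows the same idea using only the standard library (`importlib.metadata`, Python 3.8+). The function name `pinned_requirements` is hypothetical and is not part of this repository.

```python
from importlib.metadata import distributions


def pinned_requirements(names=None):
    """Return sorted 'name==version' lines for installed distributions.

    If `names` is given, only those distributions (matched case-insensitively)
    are included. Hypothetical helper for illustration only.
    """
    wanted = {n.lower() for n in names} if names else None
    pins = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        if wanted is None or name.lower() in wanted:
            # If the same distribution is found twice, keep the first one found.
            pins.setdefault(name, dist.version)
    ordered = sorted(pins.items(), key=lambda kv: kv[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Redirecting the output into requirements.txt gives a file in which every dependency has an explicit version, which is exactly what was missing here.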
That would be very helpful. I'm looking forward to the updated requirements.txt/environment.yml file. If you don't mind, could you give an estimate of when it will be available?
Thank you.
I have pushed the changes. Can you please check if creating a conda environment with the requirements from the environment.yml file resolves your issues?
Yes, thank you. The new environment.yml file solves the problem. However, the GVE model is being trained on the CPU. You probably have to push it to the device in the gve_trainer.py file.
Are you sure that you have the NVIDIA drivers installed? You can check whether PyTorch is able to use CUDA by running torch.cuda.is_available(), which should return True.
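When torch.cuda.is_available() returns False, it helps to also know which CUDA version the installed PyTorch build was compiled against. The sketch below gathers that information using real PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`). The helper name `cuda_report` is hypothetical, and the import is guarded so the script still runs where PyTorch is missing.

```python
def cuda_report():
    """Collect the facts that matter when torch.cuda.is_available() is False."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set but `cuda_available` is False, the next thing to compare is the driver version reported by nvidia-smi against the minimum that CUDA version requires.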
The GVETrainer class inherits from LRCNTrainer, where the model is pushed to the CUDA device if possible.
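The inheritance pattern described above can be sketched as follows. `BaseTrainer` and `CaptionTrainer` are hypothetical stand-ins for LRCNTrainer and GVETrainer and are not the repository's actual code; the sketch only assumes that the base constructor picks a device and calls `model.to(device)`, so any subclass that calls `super().__init__` gets GPU placement automatically, and falls back to the CPU whenever torch.cuda.is_available() is False.

```python
try:
    import torch
except ImportError:  # let the sketch run even where PyTorch is absent
    torch = None


def default_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


class BaseTrainer:
    """Hypothetical stand-in for LRCNTrainer."""

    def __init__(self, model, device=None):
        self.device = device or default_device()
        # nn.Module.to() moves parameters and buffers and returns the module.
        self.model = model.to(self.device)


class CaptionTrainer(BaseTrainer):
    """Hypothetical stand-in for GVETrainer."""

    def __init__(self, model, device=None):
        # Calling the parent constructor is what provides the device placement.
        super().__init__(model, device)
```

With PyTorch installed, `CaptionTrainer(torch.nn.Linear(4, 2)).device` evaluates to "cpu" whenever CUDA is unavailable, which matches the symptom of training running on the CPU.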
Yes. The drivers are installed, and interestingly torch.cuda.is_available() returns False. I assume this is due to a driver version mismatch or to differences between CUDA 9 and CUDA 10: I use CUDA 9, and the environment uses CUDA 10. However, I believe there should be backward compatibility between the two.
The issue is an incompatibility between the NVIDIA driver and the CUDA Toolkit. I am using a fairly old driver version, 384.130, and CUDA 10 requires >=410.48. Hence the problem.
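A minimal sketch of the compatibility check described above. The 10.0 entry (410.48) comes from this thread; the 9.0 entry (384.81) is taken from NVIDIA's CUDA Toolkit release notes for Linux and should be double-checked there. Driver versions have to be compared as tuples of integers: as plain strings, "384.130" sorts before "384.81", which gives the wrong answer. The names `MIN_LINUX_DRIVER` and `driver_supports` are hypothetical.

```python
# Minimum Linux driver version per CUDA toolkit (verify against NVIDIA's release notes).
MIN_LINUX_DRIVER = {
    "9.0": "384.81",
    "10.0": "410.48",
}


def parse_version(text):
    """Turn a dotted version string like '384.130' into (384, 130)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver_version, cuda_version):
    """True if the driver meets the minimum for the given CUDA toolkit.

    Raises KeyError for CUDA versions missing from the table above.
    """
    required = MIN_LINUX_DRIVER[cuda_version]
    return parse_version(driver_version) >= parse_version(required)


if __name__ == "__main__":
    for cuda in ("9.0", "10.0"):
        print(cuda, driver_supports("384.130", cuda))
```

For the driver in this thread (384.130), the check passes for CUDA 9.0 and fails for CUDA 10.0.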
I will close the issue here, though. Thank you for your response :)
OK, have you tried changing the cudatoolkit requirement in environment.yml from cudatoolkit=10.0.130 to cudatoolkit=9.0 and then building the environment? I think that should still work.
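The edit suggested above is normally done by hand, but it can also be scripted. The sketch below rewrites a `cudatoolkit=...` dependency line in the text of an environment file using only `re`. The function `repin_cudatoolkit` and the sample file contents are hypothetical (not the repository's environment.yml), and the sketch only handles the single-`=` pin form used in this thread.

```python
import re

# Matches a dependency line such as "  - cudatoolkit=10.0.130" (indentation kept in group 1).
_PIN = re.compile(r"^([ \t]*-[ \t]*cudatoolkit)=\S+", re.MULTILINE)


def repin_cudatoolkit(env_text, new_version):
    """Replace the cudatoolkit version pin, raising ValueError if none is found."""
    new_text, count = _PIN.subn(rf"\g<1>={new_version}", env_text)
    if count == 0:
        raise ValueError("no cudatoolkit pin found")
    return new_text


# Hypothetical sample; the real environment.yml lists more dependencies.
SAMPLE_ENV = """name: example
dependencies:
  - python=3.7
  - cudatoolkit=10.0.130
  - pytorch
"""

if __name__ == "__main__":
    print(repin_cudatoolkit(SAMPLE_ENV, "9.0"))
```

After the file is updated, the environment can be rebuilt with `conda env create -f environment.yml`.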
Yes, I did that and it works without breaking any other dependencies. :+1: Thank you.
Hello,
Thank you for the PyTorch version of the code. With the provided requirements.txt file (which doesn't specify the version of each dependency), I get the common error
"Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-1_ps2cdw/matplotlib/".
This usually happens when the setuptools package is out of date. Unfortunately, I have tried all the available solutions to this problem and none of them helped. Hence, could you provide an updated requirements.txt file or an environment.yml file containing all the dependencies?
Thank you.