Hello,
I cannot reproduce your problem and you did not fill out the issue template so I have no idea about your setup (OS, python version, etc.).
I suggest: install Miniconda, set up a conda environment with python=3.8, install PyTorch, install the requirements, then try again.
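For reference, a minimal sketch of that setup; the environment name coot is just an example, and the exact PyTorch install command depends on your CUDA setup:
conda create -n coot python=3.8
conda activate coot
conda install pytorch -c pytorch
pip install -r requirements.txt
python -V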
Going to close this for now since it's probably not an issue on our end.
Hi, I also have this problem, and I have Miniconda and a conda environment with Python 3.8.
Hi,
Please run python -V to make sure your environment is active.
Fill out all fields in the following issue template and I will take a look at your problem:
Describe the bug: A clear and concise description of what the bug is. (Include the full exception stack.)
To Reproduce: Steps to reproduce the behavior.
Expected behavior: A clear and concise description of what you expected to happen.
Screenshots: If applicable, add screenshots to help explain your problem.
System Info:
OS: [e.g. Ubuntu 18.04]
Python version: [e.g. 3.8.6]
PyTorch version: [e.g. 1.7.0+cu11]
Additional context: Add any other context about the problem here.
Describe the bug:
The command python3 train_retrieval.py -c config/retrieval/paper2020/anet_coot.yaml gives the following error:
Traceback (most recent call last):
  File "train_retrieval.py", line 6, in <module>
To Reproduce:
I connect to the GPU. I use a miniconda3 environment with Python 3.8.5, and I have PyTorch and all the required libraries installed, as well as the requirements. I go to the coot-videotext directory and then use a SLURM job script to run the following command:
python3 train_retrieval.py -c config/retrieval/paper2020/anet_coot.yaml
Expected behavior:
The program starts training the model.
Screenshots: This is the exception stack.
This is my Python version.
System Info:
OS: CentOS Linux
Python version: 3.8.5
PyTorch version: 1.7.1
Hi, it looks like you are accidentally using an old Python version inside your job that doesn't understand the f-string syntax used in the code.
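For illustration, newer f-string syntax such as the Python 3.8+ debug form f"{x=}" is a SyntaxError on older interpreters at import time, which is the kind of failure seen here. A minimal sketch with a generic variable name, not taken from the repository:
flag = True
print(f"{flag=}")  # prints "flag=True" on Python 3.8+; a SyntaxError on earlier versions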
Your SLURM job runs "python3 ..." while the command you tested was "python -V", without the 3.
You probably have to set up your Miniconda environment inside the SLURM job.
To test this, add "python -V" and "python3 -V" to your SLURM job script, run it, and check the logs; you should see the wrong version.
Adding something like "conda activate base" to your job script may solve the problem.
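For example, a rough sketch of what such a job script could look like; the #SBATCH directives, the miniconda path, and the environment name base are assumptions you would adapt to your cluster:
#!/bin/bash
#SBATCH --job-name=coot_retrieval
#SBATCH --gres=gpu:1
# make "conda activate" work in the non-interactive job shell; adjust the path to your miniconda install
source ~/miniconda3/etc/profile.d/conda.sh
conda activate base
# sanity check: both commands should now report the same Python 3.8.x
python -V
python3 -V
python3 train_retrieval.py -c config/retrieval/paper2020/anet_coot.yaml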
Hi, thank you, it was indeed the environment problem in the job.sh.
However, I am encountering this issue when trying to run the following command on CPU: python train_retrieval.py -c config/retrieval/paper2020/anet_coot.yaml --load_model provided_models/anet_coot.pth --validate
Traceback (most recent call last):
File "train_retrieval.py", line 95, in
Hi,
CPU-only is untested. Try the following:
Add --no_cuda
If that doesn't work, go to nntrainer/trainer_base.py, function hook_post_init, and change the line
model_state = th.load(str(self.load_model))
to
model_state = th.load(str(self.load_model), map_location=th.device('cpu'))
as the error message suggests.
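For reference, a minimal standalone sketch of loading the provided checkpoint on a CPU-only machine, assuming torch is imported as th and using the path from the --load_model argument above:
import torch as th

# Without map_location, th.load tries to restore tensors onto the CUDA device they were saved from,
# which fails on a machine without a GPU. Mapping everything to CPU avoids that.
checkpoint_file = "provided_models/anet_coot.pth"
model_state = th.load(checkpoint_file, map_location=th.device("cpu"))
print(type(model_state))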
Run: python3 train_retrieval.py -c config/retrieval/paper2020/anet_coot.yaml
Error:
Traceback (most recent call last):
  File "train_retrieval.py", line 6, in <module>
    from coot.configs_retrieval import ExperimentTypesConst, RetrievalConfig as Config
  File "/data2/haoxiaoshuai/new_coot/coot/configs_retrieval.py", line 11, in <module>
    from nntrainer import data as nn_data, lr_scheduler, models, optimization, trainer_configs, typext, utils
  File "/data2/haoxiaoshuai/new_coot/nntrainer/models/__init__.py", line 4, in <module>
    from nntrainer.initialization import init_network, init_weight_
  File "/data2/haoxiaoshuai/new_coot/nntrainer/initialization.py", line 7, in <module>
    from nntrainer import utils_torch, typext, utils
  File "<fstring>", line 1
    (cudnn.benchmark=)
                     ^
SyntaxError: invalid syntax