Closed paul-adlink closed 4 years ago
It seems you need to roll back PyTorch to version 1.1.0. After doing that, I obtained enclib_cpu.so when running "python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth" (I did not even need to change "ninja -v" to "ninja --version"). Here are my package versions for reference:
Python 3.6.6 :: Anaconda, Inc.
numpy 1.15.2 (no longer 1.17.4)
PyTorch 1.1.0 (no longer 1.3.1)
torchvision 0.2.1
scipy 1.1.0
scikit-image 0.16.2
tensorboardX 1.9
tqdm 4.26.0
torch-encoding 1.0.1
opencv-python 4.1.1.26
PyYAML 3.13
Good luck, all.
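To compare an environment against the versions listed above, something like the following sketch may help. The helper names (`installed_versions`, `find_mismatches`) are made up for illustration, and the distribution names ("torch", "torch-encoding") are assumed to be the ones pip uses. Note that `importlib.metadata` ships with Python 3.8+; on Python 3.6 the `importlib_metadata` backport would be needed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for Python 3.6/3.7

# Versions reported as working in this thread (subset that affects the build).
WORKING = {
    "numpy": "1.15.2",
    "torch": "1.1.0",
    "torchvision": "0.2.1",
    "torch-encoding": "1.0.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Ignore a local build suffix such as "+cu100" when comparing.
        base = have.split("+")[0] if have is not None else None
        if base != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    diffs = find_mismatches(WORKING, installed_versions(WORKING))
    for name, (want, have) in sorted(diffs.items()):
        print("{}: expected {}, found {}".format(name, want, have))
```

Running the script inside the container prints one line per package whose version differs from the working set, and prints nothing if they all match.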
I'm trying to fix this problem, but there seems to be very little information available. I'm using the Docker image from NGC, nvcr.io/nvidia/pytorch:18.10-py03. After changing ['ninja', '-v'] to ['ninja', '--version'], I ran
python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth
and hit the problem below:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "train.py", line 381, in <module>
    main()
  File "train.py", line 128, in main
    assert_and_infer_cfg(args)
  File "/usr/GSCNN/config.py", line 86, in assert_and_infer_cfg
    import encoding
  File "/opt/conda/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
    from . import nn, functions, parallel, utils, models, datasets, transforms
  File "/opt/conda/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
    from .encoding import *
  File "/opt/conda/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in <module>
    from ..functions import scaled_l2, aggregate, pairwise_cosine
  File "/opt/conda/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
    from .encoding import *
  File "/opt/conda/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in <module>
    from .. import lib
  File "/opt/conda/lib/python3.6/site-packages/encoding/lib/__init__.py", line 15, in <module>
    ], build_directory=cpu_path, verbose=False)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 841, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1048, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/opt/conda/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'enclib_cpu'
My prerequisite versions:
Python 3.6.6 :: Anaconda, Inc.
numpy 1.17.4
PyTorch 1.3.1
torchvision 0.2.1
scipy 1.1.0
scikit-image 0.16.2
tensorboardX 1.9
tqdm 4.26.0
torch-encoding 1.0.1
opencv-python 4.1.1.26
PyYAML 3.13
Has anyone run into this kind of problem before, or could anyone offer some suggestions?
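One thing worth checking: `ninja --version` only prints the version number and does not run a build, so with that change the shared library may never be produced, which would be consistent with the ImportError for 'enclib_cpu'. A minimal sketch for checking whether the library exists in the build directory follows. The helper name `find_built_extension` is hypothetical, and the directory to pass is whatever `cpu_path` resolves to in your installed encoding/lib/__init__.py (it is not shown in the traceback, so it is left as a command-line argument here).

```python
import glob
import os
import sys


def find_built_extension(build_dir, name="enclib_cpu"):
    """Return sorted paths of compiled libraries for `name` inside build_dir."""
    # Cover a plain "name.so", an ABI-tagged "name.<tag>.so", and Windows ".pyd".
    patterns = [name + ".so", name + ".*.so", name + ".pyd"]
    found = set()
    for pattern in patterns:
        found.update(glob.glob(os.path.join(build_dir, pattern)))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    libraries = find_built_extension(sys.argv[1])
    if libraries:
        print("Found:", ", ".join(libraries))
    else:
        print("No compiled enclib_cpu library in", sys.argv[1])
```

If the directory contains build.ninja but no compiled library, the build step itself did not complete, and the import error is only a downstream symptom.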