Closed HasnainKhanNiazi closed 3 years ago
Are you working on linux? Have you tried running the code using the provided conda environment?
Yes, I am working on Linux and I am using the provided conda environment. Here are the system specs:
GPU: Tesla T4
CUDA Version: 11.2
Ubuntu: 18.04
Weird. I have Ubuntu 18.04.5 and CUDA 11.1, so the environment seems good. Can you send over the command you tried running?
I am using the Jupyter notebook in the notebooks folder ("inference_playground") and I am getting that error on this import line:
from models.psp import pSp
I am not sure what was wrong, but I am no longer getting this error. Instead, I am getting an error on the line below:
Code Line: os.path.join(module_path, 'fused_bias_act_kernel.cu')
Error: ninja: build stopped: subcommand failed.
Hmmm. I just ran the notebook in Colab and it worked fine. Ninja can be a pain, and there are no really good references on how to fix its errors.
Any chance you can send me the full stack trace? Maybe there is something that can help us there.
@yuval-alaluf here is the full stack trace.
CalledProcessError                        Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
   1029                         cwd=build_directory,
-> 1030                         check=True)
   1031         else:

~/anaconda3/envs/newEnv/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    417             raise CalledProcessError(retcode, process.args,
--> 418                                      output=stdout, stderr=stderr)
    419     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Seems like we're getting somewhere. I noticed the following line:
nvcc fatal : Unsupported gpu architecture 'compute_75'
It seems like there is a mismatch between the GPU and the CUDA version on your system. Were you able to previously use the GPU with CUDA?
I found some other issues that may be of help: https://github.com/facebookresearch/detectron2/issues/149#issuecomment-545793165 https://github.com/torch/torch7/issues/1190#issuecomment-498934400
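To sanity-check a mismatch like this, the `nvcc --version` output can be compared against the minimum toolkit a given architecture needs. A minimal sketch (the helper name and version table are illustrative; the one known entry is that sm_75/Turing support was added in CUDA 10.0):

```python
import re

# Minimum CUDA toolkit version per GPU architecture.
# Known fact: compute_75 (Turing, e.g. Tesla T4) needs CUDA >= 10.0.
MIN_CUDA_FOR_SM = {"compute_75": (10, 0)}

def toolkit_can_target(nvcc_output: str, arch: str) -> bool:
    """Parse the toolkit version out of `nvcc --version` text and check
    whether it can compile for the given compute architecture."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if not m:
        raise ValueError("could not parse nvcc output")
    version = (int(m.group(1)), int(m.group(2)))
    return version >= MIN_CUDA_FOR_SM[arch]

sample = "Cuda compilation tools, release 9.1, V9.1.85"
print(toolkit_can_target(sample, "compute_75"))  # → False
```

A `False` here would reproduce exactly the `Unsupported gpu architecture 'compute_75'` failure: the toolkit that ninja picks up is simply too old for the GPU.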
This is a fresh system and this is the first GitHub repo I have run on this machine, so I can't say for sure.
Thanks @yuval-alaluf, let me have a look at these links and I will update you.
Cool. To isolate the issue between ninja and your machine, I would first make sure you're able to get torch to run with a GPU, and then try running the code in this repo (since this repo requires ninja, which can be tricky on its own).
I tested torch with CUDA and it is working fine.
import torch

Code: torch.cuda.is_available()
Output: True

Code: torch.cuda.device(0)
Output: <torch.cuda.device object at 0x7fa552331588>

Code: torch.cuda.current_device()
Output: 0

Code: torch.cuda.device_count()
Output: 1

Code: torch.cuda.get_device_name(0)
Output: 'Tesla T4'
Can you please check what version of nvcc you have? You can do this by running nvcc --version.
Here is the output that I get by running nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Yea, I see your problem. It appears that you have multiple CUDA versions installed. If you notice, the result of running nvcc --version indicates that you are using CUDA 9.1, and CUDA 9.1 is not compatible with your T4 GPU (which requires CUDA >= 10.1). You need to switch your CUDA to version 11.2, which you mentioned above.
facebookresearch/detectron2#149 (comment) torch/torch7#1190 (comment)
Take a look at the first link here, which will walk you through the steps for correctly setting your environment to use CUDA 11.2. Just note that the example there uses 10.1, so make sure to make the necessary adjustments for your machine.
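For reference, on a typical Linux install the switch amounts to pointing the shell at the newer toolkit. A minimal sketch, assuming the toolkit lives at /usr/local/cuda-11.2 (adjust the path to wherever CUDA is actually installed on your machine):

```shell
# Assumed install location; adjust to your system.
export CUDA_HOME=/usr/local/cuda-11.2
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# After this, `nvcc --version` should report release 11.2 rather than 9.1.
```

Adding these lines to ~/.bashrc makes the change persist across shell sessions.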
Thanks @yuval-alaluf, I have tried these steps to set CUDA 11.2 in the source file, but after setting it up it still isn't working and gives me the same error.
@yuval-alaluf I have changed CUDA to 11.2 and luckily I am no longer getting that error, but now I am getting an error on this line:
Code: ckpt = torch.load(model_path, map_location='cpu')
Error:

ValueError                                Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
    188             s = nts(s, "ascii", "strict")
--> 189             n = int(s.strip() or "0", 8)
    190         except ValueError:

ValueError: invalid literal for int() with base 8: 'ightq\x04ct'

During handling of the above exception, another exception occurred:

InvalidHeaderError                        Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
   2296             try:
-> 2297                 tarinfo = self.tarinfo.fromtarfile(self)
   2298             except EOFHeaderError as e:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in fromtarfile(cls, tarfile)
   1092         buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093         obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
   1094         obj.offset = tarfile.fileobj.tell() - BLOCKSIZE

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in frombuf(cls, buf, encoding, errors)
   1034
-> 1035         chksum = nti(buf[148:156])
   1036         if chksum not in calc_chksums(buf):

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
    190         except ValueError:
--> 191             raise InvalidHeaderError("invalid header")
    192         return n

InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

ReadError                                 Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    594         try:
--> 595             return legacy_load(f)
    596         except tarfile.TarError:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in legacy_load(f)
    505
--> 506         with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
    507                 mkdtemp() as tmpdir:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
   1588                 raise CompressionError("unknown compression type %r" % comptype)
-> 1589             return func(name, filemode, fileobj, **kwargs)
   1590

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
   1618             raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1619         return cls(name, mode, fileobj, **kwargs)
   1620

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
   1481         self.firstmember = None
-> 1482         self.firstmember = self.next()
   1483

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
   2308             elif self.offset == 0:
-> 2309                 raise ReadError(str(e))
   2310             except EmptyHeaderError:

ReadError: invalid header

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
I think this is because of the PyTorch version.
What torch version are you using?
I am using torch version 1.3.1+cu100.
Ah. You need to update your torch version to at least 1.6.0.
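For context on why this particular traceback points at the torch version: checkpoints saved by torch >= 1.6 use a zip-based container, while torch 1.3's legacy loader falls back to treating the file as a tar archive, producing exactly the tarfile errors above. The two formats can be told apart with the stdlib alone; a small sketch (the file names here are throwaway illustrations, not the real checkpoint):

```python
import zipfile

def checkpoint_format(path: str) -> str:
    """Distinguish the zip-based checkpoint container used by
    torch >= 1.6 from the legacy format used by older versions."""
    if zipfile.is_zipfile(path):
        return "zip (torch >= 1.6)"
    return "legacy (torch < 1.6)"

# Illustrative usage on two throwaway files:
with zipfile.ZipFile("new_style.pt", "w") as z:
    z.writestr("data.pkl", b"demo")   # mimics a torch >= 1.6 checkpoint
with open("old_style.pt", "wb") as f:
    f.write(b"not a zip archive")     # mimics a legacy checkpoint

print(checkpoint_format("new_style.pt"))  # → zip (torch >= 1.6)
print(checkpoint_format("old_style.pt"))  # → legacy (torch < 1.6)
```

If upgrading torch were not an option, a newer torch can also re-save a checkpoint in the legacy format via torch.save(..., _use_new_zipfile_serialization=False), but upgrading is the cleaner fix here.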
Yes, I am doing that. I will update you as soon as I get it done. Thanks for your time, much appreciated.
@yuval-alaluf Thanks for your time. First it was a problem related to CUDA, and then the PyTorch version played an important role in the errors. Now, after setting CUDA to 11.3 and PyTorch to 1.9, it is working fine.
Cheers
Hi, I am trying to set up this repo on my own local machine but I am getting this error. I searched on the internet but couldn't find a single solution for it. Any help will be appreciated. Thanks
ImportError: No module named 'fused'