yuval-alaluf / SAM

Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754
https://yuval-alaluf.github.io/SAM/
MIT License

ImportError: No module named 'fused' #35

Closed HasnainKhanNiazi closed 3 years ago

HasnainKhanNiazi commented 3 years ago

Hi, I am trying to set up this repo on my own local machine, but I am getting this error. I searched the internet but couldn't find a solution. Any help would be appreciated. Thanks

ImportError: No module named 'fused'

yuval-alaluf commented 3 years ago

Are you working on linux? Have you tried running the code using the provided conda environment?

HasnainKhanNiazi commented 3 years ago

Yes, I am working on Linux and I am using the provided conda environment. Here are my system specs:

- GPU: Tesla T4
- CUDA version: 11.2
- Ubuntu: 18.04

yuval-alaluf commented 3 years ago

Weird. I have Ubuntu 18.04.5 and CUDA 11.1 so the environment seems good. Can you send over the command you tried running?

HasnainKhanNiazi commented 3 years ago

I am using the Jupyter notebook in the notebooks folder (inference_playground) and I am getting that error on this import line:

`from models.psp import pSp`

HasnainKhanNiazi commented 3 years ago

I am not sure what was wrong, but I am no longer getting this error. Instead, I am now getting an error on the line below:

Code line: `os.path.join(module_path, 'fused_bias_act_kernel.cu')`

Error: `ninja: build stopped: subcommand failed.`

yuval-alaluf commented 3 years ago

Hmmm. I just ran the notebook in Colab and it worked fine. Ninja can be a pain, and there are no really good references on how to fix its errors.

Any chance you can send me the full stack trace? Maybe there is something that can help us there.

HasnainKhanNiazi commented 3 years ago

@yuval-alaluf here is the full stack trace.


```
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
   1029                 cwd=build_directory,
-> 1030                 check=True)
   1031         else:

~/anaconda3/envs/newEnv/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    417             raise CalledProcessError(retcode, process.args,
--> 418                                      output=stdout, stderr=stderr)
    419     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
     13 from datasets.augmentations import AgeTransformer
     14 from utils.common import tensor2im
---> 15 from models.psp import pSp

/SAM/notebooks/SAM/notebooks/SAM/models/psp.py in <module>
     10
     11 from configs.paths_config import model_paths
---> 12 from models.encoders import psp_encoders
     13 from models.stylegan2.model import Generator
     14

/SAM/notebooks/SAM/notebooks/SAM/models/encoders/psp_encoders.py in <module>
      6
      7 from models.encoders.helpers import get_blocks, bottleneck_IR, bottleneck_IR_SE
----> 8 from models.stylegan2.model import EqualLinear
      9
     10

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/model.py in <module>
      5 from torch.nn import functional as F
      6
----> 7 from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
      8
      9

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/__init__.py in <module>
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
      2 from .upfirdn2d import upfirdn2d

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_act.py in <module>
     11     sources=[
     12         os.path.join(module_path, 'fused_bias_act.cpp'),
---> 13         os.path.join(module_path, 'fused_bias_act_kernel.cu'),
     14     ],
     15 )

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
    659         verbose,
    660         with_cuda,
--> 661         is_python_module)
    662
    663

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
    828                 build_directory=build_directory,
    829                 verbose=verbose,
--> 830                 with_cuda=with_cuda)
    831     finally:
    832         baton.release()

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
    881     if verbose:
    882         print('Building extension module {}...'.format(name))
--> 883     _build_extension_module(name, build_directory, verbose)
    884
    885

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
   1041         if hasattr(error, 'output') and error.output:
   1042             message += ": {}".format(error.output.decode())
-> 1043         raise RuntimeError(message)
   1044
   1045

RuntimeError: Error building extension 'fused':
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.
```
yuval-alaluf commented 3 years ago

Seems like we're getting somewhere. I noticed the following line:

`nvcc fatal : Unsupported gpu architecture 'compute_75'`

It seems like there is a mismatch between the GPU and the CUDA version on your system. Were you able to previously use the GPU with CUDA?
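For context on why that flag fails: each compute capability can only be targeted by nvcc from a certain CUDA toolkit release onward, so an old toolkit simply cannot emit code for a newer GPU. A minimal illustrative check (the helper name and the mapping values are my own sketch, drawn from NVIDIA's release notes, not an exhaustive table):

```python
# Sketch: minimum CUDA toolkit release that can target a given
# compute capability (illustrative values, not exhaustive).
MIN_CUDA_FOR_ARCH = {
    "compute_70": (9, 0),   # Volta (e.g. V100)
    "compute_75": (10, 0),  # Turing (e.g. Tesla T4)
    "compute_80": (11, 0),  # Ampere (e.g. A100)
}

def nvcc_can_target(nvcc_release: str, arch: str) -> bool:
    """Return True if an nvcc of the given release can compile for `arch`."""
    major, minor = (int(p) for p in nvcc_release.split(".")[:2])
    return (major, minor) >= MIN_CUDA_FOR_ARCH[arch]

print(nvcc_can_target("9.1", "compute_75"))   # False: too old for a T4
print(nvcc_can_target("11.2", "compute_75"))  # True
```

So if the nvcc that ninja picks up predates the GPU's architecture, the build fails exactly like this regardless of what the driver supports.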

yuval-alaluf commented 3 years ago

I found some other issues that may be of help: https://github.com/facebookresearch/detectron2/issues/149#issuecomment-545793165 https://github.com/torch/torch7/issues/1190#issuecomment-498934400

HasnainKhanNiazi commented 3 years ago

This is a fresh system and this is first github repo I ran on this machine so can't say for sure about that.

HasnainKhanNiazi commented 3 years ago

> I found some other issues that may be of help: facebookresearch/detectron2#149 (comment) torch/torch7#1190 (comment)

Thanks @yuval-alaluf , let me have a look at these links and I will update you.

yuval-alaluf commented 3 years ago

> Thanks @yuval-alaluf , let me have a look at these links and I will update you.

Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

HasnainKhanNiazi commented 3 years ago

> Thanks @yuval-alaluf , let me have a look at these links and I will update you.
>
> Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

I tested torch with CUDA and it is working fine:

```python
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7fa552331588>
>>> torch.cuda.current_device()
0
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'Tesla T4'
```

yuval-alaluf commented 3 years ago

Can you please check what version of nvcc you have? You can do this by running `nvcc --version`.

HasnainKhanNiazi commented 3 years ago

> Can you please check what version of nvcc you have? You can do this by running `nvcc --version`.

Here is the output I get from running `nvcc --version`:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
```

yuval-alaluf commented 3 years ago

Yea, I see your problem. It appears that you have multiple CUDA versions installed. If you notice, the output of `nvcc --version` indicates that you are using CUDA 9.1, and CUDA 9.1 is not compatible with your T4 GPU (which requires CUDA >= 10.1). You need to switch your CUDA to version 11.2, which you mentioned above.

> facebookresearch/detectron2#149 (comment) torch/torch7#1190 (comment)

Take a look at the first link here, which walks through the steps needed to correctly set your environment to use CUDA 11.2. Just note that the example there uses 10.1, so make sure to make the necessary adjustments for your machine.
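For anyone hitting the same mismatch: switching the active toolkit generally comes down to pointing `PATH` and `LD_LIBRARY_PATH` at the right CUDA installation. A sketch, assuming CUDA 11.2 was installed to the default `/usr/local/cuda-11.2` location (adjust paths for your machine):

```shell
# Point the shell at the CUDA 11.2 toolkit (assumed install location).
# Add these lines to ~/.bashrc so they persist, then open a new shell.
export CUDA_HOME=/usr/local/cuda-11.2
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# Verify the switch took effect: this should now report release 11.2.
nvcc --version
```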

HasnainKhanNiazi commented 3 years ago

Thanks @yuval-alaluf , I have tried these steps to set CUDA 11.2 in my source file, but after setting it up it still isn't working and gives me the same error.

HasnainKhanNiazi commented 3 years ago

@yuval-alaluf I have changed CUDA to 11.2 and luckily I am no longer getting that error, but now I am getting an error on this line:

Code: `ckpt = torch.load(model_path, map_location='cpu')`

Error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
    188             s = nts(s, "ascii", "strict")
--> 189             n = int(s.strip() or "0", 8)
    190         except ValueError:

ValueError: invalid literal for int() with base 8: 'ightq\x04ct'

During handling of the above exception, another exception occurred:

InvalidHeaderError                        Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
   2296             try:
-> 2297                 tarinfo = self.tarinfo.fromtarfile(self)
   2298             except EOFHeaderError as e:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in fromtarfile(cls, tarfile)
   1092         buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093         obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
   1094         obj.offset = tarfile.fileobj.tell() - BLOCKSIZE

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in frombuf(cls, buf, encoding, errors)
   1034
-> 1035         chksum = nti(buf[148:156])
   1036         if chksum not in calc_chksums(buf):

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
    190         except ValueError:
--> 191             raise InvalidHeaderError("invalid header")
    192     return n

InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

ReadError                                 Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    594         try:
--> 595             return legacy_load(f)
    596         except tarfile.TarError:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in legacy_load(f)
    505
--> 506     with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
    507             mkdtemp() as tmpdir:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
   1588                 raise CompressionError("unknown compression type %r" % comptype)
-> 1589             return func(name, filemode, fileobj, **kwargs)
   1590

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
   1618             raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1619         return cls(name, mode, fileobj, **kwargs)
   1620

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
   1481         self.firstmember = None
-> 1482         self.firstmember = self.next()
   1483

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
   2308             elif self.offset == 0:
-> 2309                 raise ReadError(str(e))
   2310         except EmptyHeaderError:

ReadError: invalid header

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 model_path = EXPERIMENT_ARGS['model_path']
----> 2 ckpt = torch.load(model_path, map_location='cpu')

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    424     if sys.version_info >= (3, 0) and 'encoding' not in pickle_load_args.keys():
    425         pickle_load_args['encoding'] = 'utf-8'
--> 426         return _load(f, map_location, pickle_module, **pickle_load_args)
    427     finally:
    428         if new_fd:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    597     if _is_zipfile(f):
    598         # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599         raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
    600     # if not a tarfile, reset file offset and proceed
    601     f.seek(0)

RuntimeError: ../pretrained_models/sam_ffhq_aging.pt is a zip archive (did you mean to use torch.jit.load()?)
```

HasnainKhanNiazi commented 3 years ago

I think this is because of the PyTorch version.

yuval-alaluf commented 3 years ago

> I think this is because of the PyTorch version.

What torch version are you using?

HasnainKhanNiazi commented 3 years ago

I am using torch version 1.3.1+cu100.

yuval-alaluf commented 3 years ago

Ah. You need to update your torch version to at least 1.6.0.
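The underlying issue is the serialization format: `torch.save` switched to a zip-based archive in PyTorch 1.6, while older versions wrote a tar-based legacy format, which is why a 1.3-era `torch.load` falls back to its legacy tar reader and then raises the "is a zip archive" error. A quick stdlib-only way to see which format a checkpoint uses (the helper name is my own):

```python
import os
import tempfile
import zipfile

def checkpoint_format(path: str) -> str:
    """Guess a .pt file's serialization format.

    torch.save in PyTorch >= 1.6 writes a zip archive; earlier
    versions used a tar-based legacy format.
    """
    if zipfile.is_zipfile(path):
        return "zip (needs torch >= 1.6 to load)"
    return "legacy"

# Demo with a throwaway zip file standing in for a modern checkpoint:
with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
    tmp = f.name
with zipfile.ZipFile(tmp, "w") as z:
    z.writestr("data.pkl", b"stub")
print(checkpoint_format(tmp))  # zip (needs torch >= 1.6 to load)
os.remove(tmp)
```

If a checkpoint reports "zip", upgrading torch is the fix, exactly as suggested above.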

HasnainKhanNiazi commented 3 years ago

Yes, I am doing that; I will update you as soon as I get it done. Thanks for your time, much appreciated.

HasnainKhanNiazi commented 3 years ago

@yuval-alaluf Thanks for your time. First it was a problem related to CUDA, and then the PyTorch version played an important role in the errors. Now, after setting CUDA to 11.3 and PyTorch to 1.9, it is working fine.

Cheers