undefined symbol: THPVariableClass

jatentaki commented 6 years ago

OS: Ubuntu 16.04
PyTorch version: 0.4.0
How you installed PyTorch (conda, pip, source): conda
Python version: 3.6
CUDA/cuDNN version: 9.0
GPU models and configuration: Titan XP

I built an extension based on this tutorial, and it used to work. After some refactoring and fixes (in the CUDA/C++ code), it started failing at runtime:

/home/jatentaki/anaconda3/lib/python3.6/site-packages/sort2_cuda-0.0.0-py3.6-linux-x86_64.egg/lltm_cpp.cpython-36m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

(for both the CUDA and C++ versions). I then checked whether the original example still worked and, to my surprise, it no longer did.

Timeline:

1. My initial success was on some 0.4.0 pre-release source build for cuda8.0.
2. I broke it.
3. Trying to troubleshoot, I reinstalled conda and torch for the release 0.4.0 version, with cuda9.0.
4. Neither my code nor your original example work.

I believe the error just means I am not linking against some static library, but I don't see when and how I could have introduced that change.

goldsborough commented 6 years ago

This often occurs when you import the extension before import torch. Are you sure the order you are importing is:

import torch
import your_extension

Also, does this error occur when you import torch or import your_extension? Or does it fail when compiling the extension?

jatentaki commented 6 years ago

Ok, I won't be able to test on the same machine before tomorrow, but the fix works on my personal laptop. Perhaps this should be mentioned in the tutorial? Maybe it's common setuptools knowledge, but it caught me off guard.

goldsborough commented 6 years ago

It says it in the tutorial -- there is a line saying

Just be sure to import torch first, as this will resolve some symbols that the dynamic linker must see

It doesn't have anything to do with setuptools, it's just a dynamic linking issue. The torch module is a shared (dynamic) library which defines certain symbols that are unresolved in the extension library. To make these symbols available, the library containing the symbols (torch) must be imported before the library using them (your_extension) so that the dynamic linker can match the symbols with those from the torch library.
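
As a minimal sketch of that ordering rule, the dependency can be imported explicitly before the extension. The helper name import_after is made up for illustration and is not part of PyTorch. Standard-library modules stand in for torch and your_extension here so the snippet runs without either installed:

```python
import importlib
import sys


def import_after(dependency, extension):
    """Import `dependency` first, then `extension`; return the extension module.

    Hypothetical helper: it only enforces import order, nothing more.
    """
    importlib.import_module(dependency)
    # The dependency is loaded at this point. For a shared library such as
    # torch, that is what lets the dynamic linker resolve the extension's symbols.
    assert dependency in sys.modules
    return importlib.import_module(extension)


# Intended use would be: ext = import_after("torch", "your_extension")
# Stand-ins from the standard library keep this runnable anywhere:
ext = import_after("json", "math")
print(ext.__name__)  # prints "math"
```

In a real project the simpler form is just the two import statements in the right order; a helper like this only makes sense if the extension is loaded from several places.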

ezyang commented 6 years ago

I helped another user who made the same mistake. Maybe we can figure out a good way to give a better error message.
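
One possible shape for such a message, as a rough sketch only: parse the "undefined symbol" text and attach a hint that depends on whether torch has been imported. The function name and the hint wording are assumptions for illustration, not an existing PyTorch API.

```python
import re

# Matches messages of the form "<path>: undefined symbol: <name>".
_UNDEFINED_SYMBOL = re.compile(r"^(?P<lib>.+?): undefined symbol: (?P<symbol>\S+)$")


def explain_undefined_symbol(message, torch_imported):
    """Return `message` plus a hint, or `message` unchanged if it does not match."""
    match = _UNDEFINED_SYMBOL.match(message.strip())
    if match is None:
        return message
    if torch_imported:
        hint = "torch was already imported, so import order is probably not the cause."
    else:
        hint = "torch was not imported yet; try `import torch` before the extension."
    return "{}\nHint ({}): {}".format(message, match.group("symbol"), hint)


msg = ("/home/jatentaki/anaconda3/lib/python3.6/site-packages/"
       "sort2_cuda-0.0.0-py3.6-linux-x86_64.egg/"
       "lltm_cpp.cpython-36m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass")
print(explain_undefined_symbol(msg, torch_imported=False))
```

In practice the second argument could be computed as "torch" in sys.modules at the point where the ImportError is caught.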

goldsborough commented 6 years ago

@ezyang I'll think of something

Spandan-Madan commented 6 years ago

I'm having a similar error, and importing torch before the extension doesn't solve it. Version info:

PyTorch version: 0.4.1
CUDA version: 8.0
GCC version: 5.2.0

Error stack:

>>> import torch
>>> import modules
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/__init__.py", line 2, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/bn.py", line 10, in <module>
    from .functions import *
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/functions.py", line 17, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 494, in load
    with_cuda=with_cuda)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 670, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 753, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
>>>

Code base I'm trying to run when the error occurs: https://github.com/mapillary/inplace_abn

Any leads on what I should try?

soumith commented 6 years ago

@Spandan-Madan this is basically an ABI incompatibility (binaries built with gcc >= 5.1 have a different std::string ABI than binaries built with gcc < 5.1).

For this, we (PyTorch) have a patch in 0.4.1 that sets a flag to compile C++ extensions with _GLIBCXX_USE_CXX11_ABI=0 (see https://github.com/pytorch/pytorch/commit/f08f222db3b23e925754ee29c882cec0c7da461e).

Did you build the extension with pytorch-master and switch back to pytorch-0.4.1 (or something of that sort)?
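
A quick way to tell which std::string ABI an undefined symbol was compiled against is to look for __cxx11 in the mangled name: libstdc++ places the new-ABI string type in the std::__cxx11 namespace, so it shows up in the mangling. A minimal sketch, with a function name made up for illustration:

```python
def references_cxx11_abi(mangled_symbol):
    """True if the mangled name mentions libstdc++'s std::__cxx11 namespace."""
    return "__cxx11" in mangled_symbol


# The undefined symbol from the traceback above, split only for line length.
symbol = ("_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_string"
          "IcSt11char_traitsIcESaIcEEE")
print(references_cxx11_abi(symbol))              # True
print(references_cxx11_abi("THPVariableClass"))  # False
```

A True result here suggests the extension was compiled with the new ABI, which would not match a PyTorch binary built with _GLIBCXX_USE_CXX11_ABI=0. It is only a heuristic on the symbol name, not a full check of how either library was built.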

Spandan-Madan commented 6 years ago

Thanks for the reply @soumith.

I am using the extension in the modules folder of this repo: https://github.com/mapillary/inplace_abn

I installed PyTorch using conda (from both the normal channel and your channel), and I get this error with both.

Any leads on what I should try would be helpful. I've tried both GCC 4.8 and 5.2, and the error persists.

Thanks in advance :)

ChujunWhu commented 5 years ago

@Spandan-Madan Hi, have you solved the problem yet? I ran into the same problem and tried gcc 4.8, gcc 4.9 and gcc 5.4, but all of them failed. The error still occurs. My PyTorch version is 0.4.1.

etoilestar commented 4 years ago

> This often occurs when you import the extension before import torch. Are you sure the order you are importing is:
> import torch
> import your_extension
> Also, does this error occur when you import torch or import your_extension? Or does it fail when compiling the extension?

Hello, I have the same problem. I import torch before importing _C, but the error still occurs. Could you help me?

heiner commented 4 years ago

I suspect the underlying error is https://github.com/pytorch/pytorch/issues/38122.

monajalal commented 4 years ago

Could you please check https://github.com/daniilidis-group/neural_renderer/issues/92 and https://github.com/daniilidis-group/neural_renderer/issues/93

I was able to reproduce this error for two repos.

$ python
Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.6.0'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.is_available()
True

$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:    20.04
Codename:   focal

monajalal commented 4 years ago

@goldsborough

Here is the code I am trying to run:


"""
Example 1. Drawing a teapot from multiple viewpoints.
"""
import os
import argparse

import torch
import numpy as np
import tqdm
import imageio

import neural_renderer as nr

I'm not sure why it throws this error:

(base) mona@mona:~/research/3danimals/neural_renderer/examples$ python example1.py 
Traceback (most recent call last):
  File "example1.py", line 12, in <module>
    import neural_renderer as nr
  File "/home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/__init__.py", line 3, in <module>
    from .load_obj import load_obj
  File "/home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/load_obj.py", line 8, in <module>
    import neural_renderer.cuda.load_textures as load_textures_cuda
ImportError: /home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/cuda/load_textures.cpython-37m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

https://github.com/daniilidis-group/neural_renderer/issues/93
