xnd-project / libxnd

Subsumed into xnd
https://xnd.io/
BSD 3-Clause "New" or "Revised" License

Importing fails on macOS + CUDA #43

Closed hameerabbasi closed 5 years ago

hameerabbasi commented 5 years ago

It seems the linking isn't done correctly:

>>> from xnd import xnd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hameerabbasi/Quansight/plures/gumath/python/xnd/__init__.py", line 49, in <module>
    from ._xnd import Xnd, XndEllipsis, data_shapes, _typeof
ImportError: dlopen(/Users/hameerabbasi/Quansight/plures/gumath/python/xnd/_xnd.cpython-37m-darwin.so, 2): Symbol not found: ___cudaRegisterFatBinary
  Referenced from: /Users/hameerabbasi/Quansight/plures/gumath/python/xnd/libxnd.0.dylib
  Expected in: flat namespace
 in /Users/hameerabbasi/Quansight/plures/gumath/python/xnd/libxnd.0.dylib

Running otool suggests it isn't even linked against CUDA:

$ otool -L xnd/_xnd.cpython-37m-darwin.so
xnd/_xnd.cpython-37m-darwin.so:
    @rpath/libndtypes.0.dylib (compatibility version 0.2.0, current version 0.2.0)
    @rpath/libxnd.0.dylib (compatibility version 0.2.0, current version 0.2.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
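
The same check can be scripted. The following is a minimal sketch, not part of the xnd build: `linked_libraries` and `links_cuda_runtime` are hypothetical helper names, and the sample text is the otool output quoted above. It assumes the CUDA runtime would show up under an install name containing "cudart".

```python
def linked_libraries(otool_output):
    """Return the install names listed in `otool -L` output."""
    names = []
    # The first line is the name of the inspected file, not a dependency.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Each entry looks like "<name> (compatibility version ...)".
            names.append(line.split(" (", 1)[0])
    return names


def links_cuda_runtime(otool_output):
    """True if any listed library name mentions the CUDA runtime (assumed: 'cudart')."""
    return any("cudart" in name for name in linked_libraries(otool_output))


SAMPLE = """xnd/_xnd.cpython-37m-darwin.so:
    @rpath/libndtypes.0.dylib (compatibility version 0.2.0, current version 0.2.0)
    @rpath/libxnd.0.dylib (compatibility version 0.2.0, current version 0.2.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
"""

print(linked_libraries(SAMPLE))
print(links_cuda_runtime(SAMPLE))
```

Since the traceback names libxnd.0.dylib as the library that references the missing symbol, the same check could also be pointed at the otool output for that file.
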

pearu commented 5 years ago

Linking against the CUDA library is not implemented; see https://github.com/plures/xnd/blob/c85da07a36a59d07eedda6fb3607a7afa4354b39/setup.py#L275

hameerabbasi commented 5 years ago

This is just a fresh clone and build... No changes were made. Merely having the CUDA toolkit installed breaks the build.

skrah commented 5 years ago

On Wed, Feb 13, 2019 at 10:32:23AM -0800, Pearu Peterson wrote:

> Linking against cuda library is not implemented, see https://github.com/plures/xnd/blob/c85da07a36a59d07eedda6fb3607a7afa4354b39/setup.py#L275

Yes, the -lcudart setting that is currently in the Linux branch should be in the code path for both Linux and OS X.

I didn't put it there because I cannot test on OS X and the buildbots don't have CUDA.
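
As an illustration of what sharing the -lcudart setting between both platforms could look like, here is a minimal sketch. It is not the actual setup.py code: `cuda_link_libraries` and its `with_cuda` parameter are hypothetical, and the sketch assumes the library list is handed to a setuptools `Extension` through its `libraries` argument, where `"cudart"` becomes `-lcudart` on the link line.

```python
import sys


def cuda_link_libraries(platform=sys.platform, with_cuda=True):
    """Hypothetical helper: libraries to link when CUDA support is enabled."""
    if not with_cuda:
        return []
    # Treat Linux and OS X the same way instead of special-casing Linux.
    if platform.startswith("linux") or platform == "darwin":
        return ["cudart"]
    return []


print(cuda_link_libraries("linux"))
print(cuda_link_libraries("darwin"))
print(cuda_link_libraries("darwin", with_cuda=False))
```
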

hameerabbasi commented 5 years ago

@skrah I believe that merely having nvcc installed triggers some code paths that link/compile some CUDA code, and this breaks the build.

If you need access to my machine for testing via SSH, I'm perfectly willing to give it to you. Or I can try to take a crack at this later myself.

skrah commented 5 years ago

Thanks for the offer. I think it should be fixed in the latest revision, unless the paths for CUDA are different on OS X.

hameerabbasi commented 5 years ago

They're in /usr/cuda/lib instead of /usr/cuda/lib64.

skrah commented 5 years ago

On Thu, Feb 14, 2019 at 05:23:56PM +0000, Hameer Abbasi wrote:

> They're in /usr/cuda/lib instead of /usr/cuda/lib64.

Hm, then it's probably time for a proper ./configure check for the libraries. Also, I added all arches above 6.0 to the build, which slows down compilation. The arch selection should also be handled in ./configure.

hameerabbasi commented 5 years ago

I think @pearu fixed a similar issue in PyTorch, maybe he could comment.

pearu commented 5 years ago

On OSX (sys.platform=='darwin'), CUDA libraries are in /usr/local/cuda/lib/, see here.

skrah commented 5 years ago

Ok, I just added /usr/cuda/lib and /usr/local/cuda/lib/ to the search paths.
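
A search over candidate directories might look roughly like the sketch below. This is an assumption about the shape of such a check, not the code that was committed: `CANDIDATE_DIRS` and `find_cuda_libdirs` are hypothetical names, and the list contains only the directories mentioned in this thread.

```python
import os

# Directories mentioned in this thread; a real check may look in more places.
CANDIDATE_DIRS = [
    "/usr/cuda/lib64",
    "/usr/cuda/lib",
    "/usr/local/cuda/lib",
]


def find_cuda_libdirs(candidates=CANDIDATE_DIRS, isdir=os.path.isdir):
    """Return the candidate directories that exist, in search order."""
    return [d for d in candidates if isdir(d)]


print(find_cuda_libdirs())
```

The resulting list could then be passed as `library_dirs` to a setuptools `Extension`. The `isdir` parameter exists only so the function can be exercised without a CUDA installation.
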

I also added CUDA arch detection to ./configure, because code generation for all arches doubles the compile time.
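
To show why restricting the arch list matters, here is a minimal sketch of how nvcc `-gencode` flags grow with the number of target arches. How ./configure actually detects the arch is not shown in this thread; `gencode_flags` is a hypothetical helper, and the compute capabilities passed to it are example inputs.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags for (major, minor) compute capabilities."""
    flags = []
    for major, minor in capabilities:
        arch = f"{major}{minor}"
        # One -gencode pair per arch: each adds another code generation pass.
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


# Only the detected arch (example value).
print(gencode_flags([(6, 1)]))

# Several arches at once: the flag list, and the work for nvcc, grows with each entry.
print(gencode_flags([(6, 0), (6, 1), (7, 0)]))
```
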

Let me know if ./configure fails. On my system I can just call nvcc without any include or library paths for simple programs.

skrah commented 5 years ago

No, actually /usr/local/cuda-9.2/lib64 is in my LD_LIBRARY_PATH. Back to ./configure...

skrah commented 5 years ago

It also works without LD_LIBRARY_PATH set.

hameerabbasi commented 5 years ago

It still fails on master with the same issue.

skrah commented 5 years ago

This one is fixed.