Undefined symbol: `cublasLtMatmulDescCreate` in fbgemm_gpu_experimental_gen_ai_py.so

jelmervdl commented 3 months ago

Hello,

I'm trying to add fbgemm_gpu to our environment, but out of the box it doesn't seem to link against libcublas.so. If I run the script with `LD_PRELOAD=$CONDA_PREFIX/lib/libcublas.so python script.py`, it works as intended. Is a link step missing from the CMake build?
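
For reference, the LD_PRELOAD workaround can probably also be done from inside Python. This is a minimal sketch, not something from the FBGEMM docs: it loads the library with RTLD_GLOBAL through ctypes so that its symbols are visible to extension modules imported later. The helper name `preload_global` is made up for illustration, and the library location is assumed to be the same `$CONDA_PREFIX/lib/libcublas.so` as in the command above.

```python
import ctypes
import os


def preload_global(lib_path):
    """Load a shared library with RTLD_GLOBAL.

    Returns the ctypes handle, or None if the file does not exist.
    (Hypothetical helper, for illustration only.)
    """
    if not os.path.isfile(lib_path):
        return None
    # RTLD_GLOBAL exposes the library's symbols to libraries loaded afterwards.
    return ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)


if __name__ == "__main__":
    # Assumption: libcublas.so sits in $CONDA_PREFIX/lib, as in the
    # LD_PRELOAD command above.
    prefix = os.environ.get("CONDA_PREFIX", "")
    handle = preload_global(os.path.join(prefix, "lib", "libcublas.so"))
    if handle is None:
        print("libcublas.so not found, nothing preloaded")
    else:
        print("libcublas.so preloaded")
    # Import fbgemm_gpu (and anything that imports it) only after this point.
```

This has to run before the first import of fbgemm_gpu, because the preload only helps libraries that are loaded after it.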

I've installed fbgemm_gpu in our local conda env more or less like this:

CUDA_VERSION=12.1
conda create python=3.10 ninja cmake
conda install pytorch pytorch-cuda=$CUDA_VERSION -c pytorch-nightly -c nvidia
conda run ... python setup.py install \
  -DTORCH_CUDA_ARCH_LIST="${CUDA_ARCH_LIST}" \
  --nccl_lib_path=$CONDA_DIR/lib/libnccl.so.2

q10 commented 3 months ago

We usually install the cuda package into the conda environment explicitly, instead of pytorch-cuda. That difference in environment setup may be why libcublas.so is not found automatically.

The full instructions for our environment setup can be found here and here; we have found this setup to be fairly reliable so far.
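
A quick way to see which cuBLAS libraries a given conda environment actually ships is to list them. The following is only a rough sketch: the helper name `find_cublas_libs` is hypothetical, and it assumes the libraries live in the `lib` directory under `$CONDA_PREFIX`, which may not hold for every setup.

```python
import glob
import os


def find_cublas_libs(prefix):
    """Return the sorted paths of libcublas*.so* files under <prefix>/lib.

    (Hypothetical helper, for illustration only.)
    """
    pattern = os.path.join(prefix, "lib", "libcublas*.so*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Assumption: the active conda environment is given by CONDA_PREFIX.
    prefix = os.environ.get("CONDA_PREFIX", "")
    libs = find_cublas_libs(prefix) if prefix else []
    for path in libs:
        print(path)
    if not libs:
        print("no libcublas libraries found under CONDA_PREFIX/lib")
```

Running this in both environments (one set up with pytorch-cuda, one with the cuda package) should show whether the two differ in which libraries are installed.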

jianyuh commented 2 months ago

Hi @jelmervdl, does it work now with pip install fbgemm-gpu==0.8.0rc4?

Another option is to set the environment variable TORCH_USE_RTLD_GLOBAL=1.
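
If exporting the variable in the shell is inconvenient, it can presumably also be set at the top of the script. This is a small sketch under that assumption; the helper name `request_rtld_global` is made up. The variable is read when torch is imported, so it has to be set before the first `import torch`.

```python
import os
import sys


def request_rtld_global():
    """Set TORCH_USE_RTLD_GLOBAL=1 for the current process.

    Returns False if torch has already been imported, in which case
    setting the variable is too late to have any effect.
    (Hypothetical helper, for illustration only.)
    """
    if "torch" in sys.modules:
        return False
    os.environ["TORCH_USE_RTLD_GLOBAL"] = "1"
    return True


if __name__ == "__main__":
    if request_rtld_global():
        print("TORCH_USE_RTLD_GLOBAL=1 set; import torch and fbgemm_gpu after this")
    else:
        print("torch is already imported; set the variable earlier")
```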

jelmervdl commented 2 months ago

Installing it with pip install fbgemm-gpu==0.8.0rc4 seems to work without any additional steps.

pytorch/FBGEMM, issue #2808