microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] Prebuilt `transformer_inference` op not linked against `libcurand` #2226

Closed vboginskey closed 2 years ago

vboginskey commented 2 years ago

Bug

This is related to:

The prebuilt `transformer_inference` op is not linked against `libcurand`. This was fixed for JIT builds in https://github.com/microsoft/DeepSpeed/pull/1688, but the fix does not seem to apply to prebuilding:

/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ ldd transformer_inference_op.cpython-39-x86_64-linux-gnu.so | grep curand

[no output]

This manifests like so:

ImportError: /tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-39-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator

It seems no changes were made (outside of JIT) in either https://github.com/pytorch/pytorch/issues/69666 or https://github.com/microsoft/DeepSpeed/issues/1625. @stas00 reported prebuilding suddenly starting to work, but that doesn't seem to be the case in our environment.
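One possible explanation for the JIT/prebuild gap is that the two paths hand linker options to different APIs: `torch.utils.cpp_extension.load` accepts `extra_ldflags`, whereas a setuptools `Extension` takes `libraries`, `library_dirs`, and `extra_link_args`. The sketch below is an illustration only, not DeepSpeed's actual code. `split_ldflags` is a hypothetical helper, and the `-lcurand` flag is an assumed example of what the JIT path passes.

```python
from typing import Dict, List


def split_ldflags(ldflags: List[str]) -> Dict[str, List[str]]:
    """Translate JIT-style linker flags into setuptools Extension keyword arguments.

    Hypothetical helper for illustration; not part of DeepSpeed or PyTorch.
    """
    kwargs: Dict[str, List[str]] = {
        "libraries": [],
        "library_dirs": [],
        "extra_link_args": [],
    }
    for flag in ldflags:
        if flag.startswith("-l") and len(flag) > 2:
            # "-lcurand" becomes the library name "curand"
            kwargs["libraries"].append(flag[2:])
        elif flag.startswith("-L") and len(flag) > 2:
            # "-L/some/dir" becomes a library search directory
            kwargs["library_dirs"].append(flag[2:])
        else:
            # anything else is passed through to the linker unchanged
            kwargs["extra_link_args"].append(flag)
    return kwargs


# Assumed input: the directory is the one the JIT op's ldd output in this
# report resolves libcurand from.
jit_flags = ["-lcurand", "-L/usr/local/cuda/targets/x86_64-linux/lib"]
ext_kwargs = split_ldflags(jit_flags)
print(ext_kwargs)
```

If the prebuild path never receives an equivalent of `libraries=["curand"]`, the linker has no reason to record `libcurand` as a dependency, which would match the symptoms in this report.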

Prebuilt OP

/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/torch/lib ldd transformer_inference_op.cpython-39-x86_64-linux-gnu.so 
        linux-vdso.so.1 (0x00007ffde01cf000)
        libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007fd10840a000)
        libtorch_cpu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00007fd0ee399000)
        libtorch_python.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so (0x00007fd0ed45e000)
        libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fd0ed1ae000)
        libc10_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00007fd0ed0b0000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd0ecee3000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd0ecec7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd0ecd02000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd0ecbbe000)
        libgomp-a34b3233.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007fd0ec994000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd0ec972000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd1085aa000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd0ec967000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd0ec95f000)
        libcudart-45da57e3.so.11.0 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudart-45da57e3.so.11.0 (0x00007fd0ec6b7000)
        libshm.so => /usr/local/lib/python3.9/site-packages/torch/lib/libshm.so (0x00007fd0ec6ad000)
        libtorch.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00007fd0ec6a8000)
        libtorch_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00007fd0ec689000)
        libtorch_cuda_cpp.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007fd0df3e9000)
        libnvToolsExt-847d78f2.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007fd0df1de000)
        libcudnn.so.8 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudnn.so.8 (0x00007fd0defb6000)
        libtorch_cuda_cu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007fd0b756f000)
        libcublas.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublas.so.11 (0x00007fd0addf1000)
        libcublasLt.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublasLt.so.11 (0x00007fd098d89000)
/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ nm transformer_inference_op.cpython-39-x86_64-linux-gnu.so | grep curand
                 U curandCreateGenerator
                 U curandSetPseudoRandomGeneratorSeed

JIT Op

~/.cache/torch_extensions/py39_cu116/transformer_inference$ LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/torch/lib ldd transformer_inference.so 
        linux-vdso.so.1 (0x00007ffdf6ecc000)
        libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f02ce0f2000)
        libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007f02ce059000)
        libc10_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00007f02cdf5b000)
        libtorch_cpu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00007f02b3eea000)
        libtorch_python.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so (0x00007f02b2faf000)
        libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f02b2d0b000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f02b2b3c000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f02b2b22000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f02b295d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f02b2952000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f02b2930000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f02b292a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f02b27e4000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f02d3d6b000)
        libgomp-a34b3233.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007f02b25ba000)
        libcudart-45da57e3.so.11.0 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudart-45da57e3.so.11.0 (0x00007f02b2312000)
        libshm.so => /usr/local/lib/python3.9/site-packages/torch/lib/libshm.so (0x00007f02b2308000)
        libtorch.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00007f02b2303000)
        libtorch_cuda.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00007f02b22e2000)
        libtorch_cuda_cpp.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007f02a5044000)
        libnvToolsExt-847d78f2.so.1 => /usr/local/lib/python3.9/site-packages/torch/lib/libnvToolsExt-847d78f2.so.1 (0x00007f02a4e39000)
        libcudnn.so.8 => /usr/local/lib/python3.9/site-packages/torch/lib/libcudnn.so.8 (0x00007f02a4c11000)
        libtorch_cuda_cu.so => /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007f027d1ca000)
        libcublas.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublas.so.11 (0x00007f0273a4a000)
        libcublasLt.so.11 => /usr/local/lib/python3.9/site-packages/torch/lib/libcublasLt.so.11 (0x00007f025e9e4000)
~/.cache/torch_extensions/py39_cu116/transformer_inference$ nm transformer_inference.so | grep curand
                 U curandCreateGenerator@libcurand.so.10
                 U curandSetPseudoRandomGeneratorSeed@libcurand.so.10
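The comparison between the two builds can be automated. The following is a minimal sketch that works on captured `ldd` and `nm` text rather than running the tools; `needed_libs` and `unversioned_undefined` are hypothetical helper names, and the sample strings are excerpts copied from the output above. Treating a missing `@library` suffix as "not bound" is a heuristic that fits this report, not a general rule, since libraries without symbol versioning never show the suffix.

```python
from typing import List, Set


def needed_libs(ldd_output: str) -> Set[str]:
    """Return the library names that appear left of '=>' in ldd output."""
    libs = set()
    for line in ldd_output.splitlines():
        name, sep, _ = line.strip().partition(" => ")
        if sep:  # skip lines such as linux-vdso that have no '=>'
            libs.add(name)
    return libs


def unversioned_undefined(nm_output: str, prefix: str) -> List[str]:
    """Return undefined ('U') symbols starting with prefix that lack an '@' tag."""
    result = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith(prefix):
            if "@" not in parts[1]:
                result.append(parts[1])
    return result


# Excerpts from the prebuilt op's output in this report.
PREBUILT_LDD = """\
        linux-vdso.so.1 (0x00007ffde01cf000)
        libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007fd10840a000)
        libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007fd0ed1ae000)
"""
PREBUILT_NM = """\
                 U curandCreateGenerator
                 U curandSetPseudoRandomGeneratorSeed
"""

# Excerpts from the JIT op's output in this report.
JIT_LDD = """\
        linux-vdso.so.1 (0x00007ffdf6ecc000)
        libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f02ce0f2000)
        libc10.so => /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so (0x00007f02ce059000)
        libcudart.so.11.0 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f02b2d0b000)
"""
JIT_NM = """\
                 U curandCreateGenerator@libcurand.so.10
                 U curandSetPseudoRandomGeneratorSeed@libcurand.so.10
"""

missing = needed_libs(JIT_LDD) - needed_libs(PREBUILT_LDD)
print("missing from prebuilt:", sorted(missing))
print("prebuilt, unversioned:", unversioned_undefined(PREBUILT_NM, "curand"))
print("JIT, unversioned:", unversioned_undefined(JIT_NM, "curand"))
```

On these excerpts the only dependency present in the JIT build and absent from the prebuilt one is `libcurand.so.10`, and both `curand*` symbols lack the suffix only in the prebuilt op.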

Environment

Torch

$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31

Python version: 3.9.10 (main, Mar  2 2022, 04:23:34)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] torch==1.12.1+cu116
[pip3] torchinfo==1.7.0
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect

Deepspeed

$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.9/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/tmp/DeepSpeed/deepspeed']
deepspeed info ................... 0.7.1+9b418c1e, 9b418c1e, master
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6

Reproduction

  1. Install torch:

    pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116

  2. Install DeepSpeed (devel install):

    git clone https://github.com/microsoft/DeepSpeed
    cd DeepSpeed
    DS_BUILD_TRANSFORMER_INFERENCE=1 pip install -e . --global-option="build_ext" --global-option="-j4" --no-cache -v --disable-pip-version-check

  3. Try to use inference:

    import deepspeed
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-cased")

    # Initialize the DeepSpeed-Inference engine
    ds_engine = deepspeed.init_inference(model,
                                         mp_size=1,
                                         dtype=torch.half,
                                         replace_method='auto',
                                         replace_with_kernel_inject=True)
    model = ds_engine.module


Error

Traceback (most recent call last):
  File "/tmp/test2.py", line 8, in <module>
    ds_engine = deepspeed.init_inference(model,
  File "/tmp/DeepSpeed/deepspeed/__init__.py", line 292, in init_inference
    engine = InferenceEngine(model,
  File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 140, in __init__
    self._apply_injection_policy(
  File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 333, in _apply_injection_policy
    replace_transformer_layer(
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 771, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 954, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 971, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 761, in replace_fn
    new_module = replace_with_policy(child,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 359, in replace_with_policy
    new_module = transformer_inference.DeepSpeedTransformerInference(
  File "/tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference.py", line 774, in __init__
    inference_cuda_module = builder.load()
  File "/tmp/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 468, in load
    return importlib.import_module(self.absolute_name())
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: /tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-39-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
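The failure only surfaces when the op is first imported, deep inside `init_inference`. A small pre-flight check can surface it earlier. This is a rough sketch, not part of DeepSpeed: `undefined_symbol` and `check_extension` are hypothetical helpers, and the regular expression assumes the dynamic loader's message has the form shown in the `ImportError` above.

```python
import importlib
import re
from typing import Optional

# Matches the loader message format seen in this report.
_UNDEFINED = re.compile(r"undefined symbol: (\S+)")


def undefined_symbol(message: str) -> Optional[str]:
    """Extract the symbol name from an 'undefined symbol' error message."""
    match = _UNDEFINED.search(message)
    return match.group(1) if match else None


def check_extension(module_name: str) -> Optional[str]:
    """Import a module; return the unresolved symbol if loading fails on one.

    Returns None when the import succeeds or fails for any other reason.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return undefined_symbol(str(exc))
    return None


# The error text reported in this issue.
message = (
    "/tmp/DeepSpeed/deepspeed/ops/transformer/inference/"
    "transformer_inference_op.cpython-39-x86_64-linux-gnu.so: "
    "undefined symbol: curandCreateGenerator"
)
print(undefined_symbol(message))
```

In an affected environment, calling `check_extension("deepspeed.ops.transformer.inference.transformer_inference_op")` right after installation should report the symbol; that module name is inferred from the file path in the traceback.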

jeffra commented 2 years ago

@vboginskey thank you for reporting this issue, can you confirm if #2228 fixes your issue?

vboginskey commented 2 years ago

@jeffra 🎉 yes, it does. Thanks for the quick response.

jeffra commented 2 years ago

Wonderful! :)