### Bug

When prebuilding the transformer-inference op, it is not properly linked against `libcurand`. This was fixed for the JIT build path in https://github.com/microsoft/DeepSpeed/pull/1688, but the fix does not appear to apply to prebuilding. This manifests like so:
Prebuilt op:

```
/tmp/DeepSpeed/deepspeed/ops/transformer/inference$ nm transformer_inference_op.cpython-39-x86_64-linux-gnu.so | grep curand
                 U curandCreateGenerator
                 U curandSetPseudoRandomGeneratorSeed
```
JIT op:

```
/.cache/torch_extensions/py39_cu116/transformer_inference$ nm transformer_inference.so | grep curand
                 U curandCreateGenerator@libcurand.so.10
                 U curandSetPseudoRandomGeneratorSeed@libcurand.so.10
```
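The difference between the two `nm` listings above is the whole bug: a bare `U` symbol is unresolved with no recorded library to bind to, while `symbol@libcurand.so.10` carries a versioned dependency the dynamic loader can satisfy. A small helper (an illustration, not part of DeepSpeed) that distinguishes the two cases from `nm` output:

```python
def curand_link_status(nm_output: str) -> list:
    """Return the curand symbols that are undefined with no version
    reference, i.e. the ones that will fail at import time because the
    shared object was not linked against libcurand."""
    unlinked = []
    for line in nm_output.splitlines():
        parts = line.split()
        # `nm` prints undefined symbols as: "U <symbol>[@<lib>.<version>]"
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith("curand"):
            if "@" not in parts[1]:
                unlinked.append(parts[1])
    return unlinked
```

Run against the prebuilt op's listing this returns both curand symbols; against the JIT op's listing it returns an empty list.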
### Environment

#### Torch
```
$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31
Python version: 3.9.10 (main, Mar 2 2022, 04:23:34) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] torch==1.12.1+cu116
[pip3] torchinfo==1.7.0
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect
```
#### Deepspeed
```
$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.9/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/tmp/DeepSpeed/deepspeed']
deepspeed info ................... 0.7.1+9b418c1e, 9b418c1e, master
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6
```
The test script (`/tmp/test2.py`):

```python
import deepspeed
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# Initialize the DeepSpeed-Inference engine
ds_engine = deepspeed.init_inference(model,
                                     mp_size=1,
                                     dtype=torch.half,
                                     replace_method='auto',
                                     replace_with_kernel_inject=True)
model = ds_engine.module
```
### Error
```
Traceback (most recent call last):
  File "/tmp/test2.py", line 8, in <module>
    ds_engine = deepspeed.init_inference(model,
  File "/tmp/DeepSpeed/deepspeed/__init__.py", line 292, in init_inference
    engine = InferenceEngine(model,
  File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 140, in __init__
    self._apply_injection_policy(
  File "/tmp/DeepSpeed/deepspeed/inference/engine.py", line 333, in _apply_injection_policy
    replace_transformer_layer(
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 771, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 954, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 981, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 971, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 761, in replace_fn
    new_module = replace_with_policy(child,
  File "/tmp/DeepSpeed/deepspeed/module_inject/replace_module.py", line 359, in replace_with_policy
    new_module = transformer_inference.DeepSpeedTransformerInference(
  File "/tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference.py", line 774, in __init__
    inference_cuda_module = builder.load()
  File "/tmp/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 468, in load
    return importlib.import_module(self.absolute_name())
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: /tmp/DeepSpeed/deepspeed/ops/transformer/inference/transformer_inference_op.cpython-39-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
```
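Until the prebuilt op links correctly, one possible workaround (a sketch, not an official fix; it assumes `libcurand.so.10` is on the loader's search path, as in the environment above) is to preload libcurand with global symbol visibility before the extension module is imported, so the op's dangling curand references can bind at load time:

```python
import ctypes

def preload_curand() -> bool:
    """Load libcurand into the process with RTLD_GLOBAL so that a
    subsequently imported extension module can resolve curand symbols
    against it. Returns False when libcurand cannot be found."""
    try:
        ctypes.CDLL("libcurand.so.10", mode=ctypes.RTLD_GLOBAL)
        return True
    except OSError:
        return False

# Call this before `import deepspeed` triggers loading the prebuilt op.
```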
### Related issues

It seems no changes were made (outside of JIT) in either https://github.com/pytorch/pytorch/issues/69666 or https://github.com/microsoft/DeepSpeed/issues/1625. @stas00 reported prebuilding suddenly starting to work, but that does not seem to be the case in our environment.
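For context, the JIT path fixed in PR 1688 passes `-lcurand` at link time; the prebuild path apparently needs the equivalent. A minimal sketch of what that could look like (the class and method names below are hypothetical, for illustration only; they are not DeepSpeed's actual builder API):

```python
# Hypothetical illustration: setuptools/CUDAExtension forwards each entry
# in `libraries` as -l<name> at link time, which is what would resolve
# curandCreateGenerator in the prebuilt .so.
class TransformerInferenceBuilderSketch:
    def libraries(self):
        # The missing piece on the prebuild side, mirroring the JIT fix.
        return ["curand"]

builder = TransformerInferenceBuilderSketch()
print(builder.libraries())  # ['curand']
```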
### Reproduction

1. Install torch.
2. Install DeepSpeed (devel install).
3. Try to use inference with the test script shown above, which fails with the `ImportError` above.
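If prebuilding this op stays broken, another possible workaround is to prebuild everything except `transformer_inference`, so that this one op falls back to the (correctly linked) JIT path at first use. This sketch relies on DeepSpeed's `DS_BUILD_*` build-time toggles and the devel checkout at `/tmp/DeepSpeed` from above; verify the flag names against your DeepSpeed version:

```shell
# Prebuild all ops except transformer_inference, which will then be
# JIT-compiled (with the working -lcurand link step) on first use.
DS_BUILD_OPS=1 DS_BUILD_TRANSFORMER_INFERENCE=0 pip install -e /tmp/DeepSpeed
```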