microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii #452

Open · Andronixs opened this issue 3 months ago

Andronixs commented 3 months ago

Environment: Ubuntu 22.04.4 LTS; CUDA compilation tools, release 12.1, V12.1.66 (Build cuda_12.1.r12.1/compiler.32415258_0). ds_report output is included at the end of the description.

Issue: I am not able to successfully run the example scripts using MII. I get the following error: inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. However, I'm able to run DeepSpeed inference directly (not through MII) without any issues. I tried different torch and CUDA versions; the result is the same.

Running the base example script:

import mii
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

Output:

[10/10] c++ core_ops.o bias_activation.o bias_activation_cuda.cuda.o layer_norm.o layer_norm_cuda.cuda.o rms_norm.o rms_norm_cuda.cuda.o gated_activation_kernels.o gated_activation_kernels_cuda.cuda.o -shared -L/home/andrew/.local/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-12.1/lib64 -lcudart -o inference_core_ops.so
Loading extension module inference_core_ops...
Traceback (most recent call last):
  File "/home/andrew/Projects/Deepspeed_examples/./ds_test.py", line 2, in <module>
    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/api.py", line 207, in pipeline
    inference_engine = load_model(model_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/mistral/policy.py", line 17, in instantiate_model
    return MistralInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 215, in __init__
    self.make_norm_layer()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 518, in make_norm_layer
    self.norm = heuristics.instantiate_pre_norm(norm_config, self._engine_config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 167, in instantiate_pre_norm
    return DSPreNormRegistry.instantiate_config(config)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 36, in instantiate_config
    if not target_implementation.supports_config(config_bundle.config):
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/pre_norm/cuda_pre_rms.py", line 36, in supports_config
    _ = CUDARMSPreNorm(config.channels, config.residual_dtype)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_base.py", line 36, in __init__
    self.inf_module = InferenceCoreBuilder().load()
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 479, in load
    return self.jit_load(verbose)
  File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 523, in jit_load
    op_module = load(name=self.name,
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2132, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "", line 571, in module_from_spec
  File "", line 1176, in create_module
  File "", line 241, in _call_with_frames_removed
ImportError: /home/andrew/.cache/torch_extensions/py310_cu121/inference_core_ops/inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
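
For reference, the failing JIT load can be reproduced in isolation, without pulling down the 7B checkpoint, by clearing the cached extension build and calling the same builder the traceback ends in. A minimal sketch, assuming the builder import path and cache location shown in the traceback (adjust the py310_cu121 tag to your Python/CUDA combination):

# Sketch: reproduce only the failing JIT build/load from the traceback above.
import shutil
from pathlib import Path
from deepspeed.ops.op_builder import InferenceCoreBuilder

# Remove the stale cached build so the op is recompiled from scratch.
cache_dir = Path.home() / ".cache" / "torch_extensions" / "py310_cu121" / "inference_core_ops"
shutil.rmtree(cache_dir, ignore_errors=True)

# This is the call that fails in the traceback (rms_norm_base.py -> InferenceCoreBuilder().load()).
inf_module = InferenceCoreBuilder().load()
print(inf_module)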

DS_REPORT:

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
evoformer_attn ......... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/andrew/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/home/andrew/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 172.11 GB
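
One more data point that might help triage: checking how the mangled wf6af16 (FP6) symbol appears in the cached .so. If nm lists it as "U" (undefined), the extension references the FP6 linear kernel but the object that defines it was never compiled into the build. A rough sketch using binutils' nm via subprocess, with the .so path taken from the traceback above:

# Sketch: inspect the cached extension for the FP6 symbol from the ImportError.
# A "U" entry means the .so needs the symbol but nothing in the build defines it.
import subprocess
from pathlib import Path

so_path = (Path.home() / ".cache" / "torch_extensions" / "py310_cu121"
           / "inference_core_ops" / "inference_core_ops.so")
symbols = subprocess.run(["nm", "-D", str(so_path)],
                         capture_output=True, text=True, check=True).stdout
for line in symbols.splitlines():
    if "wf6af16" in line:
        print(line)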

allanj commented 3 months ago

same problem here

yechong316 commented 2 months ago

Same here, I haven't found a way to solve it either.

Andronixs commented 2 months ago

When using Conda and Python 3.9 I don't get this error, but the process gets stuck in the server starting phase. MII_server_log
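
For context, the "server starting phase" here is MII's persistent deployment rather than the non-persistent pipeline from the original report. Roughly, and hedged to the README-style usage (method names may differ between MII versions), it looks like:

# Sketch: MII persistent deployment; this is the path that hangs at server start.
import mii

client = mii.serve("mistralai/Mistral-7B-v0.1")   # launches the inference server
response = client.generate(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
client.terminate_server()                          # shut the server down again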

allanj commented 2 months ago

I simply switched to vLLM... sorry Microsoft :(

Andronixs commented 2 months ago

Yep, vLLM and HF TGI are working with no issues.
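
For anyone comparing, a rough vLLM equivalent of the pipeline example above, using vLLM's offline LLM API (a sketch, not tied to any specific vLLM version):

# Sketch: the same two-prompt generation with vLLM instead of MII.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
outputs = llm.generate(["DeepSpeed is", "Seattle is"], SamplingParams(max_tokens=128))
for output in outputs:
    print(output.outputs[0].text)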

Andronixs commented 2 months ago

It seems this issue was previously reported under different titles:

https://github.com/microsoft/DeepSpeed-MII/issues/443

Fix the FP6 kernels compilation problem on non-Ampere GPUs. microsoft/DeepSpeed#5333

Proposed workaround: downgrading to deepspeed 0.13.5 and deepspeed-mii 0.2.2 should work.

Didn't work for me
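
For completeness, the proposed pin as a self-contained snippet (hedged: per the comment above, the downgrade did not resolve the issue in this particular environment):

# Sketch: apply the workaround version pin from the linked issues.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "deepspeed==0.13.5", "deepspeed-mii==0.2.2"])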