RuntimeError('FusedLayerNormAffineFunction requires cuda extensions')

I got this error when I tried to run opt-125m. Environment details:

NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8
torch 1.13.0

Full error message

On WorkerInfo(id=0, name=wok0):
RuntimeError('FusedLayerNormAffineFunction requires cuda extensions')
Traceback (most recent call last):
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/colossalai/kernel/cuda_native/layer_norm.py", line 19, in forward
    import colossalai._C.layer_norm
ImportError: /root/.conda/envs/llm/lib/python3.8/site-packages/colossalai/_C/layer_norm.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function
    result = python_udf.func(*python_udf.args, **python_udf.kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/engine/rpc_utils.py", line 8, in call_method
    return method(rref.local_value(), *args, **kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/engine/rpc_worker.py", line 118, in run
    output, cur_key = self.model.run(key, inputs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/engine/pipeline_wrapper.py", line 72, in run
    return self.run_without_pp(key, inputs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/engine/pipeline_wrapper.py", line 86, in run_without_pp
    output = self.model(hidden_states=None, **sample)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/model/model_factory.py", line 114, in forward
    hidden_states = block(hidden_states=hidden_states,
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/energonai/model/endecoder.py", line 52, in forward
    hidden_states = self.norm1(hidden_states)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/colossalai/nn/layer/colossalai_layer/_utils.py", line 38, in forward
    return self._forward_func(*args)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/colossalai/kernel/cuda_native/layer_norm.py", line 73, in forward
    return FusedLayerNormAffineFunction.apply(input, self.weight, self.bias, self.normalized_shape, self.eps)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 105, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/root/.conda/envs/llm/lib/python3.8/site-packages/colossalai/kernel/cuda_native/layer_norm.py", line 21, in forward
    raise RuntimeError('FusedLayerNormAffineFunction requires cuda extensions')
RuntimeError: FusedLayerNormAffineFunction requires cuda extensions
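
The first ImportError in the traceback is an undefined symbol inside the compiled `layer_norm` extension. That most likely means the extension was built against a different PyTorch version than the installed torch 1.13.0, although this is not confirmed here. Below is a minimal diagnostic sketch for checking it. The helper names `probe_import` and `report` are hypothetical and not part of colossalai or EnergonAI; only the module path `colossalai._C.layer_norm` comes from the traceback.

```python
import importlib


def probe_import(module_name):
    """Try to import a module and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
        return True, "imported"
    except Exception as exc:
        # An undefined-symbol failure in a compiled extension surfaces as ImportError.
        return False, f"{type(exc).__name__}: {exc}"


def report():
    """Collect the installed torch build info and the extension import status."""
    lines = []
    ok, detail = probe_import("torch")
    if ok:
        import torch
        # torch.version.cuda is the CUDA version torch was built with (None on CPU builds).
        lines.append(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    else:
        lines.append(f"torch: {detail}")
    # torch is imported first so its shared libraries are already loaded.
    ok, detail = probe_import("colossalai._C.layer_norm")
    lines.append(f"colossalai._C.layer_norm: {detail}")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

If the second line reports the same undefined symbol, rebuilding colossalai's CUDA extensions against the installed torch, or installing a build that matches it, would be the thing to try.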

hpcaitech / EnergonAI, issue #174