state-spaces / mamba

Mamba SSM architecture

AMD GPU: AttributeError: 'HIPDriver' object has no attribute 'get_current_device' #429

Open eliranwong opened 3 months ago

eliranwong commented 3 months ago

My setup: Dual AMD RX 7900 XTX + ROCm 6.1.3; full setup recorded at https://github.com/eliranwong/MultiAMDGPU_AIDev_Ubuntu

I followed the fix described in https://github.com/state-spaces/mamba/issues/412 to install mamba:

pip install packaging
pip install git+https://github.com/state-spaces/mamba.git
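
As a quick sanity check after installing (a sketch; it assumes each package exposes a __version__ string, which current torch, triton, and mamba_ssm releases do):

python3 -c "import torch, triton, mamba_ssm; print(torch.__version__, triton.__version__, mamba_ssm.__version__)"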

I ran the following command suggested in your repo, but encountered errors:

python3 benchmark_generation_mamba_simple.py --model-name "state-spaces/mamba2-2.7b" --prompt "My cat wrote all this CUDA code for a new language model and" --topp 0.9 --temperature 0.7 --repetition-penalty 1.2

Full log of the errors:

/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/ops/triton/layer_norm.py:147: UserWarning: 'torch._C._CudaDeviceProperties' object has no attribute 'gcnArchName', warp size set to 32 based on device name: Radeon RX 7900 XTX
  warnings.warn(f"{e}, warp size set to {warp_size} based on device name: {device_name}", UserWarning)
Loading model state-spaces/mamba2-2.7b
tokenizer_config.json: 100%|███████████████████| 156/156 [00:00<00:00, 2.39MB/s]
vocab.json: 100%|██████████████████████████| 1.08M/1.08M [00:00<00:00, 3.28MB/s]
merges.txt: 100%|████████████████████████████| 457k/457k [00:00<00:00, 1.98MB/s]
tokenizer.json: 100%|██████████████████████| 2.11M/2.11M [00:00<00:00, 4.89MB/s]
special_tokens_map.json: 100%|███████████████| 90.0/90.0 [00:00<00:00, 1.25MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
config.json: 100%|█████████████████████████████| 331/331 [00:00<00:00, 2.83MB/s]
pytorch_model.bin: 100%|███████████████████| 5.41G/5.41G [05:43<00:00, 15.7MB/s]
Number of parameters: 2702599680
Traceback (most recent call last):
  File "/home/ubuntu/eliran/Downloads/benchmark_generation_mamba_simple.py", line 82, in <module>
    out = fn()
  File "/home/ubuntu/eliran/Downloads/benchmark_generation_mamba_simple.py", line 56, in <lambda>
    fn = lambda: model.generate(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 260, in generate
    output = decode(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 160, in decode
    model._decoding_cache = update_graph_cache(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 321, in update_graph_cache
    cache.callables[batch_size, decoding_seqlen] = capture_graph(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 355, in capture_graph
    logits = model(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/models/mixer_seq_simple.py", line 279, in forward
    hidden_states = self.backbone(input_ids, inference_params=inference_params, **mixer_kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/models/mixer_seq_simple.py", line 194, in forward
    hidden_states, residual = layer(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/modules/block.py", line 57, in forward
    hidden_states, residual = layer_norm_fn(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/ops/triton/layer_norm.py", line 902, in layer_norm_fn
    return LayerNormFn.apply(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/ops/triton/layer_norm.py", line 775, in forward
    y, y1, mean, rstd, residual_out, seeds, dropout_mask, dropout_mask1 = _layer_norm_fwd(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/mamba_ssm/ops/triton/layer_norm.py", line 369, in _layer_norm_fwd
    _layer_norm_fwd_1pass_kernel[(M,)](
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 143, in run
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 143, in <dictcomp>
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 122, in _bench
    return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/testing.py", line 102, in do_bench
    fn()
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 110, in kernel_call
    self.fn.run(
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
    return self.fn.run(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
    return self.fn.run(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
    return self.fn.run(*args, **kwargs)
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/jit.py", line 363, in run
    device = driver.get_current_device()
  File "/home/ubuntu/eliran/ai/cupyrocm/lib/python3.10/site-packages/triton/runtime/driver.py", line 210, in __getattr__
    return getattr(self._obj, name)
AttributeError: 'HIPDriver' object has no attribute 'get_current_device'
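
For what it's worth, the failing lookup can be reproduced outside mamba (a sketch, assuming the same pytorch-triton-rocm build as in the log above; driver is the lazy proxy whose __getattr__ raises in triton/runtime/driver.py):

python3 - <<'EOF'
# Reproduce the attribute lookup that fails inside triton/runtime/jit.py.
from triton.runtime.driver import driver
print(type(driver))                           # a lazy proxy; lookups are forwarded to the HIP backend
print(hasattr(driver, "get_current_device"))  # False here, matching the AttributeError above
EOF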

ajassani commented 3 months ago

Triton's autotune feature is sensitive to the Triton version. Have you tried the rocm/pytorch Docker image (https://hub.docker.com/r/rocm/pytorch)? It comes preinstalled with the appropriate Triton version.
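
For reference, a typical invocation of that image looks roughly like this (a sketch based on AMD's published Docker instructions; the exact tag and device flags depend on your setup):

docker pull rocm/pytorch:latest
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    --ipc=host --shm-size 8G rocm/pytorch:latest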

eliranwong commented 3 months ago

I run ROCm 6.1.3 (https://rocm.docs.amd.com/projects/radeon/en/latest/index.html), which officially supports AMD Radeon™ 7000 series GPUs. I believe the Docker image ships an older ROCm version, so it is not an option for me.

I am instead using pytorch-triton-rocm 2.1.0+rocm6.1.3.4d510c3a44, officially provided by AMD:

https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-pytorch.html

https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1.3/
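
For reference, reinstalling the AMD-built wheels from that repository looks roughly like this (a sketch: repo.radeon.com serves a flat wheel directory, so pip's --find-links is used here; the exact versions to pin should be taken from the page above):

pip3 uninstall -y torch torchvision pytorch-triton-rocm
pip3 install torch torchvision pytorch-triton-rocm --find-links https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1.3/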