yxchng opened 3 months ago
Thanks, I've just fixed it.
Will you provide the .whl file for v2.0.1?
It's compiling.
Thank you very much!
Tried pulling the latest version, but imports are still missing:
Traceback (most recent call last):
File "test_mambav2.py", line 6, in <module>
from mamba_ssm import Mamba
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/__init__.py", line 5, in <module>
from mamba_ssm.modules.mamba2 import Mamba2
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/modules/mamba2.py", line 26, in <module>
from mamba_ssm.ops.triton.ssd_combined import mamba_chunk_scan_combined
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 40, in <module>
from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
Thanks, I've updated.
Your file is wrongly named k_activation.py while the code imports k_activations. However, I am still facing an issue after fixing that. The code works with Mamba v1.
Traceback (most recent call last):
File "test_mambav2.py", line 266, in <module>
y = mamba(x.unsqueeze(0))
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward
out = mamba_split_conv1d_scan_combined(
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined
return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
Try making the dimension larger (e.g. a multiple of 512).
Must the dimension be large? My network channels are just 32, 64, 128, 256, and 512.
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
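Since the kernel's complaint is about tensor strides, a direct way to check your own sizes (an illustration, not from the thread; the sizes below are hypothetical) is to build the channels-last view the way the traceback shows and inspect its strides:

import torch
from einops import rearrange

# Hypothetical sizes: batch=2, seqlen=64, conv channels=192
xBC = torch.randn(2, 64, 192)
x = rearrange(xBC, "b s d -> b d s")  # channels-last view, as in the traceback
# The kernel requires both of these strides to be multiples of 8
print(x.stride(0) % 8 == 0, x.stride(2) % 8 == 0)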
I believe this could be an error stemming from the same issue?
     38 from mamba_ssm.ops.triton.ssd_chunk_scan import _chunk_scan_bwd_ddAcs_prev
     39 from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn, _layer_norm_fwd, _layer_norm_bwd
---> 40 from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
     42 TRITON_22 = version.parse(triton.__version__) >= version.parse('2.2.0')
     45 def init_to_zero(names):
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2.
For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
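A quick way to confirm whether the module actually shipped with an install (a sketch, not from the thread):

import importlib.util

# None means the installed package is missing ops/triton/k_activations.py
print(importlib.util.find_spec("mamba_ssm.ops.triton.k_activations"))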
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
Hi @tridao,
Thanks for your swift reply. I got the following error without using the conv1d package (it can be solved by installing the conv1d package). Any comments are highly appreciated.
Traceback (most recent call last):
File "/home/jma/Documents/mamba/debug_mamba2.py", line 12, in <module>
y = model(x)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jma/Documents/mamba/mamba_ssm/modules/mamba2.py", line 176, in forward
out = mamba_split_conv1d_scan_combined(
File "/home/jma/Documents/mamba/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined
return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/jma/Documents/mamba/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
AttributeError: 'NoneType' object has no attribute 'causal_conv1d_fwd'
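For context, my reading (an assumption, the thread does not show this code) is that mamba_ssm guards the import with a pattern like the one below, so causal_conv1d_cuda is left as None when the package is missing, and the fused path at line 757 then fails with this AttributeError:

# Hypothetical sketch of the import guard assumed to be in ssd_combined.py
try:
    import causal_conv1d_cuda  # CUDA extension from the causal-conv1d package
except ImportError:
    causal_conv1d_cuda = None  # dereferencing this later raises the AttributeError above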
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
I've tried with dim=1024 (a multiple of 512) and got the following error:
File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward out = mamba_split_conv1d_scan_combined( File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate) File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd return fwd(*args, **kwargs) File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"), RuntimeError: causal_conv1d only supports channel dimension divisible by 8 for now
Hi @hanif-rt, the following script works:

from mamba_ssm import Mamba2
# from mamba_ssm.modules.mamba2_simple import Mamba2Simple as Mamba2
import torch

batch, length, dim = 2, 64, 1024
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba2(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim,  # Model dimension d_model
    d_state=64,   # SSM state expansion factor, typically 64 or 128
    d_conv=4,     # Local convolution width
    expand=2,     # Block expansion factor
    headdim=128,
).to("cuda")
y = model(x)
assert y.shape == x.shape
print("Mamba2 model parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
print("x.shape:", x.shape, "y.shape:", y.shape)
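A likely reason these settings fit together (my reading, so treat the formula as an assumption about Mamba2's internals): the expanded dimension is split into heads, so expand * d_model must be divisible by headdim.

# Head-count arithmetic, assuming nheads = expand * d_model // headdim
d_model, expand, headdim = 1024, 2, 128
d_inner = expand * d_model  # 2048
assert d_inner % headdim == 0, "expand * d_model must be divisible by headdim"
print("nheads =", d_inner // headdim)  # 16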
I believe this could be an error stemming from the same issue?
     38 from mamba_ssm.ops.triton.ssd_chunk_scan import _chunk_scan_bwd_ddAcs_prev
     39 from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn, _layer_norm_fwd, _layer_norm_bwd
---> 40 from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
     42 TRITON_22 = version.parse(triton.__version__) >= version.parse('2.2.0')
     45 def init_to_zero(names):
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2.
For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
How did you solve it, please?
The pypi version is 2.0.1 (i.e., what you get when you pip install). The problem was fixed in 2.0.3, which will take a while to propagate to pypi. Either try the wheel given in the github release, or you may have to wait until 2.0.3 finishes uploading to pypi.
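To confirm which release you actually have installed (a sketch; assumes the package exposes __version__, as recent releases do):

import mamba_ssm

# Anything below 2.0.3 predates the k_activations fix mentioned above
print(mamba_ssm.__version__)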
Hello, I encountered the following error:
triton.compiler.errors.CompilationError: at 31:56: pid_h = tl.program_id(axis=2)
dt_ptr += pid_b * stride_dt_batch + pid_c * chunk_size * stride_dt_seqlen
dt_out_ptr += pid_b * stride_dt_out_batch + pid_c * stride_dt_out_chunk
dA_cumsum_ptr += pid_b * stride_dA_cs_batch + pid_c * stride_dA_cs_chunk
offs_h = pid_h * BLOCK_SIZE_H + tl.arange(0, BLOCK_SIZE_H)
offs_c = tl.arange(0, BLOCK_SIZE_CHUNK)
dt_ptrs = dt_ptr + (offs_h[:, None] * stride_dt_head + offs_c[None, :] * stride_dt_seqlen)
A_ptrs = A_ptr + offs_h * stride_A_head
dt_out_ptrs = dt_out_ptr + (offs_h[:, None] * stride_dt_out_head + offs_c[None, :] * stride_dt_out_csize)
dA_cs_ptrs = dA_cumsum_ptr + (offs_h[:, None] * stride_dA_cs_head + offs_c[None, :] * stride_dA_cs_csize)
chunk_size_limit = min(chunk_size, seqlen - pid_c * chunk_size)
^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
when calling:
File "/home/mamba_attn/mamba_ssm/ops/triton/ssd_chunk_state.py", line 582, in _chunk_cumsum_fwd
_chunk_cumsum_fwd_kernel[grid_chunk_cs](
My Triton version is 2.2.0.
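As a first check, the ssd kernels gate some behavior on the Triton version (see the TRITON_22 line quoted earlier in this thread); you can run the same comparison yourself (a minimal sketch):

from packaging import version
import triton

# Mirrors the TRITON_22 check in ssd_combined.py quoted earlier
print(triton.__version__)
print(version.parse(triton.__version__) >= version.parse("2.2.0"))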
I believe this could be an error stemming from the same issue?
[...]
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2. For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
How did you solve it, please?
The pypi version is 2.0.1 (i.e., what you get when you pip install). The problem was fixed in 2.0.3, which will take a while to propagate to pypi. Either try the wheel given in the github release, or you may have to wait until 2.0.3 finishes uploading to pypi.
Thank you! I have solved it.
Your file is wrongly named k_activation.py while the code imports k_activations. However, I am still facing an issue after fixing that. The code works with Mamba v1.
Traceback (most recent call last):
[...]
RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
Have you solved this problem?
Changing d_model to 256 solved it, but it is slower than Mamba1 on this simple task.
Hello, I encountered the following error:
triton.compiler.errors.CompilationError: at 31:56:
[...]
chunk_size_limit = min(chunk_size, seqlen - pid_c * chunk_size)
^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
when calling:
File "/home/mamba_attn/mamba_ssm/ops/triton/ssd_chunk_state.py", line 582, in _chunk_cumsum_fwd
_chunk_cumsum_fwd_kernel[grid_chunk_cs](
My Triton version is 2.2.0.
Same issue! May I ask if you have solved it yet?
@tyshiwo1
I solved it with a little trick. Assuming you are using the function mamba_chunk_scan_combined in the file ssd_combined.py:
First, add the following at the beginning of the file to override PyTorch's original shape implementation.
class tTensor(torch.Tensor):
    @property
    def shape(self):
        shape = super().shape
        return tuple([int(s) for s in shape])

to_ttensor = lambda *args: tuple([tTensor(x) for x in args]) if len(args) > 1 else tTensor(args[0])
Second, add to_ttensor() everywhere related to the error. For example, in my case (only using the function mamba_chunk_scan_combined), I needed to:
- add x, dt, A, B, C = to_ttensor(x, dt, A, B, C) before the line return MambaChunkScanCombinedFn.apply(x, dt, A, B, C, chunk_size, D, z, dt_bias, initial_states, seq_idx, dt_softplus, dt_limit, return_final_states) in mamba_chunk_scan_combined;
- add dt = to_ttensor(dt) before the line states = _chunk_state_fwd(B, x, dt, dA_cumsum, seq_idx=seq_idx, states_in_fp32=True) in _mamba_chunk_scan_combined_fwd;
- change states, final_states = _state_passing_fwd(rearrange(states, "... p n -> ... (p n)"), ... to states, final_states = _state_passing_fwd(to_ttensor(rearrange(states, "... p n -> ... (p n)")), ... in _mamba_chunk_scan_combined_fwd.
After all these changes, the error disappears.
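A quick self-contained check of what the wrapper changes (a sketch, not from the thread): the subclass makes .shape return a plain tuple of Python ints rather than a torch.Size, so downstream code that inspects shapes never sees int64-typed scalar entries.

import torch

class tTensor(torch.Tensor):
    @property
    def shape(self):
        return tuple(int(s) for s in super().shape)

t = tTensor(torch.randn(2, 4))
print(t.shape, type(t.shape))                 # (2, 4) <class 'tuple'>
print(all(type(s) is int for s in t.shape))   # True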
to_ttensor = lambda *args: tuple([tTensor(x) for x in args]) if len(args) > 1 else tTensor(args[0])
That works for me.
How can I solve this error: 'NoneType' object has no attribute 'causal_conv1d_fwd'?
How can I solve this error: 'NoneType' object has no attribute 'causal_conv1d_fwd'?
You can try building the causal-conv1d package from source:
git clone https://github.com/Dao-AILab/causal-conv1d.git
Then cd into that folder, check out the branch you want to use, and run:
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
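Once the build finishes, a quick way to verify that the CUDA extension imports (a sketch, not from the thread):

# If this import succeeds, causal_conv1d_cuda will no longer be None inside
# mamba_ssm, and the AttributeError above should disappear.
import causal_conv1d_cuda
print(causal_conv1d_cuda.causal_conv1d_fwd)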