state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

assert self.d_ssm % self.headdim == 0 #360

Open songxujay opened 5 months ago

songxujay commented 5 months ago

Has anyone run into this problem before? Thank you!

'''
import torch

from mamba_ssm.modules import Mamba2

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba2(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim,  # Model dimension d_model
    d_state=64,   # SSM state expansion factor, typically 64 or 128
    d_conv=4,     # Local convolution width
    expand=2,     # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
'''

'''
Traceback (most recent call last):
  File "/home/songxu/fujitsu/mamba/mamba/test_mamba2.py", line 8, in <module>
    model = Mamba2(
  File "/home/songxu/fujitsu/mamba/mamba/mamba_ssm/modules/mamba2.py", line 77, in __init__
    assert self.d_ssm % self.headdim == 0
AssertionError
'''

Hprairie commented 5 months ago

headdim defaults to 64, so the SSM inner dimension has to be a multiple of 64, which means you need at least 64 dimensions to create a single head. With d_model=16 and expand=2 you only have 32. Either decrease the number of dims per head by passing a smaller headdim, or increase the dimensionality of your model.
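To make the arithmetic concrete, here is a minimal sketch of the check that fails. It assumes that d_ssm falls back to expand * d_model when no explicit d_ssm is passed, which is what the numbers in the report suggest. `head_count` is a hypothetical helper written for illustration and is not part of mamba_ssm.

```python
def head_count(d_model: int, expand: int = 2, headdim: int = 64) -> int:
    """Return the number of SSM heads, or raise if the sizes are incompatible.

    Assumption: d_ssm defaults to expand * d_model (no explicit d_ssm given).
    """
    d_ssm = expand * d_model
    if d_ssm % headdim != 0:
        raise ValueError(f"d_ssm={d_ssm} is not a multiple of headdim={headdim}")
    return d_ssm // headdim


if __name__ == "__main__":
    # The configuration from the report: 2 * 16 = 32, not a multiple of 64.
    try:
        head_count(d_model=16)
    except ValueError as err:
        print("fails:", err)

    # Either suggested fix satisfies the check.
    print(head_count(d_model=16, headdim=16))  # smaller heads
    print(head_count(d_model=32))              # larger model
```

Under the same assumption, passing headdim=16 to the original script gives two heads, and raising d_model to 32 gives one head at the default headdim.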

songxujay commented 5 months ago

Thanks all, but after making that change I hit a new issue:

'''
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: at::Tensor, arg1: at::Tensor, arg2: Optional[at::Tensor], arg3: bool) -> at::Tensor
'''

How can I fix this causal_conv1d_fwd() error?

Hprairie commented 5 months ago

Look at #257
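The error lists only a four-argument signature for causal_conv1d_fwd, so the call is evidently passing something else. A mismatch like that typically means the compiled extension and the Python code calling it come from different releases. As a first step, a small sketch like the following reports which versions are installed. The distribution names `mamba-ssm` and `causal-conv1d` are assumed to match how the packages were installed in your environment, and `installed_version` and `report` are hypothetical helpers.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names=("torch", "mamba-ssm", "causal-conv1d")):
    """Map each distribution name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier for others to tell whether the two packages are out of step.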

LinjieFu-U commented 5 months ago

Have you solved this problem? It doesn't happen when I use Mamba, but with Mamba2 I get:

TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:

  1. (arg0: at::Tensor, arg1: at::Tensor, arg2: Optional[at::Tensor], arg3: bool) -> at::Tensor

My causal_conv1d version is 1.0.0 and my mamba-ssm version is 1.0.1.

yzeng58 commented 5 months ago

After decreasing headdim, I ran into another issue. Here is the error message I received:

File /data/yzeng58/anaconda3/envs/mamba2/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py:761, in MambaSplitConv1dScanCombinedFn.forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate)
    758 zx0, z, xBC, dt = torch.split(zxbcdt, [2 * d_nonssm, dim, dim + ngroups * dstate * 2, nheads], dim=-1)
    759 seq_idx = seq_idx.contiguous() if seq_idx is not None else None
    760 xBC_conv = rearrange(
--> 761     causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
    762                                          conv1d_weight, conv1d_bias, seq_idx, None, None, activation in ["silu", "swish"]),
    763     "b d s -> b s d"
    764 )
    765 x, B, C = torch.split(xBC_conv, [dim, ngroups * dstate, ngroups * dstate], dim=-1)
    766 x = rearrange(x, "b l (h p) -> b l h p", h=nheads)

AttributeError: 'NoneType' object has no attribute 'causal_conv1d_fwd'

Hprairie commented 5 months ago

I would try uninstalling everything and then rebuilding mamba. Make sure that your GPU's compute capability is added to setup.py. Take a look at #257.
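The 'NoneType' message suggests that causal_conv1d_cuda was None when it was called, presumably because importing the compiled extension failed earlier and the failure was swallowed. Before and after rebuilding, a sketch like this can show whether the extension imports at all and what compute capability the GPU reports. `try_import` is a hypothetical helper; the module names are the ones that appear in the traceback.

```python
import importlib


def try_import(module_name: str):
    """Try to import a module; return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a loader error from a bad build
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Import torch first: compiled extensions link against its shared libraries.
    for name in ("torch", "causal_conv1d", "causal_conv1d_cuda"):
        ok, detail = try_import(name)
        print(f"{name}: {detail}")

    # Optional: show the GPU's compute capability if torch and CUDA are present.
    try:
        import torch
        if torch.cuda.is_available():
            print("compute capability:", torch.cuda.get_device_capability())
        else:
            print("CUDA is not available to torch")
    except ImportError:
        print("torch is not installed")
```

If causal_conv1d_cuda fails to import here, the printed exception is usually more informative than the AttributeError raised later inside the forward pass.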

ZhijingS commented 2 months ago

> Have you solved this problem? It doesn't happen when I use Mamba, but with Mamba2 I get TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
>
>   1. (arg0: at::Tensor, arg1: at::Tensor, arg2: Optional[at::Tensor], arg3: bool) -> at::Tensor
>
> My causal_conv1d version is 1.0.0 and my mamba-ssm version is 1.0.1.

Hi, I get the same error when using Mamba2. Have you solved it?