yxchng opened 3 months ago
Thanks, I've just fixed it.
Will you provide the .whl file for v2.0.1?
It's compiling.
Thank you very much!
Tried pulling the latest version, but imports are still missing:
Traceback (most recent call last):
File "test_mambav2.py", line 6, in <module>
from mamba_ssm import Mamba
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/__init__.py", line 5, in <module>
from mamba_ssm.modules.mamba2 import Mamba2
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/modules/mamba2.py", line 26, in <module>
from mamba_ssm.ops.triton.ssd_combined import mamba_chunk_scan_combined
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 40, in <module>
from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
Thanks, I've updated.
Your file is wrongly named k_activation.py while the code imports k_activations. However, I am still facing an issue after fixing that. The code works with Mamba v1.
Traceback (most recent call last):
File "test_mambav2.py", line 266, in <module>
y = mamba(x.unsqueeze(0))
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward
out = mamba_split_conv1d_scan_combined(
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined
return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/test/miniconda3/envs/mamba/lib/python3.8/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
Try making the dimension larger (e.g. a multiple of 512).
Must the dimension be large? My network channels are just 32, 64, 128, 256, and 512.
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
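Since the kernel's complaint is about tensor strides, a direct way to check your own sizes (an illustration, not from the thread; the sizes below are hypothetical) is to build the channels-last view the way the traceback shows and inspect its strides:

import torch
from einops import rearrange

# Hypothetical sizes: batch=2, seqlen=64, conv channels=192
xBC = torch.randn(2, 64, 192)
x = rearrange(xBC, "b s d -> b d s")  # channels-last view, as in the traceback
# The kernel requires both of these strides to be multiples of 8
print(x.stride(0) % 8 == 0, x.stride(2) % 8 == 0)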
I believe this could be an error stemming from the same issue?
     38 from mamba_ssm.ops.triton.ssd_chunk_scan import _chunk_scan_bwd_ddAcs_prev
     39 from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn, _layer_norm_fwd, _layer_norm_bwd
---> 40 from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
     42 TRITON_22 = version.parse(triton.__version__) >= version.parse('2.2.0')
     45 def init_to_zero(names):
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2.
For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
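A quick way to confirm whether the module actually shipped with an install (a sketch, not from the thread):

import importlib.util

# None means the installed package is missing ops/triton/k_activations.py
print(importlib.util.find_spec("mamba_ssm.ops.triton.k_activations"))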
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
Hi @tridao,
Thanks for your swift reply. I got the following error without using the conv1d package (it can be solved by installing the conv1d package). Any comments are highly appreciated.
Traceback (most recent call last):
File "/home/jma/Documents/mamba/debug_mamba2.py", line 12, in <module>
y = model(x)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jma/Documents/mamba/mamba_ssm/modules/mamba2.py", line 176, in forward
out = mamba_split_conv1d_scan_combined(
File "/home/jma/Documents/mamba/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined
return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate)
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/jma/anaconda3/envs/mamba2/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/jma/Documents/mamba/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
AttributeError: 'NoneType' object has no attribute 'causal_conv1d_fwd'
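For context, my reading (an assumption, the thread does not show this code) is that mamba_ssm guards the import with a pattern like the one below, so causal_conv1d_cuda is left as None when the package is missing, and the fused path at line 757 then fails with this AttributeError:

# Hypothetical sketch of the import guard assumed to be in ssd_combined.py
try:
    import causal_conv1d_cuda  # CUDA extension from the causal-conv1d package
except ImportError:
    causal_conv1d_cuda = None  # dereferencing this later raises the AttributeError above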
I don't know, I haven't tested anything that isn't a multiple of 512. You can also skip the conv1d package (uninstalling it should mean the Mamba code falls back to torch's nn.Conv1d).
I've tried with dim=1024 (a multiple of 512) and got the following error:
File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/modules/mamba2.py", line 176, in forward out = mamba_split_conv1d_scan_combined( File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 908, in mamba_split_conv1d_scan_combined return MambaSplitConv1dScanCombinedFn.apply(zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states, seq_idx, dt_limit, return_final_states, activation, rmsnorm_weight, rmsnorm_eps, outproj_weight, outproj_bias, headdim, ngroups, norm_before_gate) File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd return fwd(*args, **kwargs) File "/home/ubuntu/miniconda3/envs/venv/lib/python3.10/site-packages/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"), RuntimeError: causal_conv1d only supports channel dimension divisible by 8 for now
Hi @hanif-rt, the following script works:

from mamba_ssm import Mamba2
# from mamba_ssm.modules.mamba2_simple import Mamba2Simple as Mamba2
import torch

batch, length, dim = 2, 64, 1024
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba2(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim,  # Model dimension d_model
    d_state=64,   # SSM state expansion factor, typically 64 or 128
    d_conv=4,     # Local convolution width
    expand=2,     # Block expansion factor
    headdim=128,
).to("cuda")
y = model(x)
assert y.shape == x.shape
print("Mamba2 model parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
print("x.shape:", x.shape, "y.shape:", y.shape)
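A likely reason these settings fit together (my reading, so treat the formula as an assumption about Mamba2's internals): the expanded dimension is split into heads, so expand * d_model must be divisible by headdim.

# Head-count arithmetic, assuming nheads = expand * d_model // headdim
d_model, expand, headdim = 1024, 2, 128
d_inner = expand * d_model  # 2048
assert d_inner % headdim == 0, "expand * d_model must be divisible by headdim"
print("nheads =", d_inner // headdim)  # 16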
I believe this could be an error stemming from the same issue?
     38 from mamba_ssm.ops.triton.ssd_chunk_scan import _chunk_scan_bwd_ddAcs_prev
     39 from mamba_ssm.ops.triton.layernorm_gated import rmsnorm_fn, _layer_norm_fwd, _layer_norm_bwd
---> 40 from mamba_ssm.ops.triton.k_activations import _swiglu_fwd, _swiglu_bwd
     42 TRITON_22 = version.parse(triton.__version__) >= version.parse('2.2.0')
     45 def init_to_zero(names):
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2.
For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
How did you solve it, please?
The pypi version is 2.0.1 (i.e., what you get when you pip install). The problem was fixed in 2.0.3, which will take a while to propagate to pypi. Either try the wheel given in the github release, or you may have to wait until 2.0.3 finishes uploading to pypi.
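To confirm which release you actually have installed (a sketch; assumes the package exposes __version__, as recent releases do):

import mamba_ssm

# Anything below 2.0.3 predates the k_activations fix mentioned above
print(mamba_ssm.__version__)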
Hello, I encountered the following error:
triton.compiler.errors.CompilationError: at 31:56: pid_h = tl.program_id(axis=2)
dt_ptr += pid_b * stride_dt_batch + pid_c * chunk_size * stride_dt_seqlen
dt_out_ptr += pid_b * stride_dt_out_batch + pid_c * stride_dt_out_chunk
dA_cumsum_ptr += pid_b * stride_dA_cs_batch + pid_c * stride_dA_cs_chunk
offs_h = pid_h * BLOCK_SIZE_H + tl.arange(0, BLOCK_SIZE_H)
offs_c = tl.arange(0, BLOCK_SIZE_CHUNK)
dt_ptrs = dt_ptr + (offs_h[:, None] * stride_dt_head + offs_c[None, :] * stride_dt_seqlen)
A_ptrs = A_ptr + offs_h * stride_A_head
dt_out_ptrs = dt_out_ptr + (offs_h[:, None] * stride_dt_out_head + offs_c[None, :] * stride_dt_out_csize)
dA_cs_ptrs = dA_cumsum_ptr + (offs_h[:, None] * stride_dA_cs_head + offs_c[None, :] * stride_dA_cs_csize)
chunk_size_limit = min(chunk_size, seqlen - pid_c * chunk_size)
^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
when calling:
File "/home/mamba_attn/mamba_ssm/ops/triton/ssd_chunk_state.py", line 582, in _chunk_cumsum_fwd
_chunk_cumsum_fwd_kernel[grid_chunk_cs](
My Triton version is 2.2.0.
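As a first check, the ssd kernels gate some behavior on the Triton version (see the TRITON_22 line quoted earlier in this thread); you can run the same comparison yourself (a minimal sketch):

from packaging import version
import triton

# Mirrors the TRITON_22 check in ssd_combined.py quoted earlier
print(triton.__version__)
print(version.parse(triton.__version__) >= version.parse("2.2.0"))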
I believe this could be an error stemming from the same issue?
[...]
ModuleNotFoundError: No module named 'mamba_ssm.ops.triton.k_activations'
This is what happens when I try to import Mamba2. For some reason, the ops/triton/k_activations.py file is not present in the pip release, or it could be an issue on my end. Pasting it in manually solved the issue.
How did you solve it, please?
The pypi version is 2.0.1 (i.e., what you get when you pip install). The problem was fixed in 2.0.3, which will take a while to propagate to pypi. Either try the wheel given in the github release, or you may have to wait until 2.0.3 finishes uploading to pypi.
Thank you! I have solved it.
Your file is wrongly named k_activation.py while the code imports k_activations. However, I am still facing an issue after fixing that. The code works with Mamba v1.
Traceback (most recent call last):
[...]
RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8
Have you solved this problem?
Changing d_model to 256 solved it, but it is slower than Mamba1 on this simple task.
Hello, I encountered the following error:
triton.compiler.errors.CompilationError: at 31:56:
[...]
chunk_size_limit = min(chunk_size, seqlen - pid_c * chunk_size)
^
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
when calling:
File "/home/mamba_attn/mamba_ssm/ops/triton/ssd_chunk_state.py", line 582, in _chunk_cumsum_fwd
_chunk_cumsum_fwd_kernel[grid_chunk_cs](
My Triton version is 2.2.0.
Same issue! May I ask if you have solved it yet?
@tyshiwo1
I solved it with a little trick. Assuming you are using the function mamba_chunk_scan_combined in the file ssd_combined.py:
First, add the following at the beginning of the file to override PyTorch's original shape implementation.
class tTensor(torch.Tensor):
    @property
    def shape(self):
        shape = super().shape
        return tuple([int(s) for s in shape])

to_ttensor = lambda *args: tuple([tTensor(x) for x in args]) if len(args) > 1 else tTensor(args[0])
Second, add to_ttensor() everywhere related to the error. For example, in my case (only using the function mamba_chunk_scan_combined), I needed to:
- add x, dt, A, B, C = to_ttensor(x, dt, A, B, C) before the line return MambaChunkScanCombinedFn.apply(x, dt, A, B, C, chunk_size, D, z, dt_bias, initial_states, seq_idx, dt_softplus, dt_limit, return_final_states) in mamba_chunk_scan_combined;
- add dt = to_ttensor(dt) before the line states = _chunk_state_fwd(B, x, dt, dA_cumsum, seq_idx=seq_idx, states_in_fp32=True) in _mamba_chunk_scan_combined_fwd;
- change states, final_states = _state_passing_fwd(rearrange(states, "... p n -> ... (p n)"), ... to states, final_states = _state_passing_fwd(to_ttensor(rearrange(states, "... p n -> ... (p n)")), ... in _mamba_chunk_scan_combined_fwd.
After all these changes, the error disappears.
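A quick self-contained check of what the wrapper changes (a sketch, not from the thread): the subclass makes .shape return a plain tuple of Python ints rather than a torch.Size, so downstream code that inspects shapes never sees int64-typed scalar entries.

import torch

class tTensor(torch.Tensor):
    @property
    def shape(self):
        return tuple(int(s) for s in super().shape)

t = tTensor(torch.randn(2, 4))
print(t.shape, type(t.shape))                 # (2, 4) <class 'tuple'>
print(all(type(s) is int for s in t.shape))   # True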
to_ttensor = lambda *args: tuple([tTensor(x) for x in args]) if len(args) > 1 else tTensor(args[0])
That works for me.
How can I solve this error: 'NoneType' object has no attribute 'causal_conv1d_fwd'?
How can I solve this error: 'NoneType' object has no attribute 'causal_conv1d_fwd'?
You can try building the causal-conv1d package from source:
git clone https://github.com/Dao-AILab/causal-conv1d.git
Then cd into that folder, check out the branch you want to use, and run:
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
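Once the build finishes, a quick way to verify that the CUDA extension imports (a sketch, not from the thread):

# If this import succeeds, causal_conv1d_cuda will no longer be None inside
# mamba_ssm, and the AttributeError above should disappear.
import causal_conv1d_cuda
print(causal_conv1d_cuda.causal_conv1d_fwd)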