EddieEduardo opened this issue 1 month ago
I tried several environment combinations, all of which failed:

1. pytorch 2.1 + cuda 11.8 + causal-conv1d 1.1.0 + mamba-ssm 2.2.2:

       Traceback (most recent call last):
         File "/home/heoy/Documents/Eduardo/lmm/LocalMamba-main/classification/tools/train.py", line 416, in

2. pytorch 2.1 + cuda 11.8 + causal-conv1d 1.1.0 + mamba-ssm 2.0.0:

       Traceback (most recent call last):
         File "/home/heoy/Documents/Eduardo/lmm/LocalMamba-main/classification/tools/train.py", line 416, in

3. pytorch 2.1 + cuda 11.8 + causal-conv1d 1.1.0 + mamba-ssm 1.1.1:

       Traceback (most recent call last):
         File "/home/heoy/Documents/Eduardo/lmm/LocalMamba-main/classification/tools/train.py", line 416, in

4. pytorch 2.1 + cuda 11.8 + causal-conv1d 1.1.0 + mamba-ssm 1.1.0: this one also does not have the mamba_inner_fn_no_out_proj function.

       Invoked with: tensor([[[ 0.2862, -0.2344, 0.3353, -0.0160], [-0.1135, -0.3617, 0.0763, -0.0326], [-0.4140, -0.1433, -0.0519, -0.0714], ..., [-0.0384, -0.3736, -0.3572, -0.6613], [-0.1745, 0.0563, 0.2299, 0.0616], [ 0.4169, 0.5416, 0.1388, -0.0973]],
Finally, all issues are fixed! My environment: PyTorch 2.1.2 + CUDA 11.8 + Python 3.9 + causal-conv1d 1.4.0 + mamba-ssm 1.1.0.
At first I installed both packages from source:

    cd causualconv1d && pip install .
    cd ..
    cd mamba-1p1p1 && pip install .

But all attempts failed with the following error:
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "....../site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    [end of output]
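Before retrying a source build, it can help to confirm the toolchain that torch.utils.cpp_extension relies on. This is only a diagnostic sketch I would run in the same environment, not code from the LocalMamba repo:

```python
# Sanity-check the build prerequisites for compiling causal-conv1d / mamba-ssm from source.
import shutil
import sys

import torch
from torch.utils.cpp_extension import CUDA_HOME

print("python        :", sys.version.split()[0])
print("torch         :", torch.__version__)
print("torch CUDA    :", torch.version.cuda)    # CUDA version torch was compiled against
print("CUDA_HOME     :", CUDA_HOME)             # toolkit whose nvcc will compile the extension
print("ninja on PATH :", shutil.which("ninja")) # ninja is required; None here explains the error above
```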
Then I manually installed prebuilt wheels for these two packages from https://github.com/Dao-AILab/causal-conv1d/releases and https://github.com/state-spaces/mamba/releases?page=1. I tried many versions of both; in the end I installed the following two, since they cause the fewest issues with the LocalMamba scripts:

    causal_conv1d-1.4.0+cu118torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
    mamba_ssm-1.1.0+cu118torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

Pay attention to the CUDA, torch, and Python versions in the wheel name: they must match what is installed on your machine. Also pick the wheels with cxx11abiFALSE in the name; every cxx11abiTRUE wheel I tried raised errors like 'undefined symbol: _ZN3c104cuda20CUDACachingAllocator12recordStreamERKNS_7DataPtrENS0_10CUDAStreamE'.
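If it is unclear which wheel tags match your machine, the relevant fields can be derived from the local interpreter and torch build. A small sketch; the filename pattern is only inferred from the two release names quoted above:

```python
# Derive the cuXXX / torchX.Y / cxx11abi / cpXX fields of the wheel name from the
# local environment, so the downloaded wheel matches the installed torch build.
import sys
import torch

cuda_tag  = "cu" + (torch.version.cuda or "").replace(".", "")          # e.g. cu118
torch_tag = "torch" + ".".join(torch.__version__.split(".")[:2])        # e.g. torch2.1
abi_tag   = "cxx11abi" + str(torch.compiled_with_cxx11_abi()).upper()   # FALSE worked here
py_tag    = f"cp{sys.version_info.major}{sys.version_info.minor}"       # e.g. cp39

pattern = f"+{cuda_tag}{torch_tag}{abi_tag}-{py_tag}-{py_tag}-linux_x86_64.whl"
print("look for wheels ending in:", pattern)
```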
With these wheels installed, mamba_inner_fn_no_out_proj can be imported in ./classification/lib/models/mamba and should no longer fall back to None:

    try:
        from mamba_ssm.ops.selective_scan_interface import mamba_inner_fn_no_out_proj
    except ImportError:
        mamba_inner_fn_no_out_proj = None
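A quick way to verify this outside the training script (a trivial check of my own, not part of the repo):

```python
# Should print the function object; an ImportError here means the installed
# mamba-ssm version still lacks mamba_inner_fn_no_out_proj.
from mamba_ssm.ops.selective_scan_interface import mamba_inner_fn_no_out_proj

assert mamba_inner_fn_no_out_proj is not None
print("mamba_inner_fn_no_out_proj:", mamba_inner_fn_no_out_proj)
```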
Modify (in selective_scan_interface.py):

    Line 177: conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, None, True)
    Line 239: conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, None, True)
    Line 281: dx, dconv1d_weight, dconv1d_bias = causal_conv1d_cuda.causal_conv1d_bwd(
                  x, conv1d_weight, conv1d_bias, dconv1d_out, None, dx, True
              )

to:

    Line 177: conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, None, None, None, True)
    Line 239: conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, None, None, None, True)
    Line 281: dx, dconv1d_weight, dconv1d_bias = causal_conv1d_cuda.causal_conv1d_bwd(
                  x, conv1d_weight, conv1d_bias, dconv1d_out, None, None, None, dx, False, True
              )[:3]
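Alternatively, instead of hard-coding the new argument lists in place, the two calls could be wrapped so they tolerate either binary. This is only a sketch of the idea (the helper names are mine, not from LocalMamba or mamba-ssm), under the assumption that the only bindings in circulation are the older 5-/7-argument and newer 7-/10-argument ones discussed here:

```python
# Hypothetical compatibility shims around the causal_conv1d_cuda bindings: try the
# newer signature first and fall back to the older one when pybind raises TypeError.
import causal_conv1d_cuda


def causal_conv1d_fwd_compat(x, weight, bias):
    try:
        # newer binding: (x, weight, bias, seq_idx, initial_states, final_states_out, silu_activation)
        return causal_conv1d_cuda.causal_conv1d_fwd(x, weight, bias, None, None, None, True)
    except TypeError:
        # older binding: (x, weight, bias, seq_idx, silu_activation)
        return causal_conv1d_cuda.causal_conv1d_fwd(x, weight, bias, None, True)


def causal_conv1d_bwd_compat(x, weight, bias, dout, dx):
    try:
        # newer binding returns (dx, dweight, dbias, dinitial_states); keep the first three
        return causal_conv1d_cuda.causal_conv1d_bwd(
            x, weight, bias, dout, None, None, None, dx, False, True
        )[:3]
    except TypeError:
        # older binding already returns (dx, dweight, dbias)
        return causal_conv1d_cuda.causal_conv1d_bwd(x, weight, bias, dout, None, dx, True)
```

The fallback relies on pybind11 raising a TypeError ("incompatible function arguments ... Invoked with: tensor(...)", as in the error above) when the argument count does not match.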
The line edits above are needed because the causal_conv1d_cuda.causal_conv1d_fwd and causal_conv1d_cuda.causal_conv1d_bwd installed in step 1 are incompatible with selective_scan_interface.py. I checked the source of both functions at https://github.com/Dao-AILab/causal-conv1d/blob/main/csrc/causal_conv1d.cpp: fwd takes 7 parameters, bwd takes 10, and bwd returns 4 tensors, the last one being dinitial_states (which seems safe to drop here, hence the [:3] slice). Your exact modifications should follow the argument counts reported in your terminal errors, but they should end up looking very similar to the above.
    causal_conv1d_fwd(const at::Tensor &x, const at::Tensor &weight,
                      const c10::optional<at::Tensor> &bias_,
                      const c10::optional<at::Tensor> &seq_idx_,
                      const c10::optional<at::Tensor> &initial_states_,
                      c10::optional<at::Tensor> &final_states_out_,
                      bool silu_activation)
    ...
    causal_conv1d_bwd(const at::Tensor &x, const at::Tensor &weight,
                      const c10::optional<at::Tensor> &bias_,
                      at::Tensor &dout,
                      const c10::optional<at::Tensor> &seq_idx_,
                      const c10::optional<at::Tensor> &initial_states_,
                      const c10::optional<at::Tensor> &dfinal_states_,
                      c10::optional<at::Tensor> &dx_,
                      bool return_dinitial_states,
                      bool silu_activation)
    ...
    return {dx, dweight.to(weight.dtype()),
            bias_.has_value() ? dbias.to(bias_.value().dtype()) : dbias,
            dinitial_states};
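To double-check which binding (and therefore which argument counts) is actually installed before editing anything, printing the package versions is usually enough. A tiny check of my own; the __version__ attributes are an assumption about how the wheels expose their version:

```python
# Report the installed causal-conv1d / mamba-ssm versions so the argument counts in
# selective_scan_interface.py can be matched against the corresponding C++ signatures.
import causal_conv1d
import mamba_ssm

print("causal-conv1d:", getattr(causal_conv1d, "__version__", "unknown"))
print("mamba-ssm    :", getattr(mamba_ssm, "__version__", "unknown"))
```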
Steps 2 and 3 are really just adjustments to align LocalMamba with mamba-ssm and causal-conv1d. If there is a mamba-ssm release that directly includes the mamba_inner_fn_no_out_proj function and is compatible with causal-conv1d, please let me know (I have tried mamba-ssm versions 1.0.1, 1.1.0, 1.1.1, 1.2.0, 2.0.0, and 2.2.2, and causal-conv1d versions 1.0.0, 1.1.0, 1.1.1, 1.1.3, and 1.4.0).
Thank you!
Thank you for sharing the code!
Could you please let me know which versions of Triton, Torch, Causal-conv1d, and Mamba-ssm you are using? I encountered some weird issues with mamba, causal-conv1d, and Triton.
Thank you!