wjh892521292 / LKM-UNet

Large Kernel Vision Mamba UNet for Medical Image Segmentation
https://arxiv.org/abs/2403.07332

ImportError: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEENS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE #3

YUjh0729 closed this issue 1 day ago

YUjh0729 commented 1 month ago

My environment: CUDA 11.8, Python 3.10, PyTorch 2.0.1, causal-conv1d=1.1.1, mamba-ssm=1.2.0.post1 (all installed as required). Running the training command nnUNetv2_train 801 2d all -tr nnUNetTrainerLMaUNet produces the following error:

`
############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Traceback (most recent call last):
  File "/home/yjh666/anaconda3/envs/LMa-Unet/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 274, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 195, in run_training
    nnunet_trainer = get_trainer_from_args(dataset_name_or_id, configuration, fold, trainer_class_name,
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/run/run_training.py", line 39, in get_trainer_from_args
    nnunet_trainer = recursive_find_python_class(join(nnunetv2.__path__[0], "training", "nnUNetTrainer"),
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/utilities/find_class_by_name.py", line 12, in recursive_find_python_class
    m = importlib.import_module(current_module + "." + modname)
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainerLMaUNet.py", line 6, in <module>
    from nnunetv2.nets.LMaUNet import get_lmaunet_from_plans
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/nnunetv2/nets/LMaUNet.py", line 27, in <module>
    from mamba_ssm import Mamba
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/mamba_ssm/__init__.py", line 3, in <module>
    from mamba_ssm.ops.selective_scan_interface import selective_scan_fn, mamba_inner_fn
  File "/home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 16, in <module>
    import selective_scan_cuda
ImportError: /home/yjh666/anaconda3/envs/LMa-Unet/lib/python3.10/site-packages/selective_scan_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEENS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE
`

I suspect a version conflict between causal-conv1d=1.1.1 and mamba-ssm=1.2.0.post1. I previously resolved a similar error in an environment with CUDA 11.8, Python 3.10, PyTorch 2.1.2, causal-conv1d=1.1.0, and mamba-ssm=1.1.1. Is there a recommended combination of versions that fixes this?
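A note on reading the error: the undefined symbol is an Itanium-mangled C++ name, and its leading nested name can be decoded by hand. The sketch below is a minimal illustration, not a full demangler. The helper name `nested_name` is hypothetical, and it only handles the `_ZN<length><identifier>...E` prefix while ignoring the argument types. It shows that the missing symbol lives in PyTorch's `at` namespace, which suggests that the compiled extension was built against a different PyTorch build than the one currently installed.

```python
def nested_name(mangled: str) -> str:
    """Decode only the leading nested name of an Itanium-mangled symbol.

    Hypothetical helper for illustration; argument types are ignored.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a nested Itanium-mangled name")
    i = len(prefix)
    parts = []
    # Each component is encoded as <decimal length><identifier>.
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return "::".join(parts)


# The symbol reported in the ImportError above, split only for line length.
symbol = (
    "_ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEE"
    "NS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE"
)
print(nested_name(symbol))  # at::_ops::zeros_like::call
```

Where binutils is available, piping the symbol through `c++filt` prints the full signature, including the argument types.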

wjh892521292 commented 1 month ago

It looks like 1.1.1 now raises an error. Try causal-conv1d=1.2.0.post2 and see.

YUjh0729 commented 1 month ago

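> causal-conv1d=1.2.0.post2

I tried it. causal-conv1d=1.2.0.post2 + mamba-ssm=1.1.1 fails at import causal_conv1d_cuda, and causal-conv1d=1.2.0.post2 + mamba-ssm=1.2.0.post1 fails at import selective_scan_cuda. Both are undefined symbol errors similar to the one above, so it still looks like a version conflict between the two packages.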
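To see which compiled extension fails without starting a full training run, each module can be imported on its own. The following is a minimal sketch: the function name `probe_imports` is hypothetical, and the module names are the ones that appear in the tracebacks in this thread. `torch` is listed first because the extensions link against PyTorch's shared libraries.

```python
import importlib


def probe_imports(module_names):
    """Try to import each module; map name -> None on success, or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Undefined-symbol failures from compiled extensions surface as ImportError.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Module names taken from the tracebacks in this thread; torch goes first.
    modules = ["torch", "causal_conv1d_cuda", "selective_scan_cuda"]
    for name, error in probe_imports(modules).items():
        status = "OK" if error is None else "FAILED: " + error
        print(f"{name}: {status}")
```

If both extensions fail with undefined symbols from the `at` or `c10` namespaces, the installed PyTorch version is a more likely culprit than the pairing of the two packages.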

YUjh0729 commented 1 month ago

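> It looks like 1.1.1 now raises an error. Try causal-conv1d=1.2.0.post2 and see.

Which versions of the two packages do you use together when you run this? Could you share them?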

wjh892521292 commented 1 month ago

I tested causal-conv1d=1.2.0.post2 + mamba-ssm=1.2.0.post1 and it runs. My environment is CUDA 11.8 + Python 3.10 + PyTorch 2.0.1 + torchvision 0.15.2.

YUjh0729 commented 1 month ago

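> I tested causal-conv1d=1.2.0.post2 + mamba-ssm=1.2.0.post1 and it runs. My environment is CUDA 11.8 + Python 3.10 + PyTorch 2.0.1 + torchvision 0.15.2.

That worked. I rechecked all of my dependencies and found that my PyTorch version had changed to 2.3.0, which is strange. After I reinstalled PyTorch 2.0.1, the two packages no longer conflicted.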
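A silent PyTorch upgrade like this can be caught by comparing the installed versions against the combination the author reported as working. The sketch below reads package metadata only, so it runs even when the compiled extensions cannot be imported. The names `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical, and the distribution names are assumed to be the usual PyPI ones.

```python
from importlib import metadata

# Known-good combination reported by the repository author in this thread.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "causal-conv1d": "1.2.0.post2",
    "mamba-ssm": "1.2.0.post1",
}


def installed_version(dist_name):
    """Return the installed version without importing the package, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu118" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

Running this after every `pip install` in the environment would flag a case where another package pulls in a newer PyTorch as a dependency.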

YUjh0729 commented 1 month ago

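One more question. Your training command uses nnU-Net's built-in CLI, but the nnunetv2 folder in this project contains many added files, and the nnunetv2 package installed with pip install nnunetv2 does not include them. When I trained with nnUNetv2_train DATASET_ID 2d all -tr nnUNetTrainerLMaUNet, many modules could not be found. My workaround was to copy each missing file from the nnunetv2 folder of the LMa-UNet project into the virtual environment's anaconda3\envs\LMa-Unet\lib\python3.10\site-packages\nnunetv2\ folder. How do you use the nnunetv2 dependency? Thank you very much!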

wjh892521292 commented 1 month ago

In theory, running pip install -e . in the project directory automatically includes all the dependencies. That is, following the README: cd LMa-UNet/lmaunet, then pip install -e .
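To confirm which copy of nnunetv2 Python actually picks up, the module's origin can be inspected without importing it. This is a minimal sketch with hypothetical helper names (`module_origin`, `is_in_site_packages`). It assumes that an editable install resolves to the source checkout, while a plain `pip install nnunetv2` resolves to a path under site-packages.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_in_site_packages(path):
    """True if the path sits under a site-packages (or dist-packages) directory."""
    return any(part in ("site-packages", "dist-packages") for part in Path(path).parts)


if __name__ == "__main__":
    origin = module_origin("nnunetv2")
    if origin is None:
        print("nnunetv2 is not importable")
    elif is_in_site_packages(origin):
        print(f"nnunetv2 comes from a regular install: {origin}")
    else:
        print(f"nnunetv2 comes from a source checkout: {origin}")
```

If the output points at site-packages, the PyPI copy is shadowing the project's modified package; uninstalling it and rerunning `pip install -e .` from the project directory would be the likely fix.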