ridgerchu / TCJA

[TNNLS 2024] Implementation of "TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks"

Problem running DVS128 #6

Closed stephencoding closed 1 year ago

stephencoding commented 1 year ago

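Hello author. I downloaded all three datasets and placed them in the required layout before running the source code. cifar10dvs runs fine. My environment is Ubuntu 18.04 + GeForce GTX TITAN X + CUDA 11.3, and everything installed without problems, but running dvs128 always fails with the errors below. This has been troubling me for several days. Could you please take a look and update the source code? Thank you very much!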

python src/dvs128.py -data_dir /home/lab/datasets/DVSGesture -out_dir runs/dvs128/ -opt Adam -device cuda:0 -lr_scheduler CosALR -T_max 1024 -T 20 -epochs 1024 -b 16 -lr 0.001 -amp -j 20

spikingjelly.clock_driven.spike_op: try to use `torch.utils.cpp_extension.load_inline` to load cudnn functions. If it is hanging, pleast try to delete torch_extensions cache directory. (In most cases, the directory is /home/lab/.cache/torch_extensions/py38_cu117/)
spikingjelly.clock_driven.spike_op: Ninja is required to load C++ extensions
Namespace(T=20, T_max=1024, amp=True, b=16, channels=128, data_dir='/home/lab/datasets/DVSGesture', device='cuda:0', epochs=1024, gamma=0.1, j=20, lr=0.001, lr_scheduler='CosALR', momentum=0.9, opt='Adam', out_dir='runs/dvs128/', resume=None, step_size=32)
CextNet(
  (conv): Sequential(
    (0): SeqToANNContainer( (0): Conv2d(2, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
    (1): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (2): SeqToANNContainer( (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) )
    (3): SeqToANNContainer( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
    (4): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (5): SeqToANNContainer( (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) )
    (6): SeqToANNContainer( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
    (7): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (8): SeqToANNContainer( (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) )
    (9): SeqToANNContainer( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
    (10): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (11): TCJA( (conv): Conv1d(20, 20, kernel_size=(4,), stride=(1,), padding=same, bias=False) (conv_c): Conv1d(128, 128, kernel_size=(4,), stride=(1,), padding=same, bias=False) (sigmoid): Sigmoid() )
    (12): SeqToANNContainer( (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) )
    (13): SeqToANNContainer( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) )
    (14): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (15): TCJA( (conv): Conv1d(20, 20, kernel_size=(4,), stride=(1,), padding=same, bias=False) (conv_c): Conv1d(128, 128, kernel_size=(4,), stride=(1,), padding=same, bias=False) (sigmoid): Sigmoid() )
    (16): SeqToANNContainer( (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) )
  )
  (fc): Sequential(
    (0): Flatten(start_dim=2, end_dim=-1)
    (1): MultiStepDropout(p=0.5)
    (2): SeqToANNContainer( (0): Linear(in_features=2048, out_features=512, bias=False) )
    (3): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
    (4): MultiStepDropout(p=0.5)
    (5): SeqToANNContainer( (0): Linear(in_features=512, out_features=110, bias=False) )
    (6): MultiStepLIFNode( v_threshold=1.0, v_reset=0.0, v_rest=0.0, detach_reset=True, tau=2.0, backend=cupy (surrogate_function): ATan(alpha=2.0, spiking=True) )
  )
  (vote): VotingLayer( (voting): AvgPool1d(kernel_size=(10,), stride=(10,), padding=(0,)) )
)
The directory [/home/lab/datasets/DVSGesture/frames_number_20_split_by_number] already exists.
The directory [/home/lab/datasets/DVSGesture/frames_number_20_split_by_number] already exists.
/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 20 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 685, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 141, in cupy_backends.cuda.libs.nvrtc.compileProgram
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 153, in cupy_backends.cuda.libs.nvrtc.compileProgram
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 69, in cupy_backends.cuda.libs.nvrtc.check_status
cupy_backends.cuda.libs.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/dvs128.py", line 259, in <module>
    main()
  File "src/dvs128.py", line 190, in main
    out_fr = net(frame)
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "src/dvs128.py", line 58, in forward
    out_spikes = self.fc(self.conv(x)) # shape = [T, N, 110]
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lab/lkb/code/TCJA-main/src/spikingjelly/clock_driven/neuron.py", line 697, in forward
    spike_seq, self.v_seq = neuron_kernel.MultiStepLIFNodePTT.apply(
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/home/lab/lkb/code/TCJA-main/src/spikingjelly/clock_driven/neuron_kernel.py", line 750, in forward
    kernel(
  File "cupy/_core/raw.pyx", line 89, in cupy._core.raw.RawKernel.__call__
  File "cupy/_core/raw.pyx", line 96, in cupy._core.raw.RawKernel.kernel.__get__
  File "cupy/_core/raw.pyx", line 113, in cupy._core.raw.RawKernel._kernel
  File "cupy/_util.pyx", line 67, in cupy._util.memoize.decorator.ret
  File "cupy/_core/raw.pyx", line 547, in cupy._core.raw._get_raw_module
  File "cupy/_core/core.pyx", line 2081, in cupy._core.core.compile_with_cache
  File "cupy/_core/core.pyx", line 2141, in cupy._core.core.compile_with_cache
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
    return _compile_with_cache_cuda(
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 570, in _compile_with_cache_cuda
    ptx, mapping = compile_using_nvrtc(
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 321, in compile_using_nvrtc
    return _compile(source, options, cu_path,
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 305, in _compile
    compiled_obj, mapping = prog.compile(options, log_stream)
  File "/home/lab/anaconda3/envs/torch/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 704, in compile
    raise CompileException(log, self.src, self.name, options,
cupy.cuda.compiler.CompileException: /tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu(24): error: identifier "__hsub2" is undefined

/tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu(24): error: identifier "__hadd2" is undefined

/tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu(24): error: identifier "__hfma2" is undefined

/tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu(26): error: identifier "__hgeu2" is undefined

/tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu(27): error: identifier "__hmul2" is undefined

5 errors detected in the compilation of "/tmp/tmpbpm0s6ox/8f23474ac6350d9d5997adf2361836d699de70ae.cubin.cu".

Duanyll commented 1 year ago

It looks like this is caused by the Titan X not supporting some of the half-precision instructions; a newer GPU may be needed.
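
For context, CUDA documents its half-precision arithmetic intrinsics (`__hadd2`, `__hsub2`, `__hmul2`, `__hfma2` and similar) as available only on devices of compute capability 5.3 or higher, while the Maxwell-generation GTX TITAN X reports 5.2. A minimal sketch for checking a given machine follows. The helper names `supports_half2_arithmetic` and `current_device_capability` are made up for illustration and are not part of TCJA or SpikingJelly.

```python
# Minimum compute capability for half2 arithmetic intrinsics
# (__hadd2, __hmul2, __hfma2, ...), per the CUDA math API documentation.
MIN_HALF2_CAPABILITY = (5, 3)


def supports_half2_arithmetic(capability):
    """Return True if a (major, minor) compute capability is at least 5.3."""
    return tuple(capability) >= MIN_HALF2_CAPABILITY


def current_device_capability(device_index=0):
    """Return (major, minor) for a CUDA device, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    capability = current_device_capability()
    if capability is None:
        print("No CUDA device could be queried.")
    elif supports_half2_arithmetic(capability):
        print(f"Compute capability {capability}: half2 arithmetic is available.")
    else:
        print(f"Compute capability {capability}: half2 arithmetic is not available.")
```

If the check fails, dropping `-amp` from the command line may avoid compiling the half-precision kernel. That is an untested assumption about how the bundled SpikingJelly picks its CuPy kernels, not something confirmed in this thread.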
