Closed pangyanhua closed 11 months ago
1) What error do you get when installing "cupy"? 2) Please check your torch / CUDA version and your spikingjelly version. 3) Try the "torch" backend instead of "cupy", although this may make training slower.
I ran into the same problem. No matter how I install CuPy, with pip or with conda, running the code always reports 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".' But when I test "import cupy" on its own in a separate new file, it works fine. How can I solve this?
You can refer to a similar issue in SpikingJelly which may help to solve this problem. https://github.com/fangwei123456/spikingjelly/issues/243
Thanks. I have already referred to the web page you provided.
"conda list torch" and "conda list cupy" are fine, and "import cupy" is fine, but the code still says 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".'
I'm now using "torch" as the backend, which is much slower: it took nearly 12 hours to run 70 epochs, and my code is set to 400 epochs. I think my computer will probably be worked to death (sad; god bless my computer and me).
Maybe tensorboard is not installed (see https://github.com/fangwei123456/spikingjelly/blob/0.0.0.0.12/spikingjelly/clock_driven/cu_kernel_opt.py#L9C1-L10C1 ). In version 0.0.0.0.12, tensorboard has to be importable at the same point where cupy is imported; 0.0.0.0.14 seems to have fixed this.
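A quick way to test this hypothesis is to import each dependency on its own, so that one failing import cannot hide behind another. The sketch below is not SpikingJelly code: `try_imports` is a hypothetical helper, and the module names `cupy` and `tensorboard` are simply taken from the comment above.

```python
import importlib


def try_imports(names):
    """Import each module separately; map name -> None (OK) or an error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or a failure raised inside the package
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Example usage (run it with the same interpreter that runs the training script):
#   for name, error in try_imports(["cupy", "tensorboard"]).items():
#       print(name, "OK" if error is None else error)
```

If `cupy` reports OK but `tensorboard` does not, installing tensorboard or upgrading SpikingJelly, as suggested above, would be the next thing to try.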
I ran into this problem too. Is there a solution?
How about trying the latest version of the framework? @Castrol68
Hi, I've got a question about the CuPy acceleration impact on Spikingformer. When using the torch backend, each iter takes ~0.71 secs, while with CuPy, it takes 0.57 secs, resulting in ~20% reduction in training time.
I wonder whether this acceleration ratio is normal, considering that the acceleration impact of CuPy in the tutorial is extremely significant.
@touristourist Hi, it depends on T, the number of time-steps. You can try different T. If you use a small T, then the acceleration ratio is also small.
Alright, got it! So, I understand that different timesteps do have an impact on the acceleration ratio. However, the chart in the tutorial shows that when the timestep is 8, the forward and backward processes can speed up by approximately 5 times (8.13/1.65), which is quite a significant difference compared to what I'm experiencing (only a 20% time reduction). Could it be that the chart specifically shows the acceleration effect only on neurons, while operations like convolution and linear cannot be accelerated with CuPy? As a newcomer to spikingjelly, I'm eagerly awaiting your response. Thanks!
Could it be that the chart specifically shows the acceleration effect only on neurons, while operations like convolution and linear cannot be accelerated with CuPy?
Yes, so the acceleration ratio for a whole network is smaller than that of a single neuron layer.
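As a rough sanity check on the two numbers quoted in this thread, Amdahl's law relates the speedup of one part of a program to the overall speedup. The sketch below assumes the tutorial's 8.13/1.65 ratio carries over unchanged to the neuron layers here, which may not hold for a different model, batch size, or GPU; `implied_fraction` is a hypothetical helper.

```python
def implied_fraction(overall_speedup, part_speedup):
    """Fraction p of baseline time spent in the accelerated part.

    Solves Amdahl's law, overall = 1 / ((1 - p) + p / part), for p.
    """
    return (1 - 1 / overall_speedup) / (1 - 1 / part_speedup)


overall = 0.71 / 0.57  # seconds per iteration: torch backend vs. CuPy backend
neuron = 8.13 / 1.65   # neuron-only ratio read off the tutorial chart (T = 8)
p = implied_fraction(overall, neuron)
print(f"overall {overall:.2f}x, neuron-only {neuron:.2f}x, implied fraction {p:.2f}")
```

Under these assumptions, the measured 20% reduction would correspond to the neuron layers taking roughly a quarter of each torch-backend iteration, with the rest spent in convolution, linear, and other layers that CuPy does not accelerate.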
Hello, I ran into the same problem too. Have you solved it?
How about trying the solution in jkhu29's comment?
Traceback (most recent call last):
File "D:\Postgraduate\code\Spikingformer-master\imagenet\model.py", line 261, in
If you are not using cupy, set the neuron backend to torch.
Hello, do you mean changing all of these to torch? For example, changing lines like self.mlp1_lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='cupy')? Or should it be changed some other way?
Yes, set backend='torch' for all neurons.
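One way to avoid editing every neuron line by hand is to route the backend through a single setting. This is only a sketch: `NEURON_BACKEND` and `lif_kwargs` are hypothetical names that do not exist in the repository, and the `tau` and `detach_reset` values are copied from the line quoted above.

```python
# Single switch for every neuron in the model (hypothetical name).
NEURON_BACKEND = "torch"  # change back to "cupy" once CuPy is detected correctly


def lif_kwargs(backend=NEURON_BACKEND):
    """Keyword arguments shared by all MultiStepLIFNode layers."""
    return {"tau": 2.0, "detach_reset": True, "backend": backend}


# In model.py, each neuron line would then read, for example:
#   self.mlp1_lif = MultiStepLIFNode(**lif_kwargs())
```

With this in place, switching between the two backends is a one-line change instead of a search-and-replace across the model.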
Hello, after making that change it runs now. But when I run test.py I hit the following problem. Do you know how to solve it?
INFO:train:Training with a single process on 1 GPUs.
Training with a single process on 1 GPUs.
Creating model
number of params: 29705768
INFO:train:Model vitsnn created, param count:29705768
Model vitsnn created, param count:29705768
INFO:timm.data.config:Data processing configuration for current model + dataset:
Data processing configuration for current model + dataset:
INFO:timm.data.config: input_size: (3, 224, 224)
input_size: (3, 224, 224)
INFO:timm.data.config: interpolation: bicubic
interpolation: bicubic
INFO:timm.data.config: mean: (0.485, 0.456, 0.406)
mean: (0.485, 0.456, 0.406)
INFO:timm.data.config: std: (0.229, 0.224, 0.225)
std: (0.229, 0.224, 0.225)
INFO:timm.data.config: crop_pct: 1.0
crop_pct: 1.0
INFO:train:Using native Torch AMP. Training in mixed precision.
Using native Torch AMP. Training in mixed precision.
ERROR:timm.models.helpers:No checkpoint found at '/media/data/spike-transformer-network/spikingformer_github/imagenet/output/train/Spikingformer_models/checkpoint-284.pth.tar'
ERROR: No checkpoint found at '/media/data/spike-transformer-network/spikingformer_github/imagenet/output/train/Spikingformer_models/checkpoint-284.pth.tar'
Traceback (most recent call last):
File "D:\code\Spikingformer-master\imagenet\test.py", line 639, in
Loading the previously saved weights failed: the checkpoint file was not found.
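The log shows a checkpoint path under /media/data/... while the script itself runs from D:\code\..., so the path looks like a default left over from another machine. A minimal sketch of failing early with a clearer message, assuming you pass your own checkpoint path; `find_checkpoint` is a hypothetical helper, not part of timm or Spikingformer:

```python
from pathlib import Path


def find_checkpoint(path):
    """Return the checkpoint path if the file exists, else raise a clear error."""
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(
            f"No checkpoint at {p}; point the script at a checkpoint file "
            "that exists on this machine."
        )
    return p
```

Calling this before the model is built makes a wrong path show up immediately rather than after data and model setup.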
I ran into the following error both when using a package to compute FLOPs and when converting the model to ONNX format:
AssertionError Traceback (most recent call last)
/tmp/ipykernel_811/993365423.py in
~/miniconda3/lib/python3.8/site-packages/calflops/flops_counter.py in calculate_flops(model, input_shape, transformer_tokenizer, args, kwargs, forward_mode, include_backPropagation, compute_bp_factor, print_results, print_detailed, output_as_string, output_precision, output_unit, ignore_modules)
    163
    164     if forwardmode == 'forward':
--> 165         = model(args)
    166     if forwardmode == 'generate':
    167         = model.generate(args)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, *kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(input, *kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (_global_forward_hooks.values(), *self._forward_hooks.values()):

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward(self, x)
    231     def forward(self, x):
    232         x = (x.unsqueeze(0)).repeat(self.T, 1, 1, 1, 1)
--> 233         x = self.forward_features(x)
    234         x = self.head(x.mean(0))
    235         return x

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward_features(self, x)
    224         patch_embed = getattr(self, f"patch_embed")
    225
--> 226         x = patch_embed(x)
    227         for blk in block:
    228             x = blk(x)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, *kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(input, *kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (_global_forward_hooks.values(), *self._forward_hooks.values()):

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward(self, x)
    142         x = self.proj_conv(x.flatten(0, 1))  # have some fire value
    143         x = self.proj_bn(x).reshape(T, B, -1, H, W).contiguous()
--> 144         x = self.proj_lif(x).flatten(0, 1).contiguous()
    145
    146         x = self.proj_conv1(x)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, *kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(input, *kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (_global_forward_hooks.values(), *self._forward_hooks.values()):

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/neuron.py in forward(self, xseq)
    853             torch.fill(self.v, v_init)
    854
--> 855         spike_seq, self.v_seq = neuron_kernel.MultiStepLIFNodePTT.apply(
    856             x_seq.flatten(1), self.v.flatten(0), self.decay_input, self.tau, self.v_threshold, self.v_reset, self.detach_reset, self.surrogate_function.cuda_code)
    857

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/neuron_kernel.py in forward(ctx, x_seq, v_last, decay_input, tau, v_threshold, v_reset, detach_reset, sg_cuda_code_fun)
    755             kernel(
    756                 (blocks,), (threads,),
--> 757                 cu_kernel_opt.wrap_args_to_raw_kernel(
    758                     device,
    759                     *kernel_args

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/cu_kernel_opt.py in wrap_args_to_raw_kernel(device, *args)
     62
     63         elif isinstance(item, cupy.ndarray):
---> 64             assert item.device.id == device
     65             assert item.flags['C_CONTIGUOUS']
     66             ret_list.append(item)

AssertionError:
How can I solve this? CuPy is installed in my environment.
Was the error above produced when running on the CPU?
No, it was run in a GPU environment.
ssh://u21b961009@10.251.171.6:23735/opt/conda/envs/py37/bin/python -u /home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py
INFO:train:Training with a single process on 1 GPUs.
Training with a single process on 1 GPUs.
Traceback (most recent call last):
File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py", line 824, in
main()
File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py", line 376, in main
drop_block_rate=None,
File "/home/u21b961009/.local/lib/python3.7/site-packages/timm/models/factory.py", line 71, in create_model
model = create_fn(pretrained=pretrained, pretrained_cfg=pretrained_cfg, **kwargs)
File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 256, in Spikingformer
**kwargs
File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 197, in __init__
embed_dims=embed_dims)
File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 132, in __init__
self.proj1_lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='cupy')
File "/home/u21b961009/.local/lib/python3.7/site-packages/spikingjelly/clock_driven/neuron.py", line 823, in __init__
check_backend(backend)
File "/home/u21b961009/.local/lib/python3.7/site-packages/spikingjelly/clock_driven/neuron.py", line 30, in check_backend
assert cupy is not None, 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".'
AssertionError: CuPy is not installed! You can install it from "https://github.com/cupy/cupy".
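Process finished with exit code 1
How can I solve this problem? I have reinstalled cupy many times and it still does not work; installing with pip install cupy-cuda11x does not help either.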
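In the traceback above, the interpreter is /opt/conda/envs/py37/bin/python while spikingjelly is loaded from /home/u21b961009/.local/lib/python3.7/site-packages, so one thing worth ruling out is that cupy was installed for a different interpreter or site-packages directory than the one train.py actually uses. A minimal sketch using only the standard library; `module_origin` is a hypothetical helper:

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


# Run this with exactly the interpreter that runs train.py.
print("interpreter:", sys.executable)
for name in ("cupy", "spikingjelly"):
    print(name, "->", module_origin(name))
```

If `cupy` resolves to None here, installing it with that same interpreter (for example by invoking pip through it) would be the next step. If it resolves correctly, the tensorboard explanation given earlier in the thread is the more likely cause.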