zhouchenlin2096 / Spikingformer

Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network
Apache License 2.0

AssertionError: CuPy is not installed! #1

Closed pangyanhua closed 11 months ago

pangyanhua commented 1 year ago

ssh://u21b961009@10.251.171.6:23735/opt/conda/envs/py37/bin/python -u /home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py
INFO:train:Training with a single process on 1 GPUs.
Training with a single process on 1 GPUs.
Traceback (most recent call last):
  File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py", line 824, in <module>
    main()
  File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/train.py", line 376, in main
    drop_block_rate=None,
  File "/home/u21b961009/.local/lib/python3.7/site-packages/timm/models/factory.py", line 71, in create_model
    model = create_fn(pretrained=pretrained, pretrained_cfg=pretrained_cfg, **kwargs)
  File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 256, in Spikingformer
    **kwargs
  File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 197, in __init__
    embed_dims=embed_dims)
  File "/home/u21b961009/jupyterlab/Spikingformer-master/imagenet/model.py", line 132, in __init__
    self.proj1_lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='cupy')
  File "/home/u21b961009/.local/lib/python3.7/site-packages/spikingjelly/clock_driven/neuron.py", line 823, in __init__
    check_backend(backend)
  File "/home/u21b961009/.local/lib/python3.7/site-packages/spikingjelly/clock_driven/neuron.py", line 30, in check_backend
    assert cupy is not None, 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".'
AssertionError: CuPy is not installed! You can install it from "https://github.com/cupy/cupy".

Process finished with exit code 1

How can I solve this? I have reinstalled CuPy many times without success, and installing it with pip install cupy-cuda11x does not help either.

zhouchenlin2096 commented 1 year ago

1) What error do you get when installing CuPy? 2) Please check your torch / CUDA versions and your spikingjelly version. 3) Try the "torch" backend in place of "cupy"; training may be slower.
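For 3), a minimal sketch (assuming the same MultiStepLIFNode arguments used in imagenet/model.py; the tensor sizes below are arbitrary):

import torch
from spikingjelly.clock_driven.neuron import MultiStepLIFNode

# Same constructor as in model.py, but with the torch backend, which needs no CuPy.
lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='torch')
x_seq = torch.rand(4, 2, 8)  # [T, batch, features]; shapes chosen only for illustration
print(lif(x_seq).shape)      # runs on CPU or GPU without CuPy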

qian26 commented 1 year ago

I ran into the same problem. No matter how I install CuPy, with pip or with conda, running the code still reports 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".' But when I test import cupy in a separate session, it works fine. How can I solve this?

yult0821 commented 1 year ago

You can refer to a similar issue in SpikingJelly, which may help solve this problem: https://github.com/fangwei123456/spikingjelly/issues/243

qian26 commented 1 year ago

Thanks. I have already referred to the web page you provided.

"conda list torch" and "conda list cupy" are fine, and "import cupy" is fine, but the code still says 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".'

I'm now using "torch" as the backend, which is much slower: it took nearly 12 hours to run 70 epochs, and my code is set to 400 epochs. My computer will probably be tired to death (sad). God bless my computer and me.

jkhu29 commented 1 year ago

It may be that tensorboard is not installed; see https://github.com/fangwei123456/spikingjelly/blob/0.0.0.0.12/spikingjelly/clock_driven/cu_kernel_opt.py#L9C1-L10C1. In version 0.0.0.0.12, importing cupy also requires tensorboard to be importable; version 0.0.0.0.14 seems to have fixed this.
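A quick way to confirm this (a diagnostic sketch based on the comment above, not part of the repo) is to run the following with the same Python interpreter that launches train.py:

try:
    import cupy
    # spikingjelly 0.0.0.0.12 sets cupy = None inside neuron.py whenever this
    # helper module fails to import, e.g. because tensorboard is missing.
    from spikingjelly.clock_driven import cu_kernel_opt
    print("cupy", cupy.__version__, "and cu_kernel_opt imported OK")
except BaseException as e:
    print("cupy backend unavailable:", e)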

Castrol68 commented 1 year ago

I ran into this problem as well. Is there a solution?

fangwei123456 commented 1 year ago

Try installing the latest version of the framework? @Castrol68

touristourist commented 12 months ago

Try installing the latest version of the framework? @Castrol68

Hi, I've got a question about the impact of CuPy acceleration on Spikingformer. With the torch backend, each iteration takes ~0.71 s, while with CuPy it takes 0.57 s, about a 20% reduction in training time.

I wonder whether this acceleration ratio is normal, considering that the acceleration impact of CuPy in the tutorial is extremely significant.

fangwei123456 commented 12 months ago

@touristourist Hi, it depends on T, the number of time-steps. You can try different T. If you use a small T, then the acceleration ratio is also small.
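For example, a rough way to measure this on a single neuron layer (a sketch, not from the repo; shapes and iteration counts are arbitrary, and the cupy run of course needs a working CuPy install):

import time
import torch
from spikingjelly.clock_driven import functional
from spikingjelly.clock_driven.neuron import MultiStepLIFNode

def bench(backend, T, batch=16, features=2048, iters=50):
    # forward + backward through one LIF layer, averaged over `iters` runs
    lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend=backend).cuda()
    x = torch.rand(T, batch, features, device='cuda', requires_grad=True)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        lif(x).sum().backward()
        functional.reset_net(lif)  # clear the membrane state between runs
    torch.cuda.synchronize()
    return (time.time() - start) / iters

for T in (4, 8, 16):
    print(f"T={T}: torch {bench('torch', T):.4f}s, cupy {bench('cupy', T):.4f}s")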

touristourist commented 12 months ago

@touristourist Hi, it depends on T, the number of time-steps. You can try different T. If you use a small T, then the acceleration ratio is also small.

Alright, got it! So, I understand that different timesteps do have an impact on the acceleration ratio. However, the chart in the tutorial shows that when the timestep is 8, the forward and backward processes can speed up by approximately 5 times (8.13/1.65), which is quite a significant difference compared to what I'm experiencing (only a 20% time reduction). Could it be that the chart specifically shows the acceleration effect only on neurons, while operations like convolution and linear cannot be accelerated with CuPy? As a newcomer to spikingjelly, I'm eagerly awaiting your response. Thanks!

fangwei123456 commented 12 months ago

Could it be that the chart specifically shows the acceleration effect only on neurons, while operations like convolution and linear cannot be accelerated with CuPy.

Yes, and the acceleration ratio is smaller than a single neuron layer.

ShaopengLu commented 11 months ago

I ran into the same problem. No matter how I install CuPy, with pip or with conda, running the code still reports 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".' But when I test import cupy in a separate session, it works fine. How can I solve this?

Hi, I've run into the same problem. Have you managed to solve it?

fangwei123456 commented 11 months ago

Have you tried the fix in jkhu29's comment?

ShaopengLu commented 11 months ago

@fangwei123456

Traceback (most recent call last):
  File "D:\Postgraduate\code\Spikingformer-master\imagenet\model.py", line 261, in <module>
    model = create_model(
  File "F:\anaconda\envs\pytorch\lib\site-packages\timm\models\factory.py", line 71, in create_model
    model = create_fn(pretrained=pretrained, pretrained_cfg=pretrained_cfg, **kwargs)
  File "D:\Postgraduate\code\Spikingformer-master\imagenet\model.py", line 251, in Spikingformer
    model = vit_snn(
  File "D:\Postgraduate\code\Spikingformer-master\imagenet\model.py", line 194, in __init__
    patch_embed = SpikingTokenizer(img_size_h=img_size_h,
  File "D:\Postgraduate\code\Spikingformer-master\imagenet\model.py", line 132, in __init__
    self.proj1_lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='cupy')
  File "F:\anaconda\envs\pytorch\lib\site-packages\spikingjelly\clock_driven\neuron.py", line 823, in __init__
    check_backend(backend)
  File "F:\anaconda\envs\pytorch\lib\site-packages\spikingjelly\clock_driven\neuron.py", line 30, in check_backend
    assert cupy is not None, 'CuPy is not installed! You can install it from "https://github.com/cupy/cupy".'
AssertionError: CuPy is not installed! You can install it from "https://github.com/cupy/cupy".

Hi, I tried that and still get the error. I saw in the comments that CuPy can be skipped; how should I proceed?

fangwei123456 commented 11 months ago

If you don't use CuPy, set the neuron backend to torch.

ShaopengLu commented 11 months ago

If you don't use CuPy, set the neuron backend to torch.

Hi, do I change all of these to torch? For example, lines like self.mlp1_lif = MultiStepLIFNode(tau=2.0, detach_reset=True, backend='cupy') — do I just change them like that, or is there another way?

fangwei123456 commented 11 months ago

Yes, set backend='torch' for all neurons.
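After editing model.py, a quick check before starting a long run (a sketch; `model` stands for the network returned by create_model, and it assumes the neuron stores its backend in the `backend` attribute):

from spikingjelly.clock_driven.neuron import MultiStepLIFNode

# `model` is the Spikingformer instance built in imagenet/model.py
backends = {m.backend for m in model.modules() if isinstance(m, MultiStepLIFNode)}
assert backends == {'torch'}, f"some neurons still use another backend: {backends}"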

ShaopengLu commented 11 months ago

Yes, set backend='torch' for all neurons.

Hi, after making that change it runs now. But when I run test.py I hit the following problem; do you know how to solve it?

INFO:train:Training with a single process on 1 GPUs.
Training with a single process on 1 GPUs.
Creating model
number of params: 29705768
INFO:train:Model vitsnn created, param count:29705768
Model vitsnn created, param count:29705768
INFO:timm.data.config:Data processing configuration for current model + dataset:
Data processing configuration for current model + dataset:
INFO:timm.data.config: input_size: (3, 224, 224)
 input_size: (3, 224, 224)
INFO:timm.data.config: interpolation: bicubic
 interpolation: bicubic
INFO:timm.data.config: mean: (0.485, 0.456, 0.406)
 mean: (0.485, 0.456, 0.406)
INFO:timm.data.config: std: (0.229, 0.224, 0.225)
 std: (0.229, 0.224, 0.225)
INFO:timm.data.config: crop_pct: 1.0
 crop_pct: 1.0
INFO:train:Using native Torch AMP. Training in mixed precision.
Using native Torch AMP. Training in mixed precision.
ERROR:timm.models.helpers:No checkpoint found at '/media/data/spike-transformer-network/spikingformer_github/imagenet/output/train/Spikingformer_models/checkpoint-284.pth.tar'
ERROR: No checkpoint found at '/media/data/spike-transformer-network/spikingformer_github/imagenet/output/train/Spikingformer_models/checkpoint-284.pth.tar'
Traceback (most recent call last):
  File "D:\code\Spikingformer-master\imagenet\test.py", line 639, in <module>
    main()
  File "D:\code\Spikingformer-master\imagenet\test.py", line 437, in main
    resume_epoch = resume_checkpoint(
  File "D:\anaconda\envs\g1\lib\site-packages\timm\models\helpers.py", line 113, in resume_checkpoint
    raise FileNotFoundError()
FileNotFoundError

fangwei123456 commented 11 months ago

Loading the previously saved weights failed: the checkpoint file was not found.

liberary233 commented 11 months ago

I ran into the following error both when using a library to compute FLOPs and when converting the model to ONNX format:


AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_811/993365423.py in <module>
      5 batch_size = 1
      6 input_shape = (batch_size, 3, 32, 32)
----> 7 flops, macs, params = calculate_flops(model=model,
      8                                        input_shape=input_shape,
      9                                        output_as_string=True,

~/miniconda3/lib/python3.8/site-packages/calflops/flops_counter.py in calculate_flops(model, input_shape, transformer_tokenizer, args, kwargs, forward_mode, include_backPropagation, compute_bp_factor, print_results, print_detailed, output_as_string, output_precision, output_unit, ignore_modules)
    163
    164     if forward_mode == 'forward':
--> 165         _ = model(*args)
    166     if forward_mode == 'generate':
    167         _ = model.generate(*args)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(*input, **kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (*_global_forward_hooks.values(), *self._forward_hooks.values()):

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward(self, x)
    231     def forward(self, x):
    232         x = (x.unsqueeze(0)).repeat(self.T, 1, 1, 1, 1)
--> 233         x = self.forward_features(x)
    234         x = self.head(x.mean(0))
    235         return x

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward_features(self, x)
    224         patch_embed = getattr(self, f"patch_embed")
    225
--> 226         x = patch_embed(x)
    227         for blk in block:
    228             x = blk(x)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(*input, **kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (*_global_forward_hooks.values(), *self._forward_hooks.values()):

~/autodl-fs/20231101_spikformer_cifar10/work/model.py in forward(self, x)
    142         x = self.proj_conv(x.flatten(0, 1))  # have some fire value
    143         x = self.proj_bn(x).reshape(T, B, -1, H, W).contiguous()
--> 144         x = self.proj_lif(x).flatten(0, 1).contiguous()
    145
    146         x = self.proj_conv1(x)

~/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1118             input = bw_hook.setup_input_hook(input)
   1119
-> 1120         result = forward_call(*input, **kwargs)
   1121         if _global_forward_hooks or self._forward_hooks:
   1122             for hook in (*_global_forward_hooks.values(), *self._forward_hooks.values()):

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/neuron.py in forward(self, x_seq)
    853             torch.fill(self.v, v_init)
    854
--> 855         spike_seq, self.v_seq = neuron_kernel.MultiStepLIFNodePTT.apply(
    856             x_seq.flatten(1), self.v.flatten(0), self.decay_input, self.tau, self.v_threshold, self.v_reset, self.detach_reset, self.surrogate_function.cuda_code)
    857

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/neuron_kernel.py in forward(ctx, x_seq, v_last, decay_input, tau, v_threshold, v_reset, detach_reset, sg_cuda_code_fun)
    755         kernel(
    756             (blocks,), (threads,),
--> 757             cu_kernel_opt.wrap_args_to_raw_kernel(
    758                 device,
    759                 *kernel_args

~/miniconda3/lib/python3.8/site-packages/spikingjelly/clock_driven/cu_kernel_opt.py in wrap_args_to_raw_kernel(device, *args)
     62
     63         elif isinstance(item, cupy.ndarray):
---> 64             assert item.device.id == device
     65             assert item.flags['C_CONTIGUOUS']
     66             ret_list.append(item)

AssertionError:

How can I solve this? CuPy is installed in my environment.

fangwei123456 commented 11 months ago

Was the error above produced when running on CPU?

liberary233 commented 11 months ago

Was the error above produced when running on CPU?

It was run in a GPU environment.
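One thing worth checking here (a sketch based only on the failing assertion, which compares the CuPy array's GPU id with the current CUDA device): make sure the model, the dummy input, and the current device all point to the same GPU, for example:

import torch
from calflops import calculate_flops

device = torch.device('cuda:0')
torch.cuda.set_device(device)      # CuPy kernels are launched on the current CUDA device
model = model.to(device).eval()    # `model` is the network built in model.py

flops, macs, params = calculate_flops(model=model,
                                      input_shape=(1, 3, 32, 32),
                                      output_as_string=True)
print(flops, macs, params)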