zju3dv / snake

Code for "Deep Snake for Real-Time Instance Segmentation" CVPR 2020 oral

Some problems with training; hoping you can offer some advice #66

Closed zhenghan408 closed 4 years ago

zhenghan408 commented 4 years ago

I have reinstalled several versions of CUDA, but I still get the following error:

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

I hope you can offer some advice.

pengsida commented 4 years ago

Please post the complete error message.

zhenghan408 commented 4 years ago

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
WARNING: NO MODEL LOADED !!!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Traceback (most recent call last):
  File "train_net.py", line 54, in <module>
    main()
  File "train_net.py", line 50, in main
    train(cfg, network)
  File "train_net.py", line 25, in train
    trainer.train(epoch, train_loader, optimizer, recorder)
  File "/home/wwx/zhenghan/snake-master/lib/train/trainers/trainer.py", line 38, in train
    output, loss, loss_stats, image_stats = self.network(batch)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/train/trainers/snake.py", line 19, in forward
    output = self.net(batch['inp'], batch)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/ct_snake.py", line 54, in forward
    output, cnn_feature = self.dla(x)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 470, in forward
    x = self.dla_up(x)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 409, in forward
    ida(layers, len(layers) - i - 2, len(layers))
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 383, in forward
    layers[i] = upsample(project(layers[i]))
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 356, in forward
    x = self.conv(x)
  File "/home/wwx/anaconda3/envs/snake/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wwx/zhenghan/snake-master/lib/networks/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/home/wwx/zhenghan/snake-master/lib/networks/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

pengsida commented 4 years ago

The JSON file you are loading appears to be empty.
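One quick way to test this suspicion is to load the annotation file directly and count its entries. The sketch below assumes a COCO-style JSON with top-level `images`, `annotations`, and `categories` lists; the helper name `summarize_annotations` is illustrative and not part of the snake repository.

```python
import json
import os
import sys


def summarize_annotations(json_path):
    """Return entry counts for a COCO-style annotation file (hypothetical helper)."""
    if not os.path.isfile(json_path):
        raise FileNotFoundError(f"annotation file not found: {json_path}")
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # A key that is absent is reported as 0, the same as an empty list.
    keys = ("images", "annotations", "categories")
    return {key: len(data.get(key, [])) for key in keys}


if __name__ == "__main__":
    # Pass the path of the annotation file to inspect as the first argument.
    if len(sys.argv) > 1:
        print(summarize_annotations(sys.argv[1]))
```

If every count comes back as zero, or the file is not found at all, the loader is probably not reading the file you intended.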

zhenghan408 commented 4 years ago

├── /path/to/kitti
│   ├── testing
│   │   ├── image_2
│   │   ├── instance_val.json
│   ├── training
│   │   ├── image_2
│   │   ├── instance_train.json

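Hello, I used this layout: I created kitti under data, then created testing and training inside kitti and put the corresponding images and .json files in them, and finally linked it with ln -s /path/to/kitti kitti. What could have gone wrong?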

pengsida commented 4 years ago

I suggest checking it yourself.

zhenghan408 commented 4 years ago

Can image_2 here be a folder?

pengsida commented 4 years ago

image_2 is the folder that holds the images.

Ivan-Fan commented 4 years ago

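@pengsida Hello, I ran into this problem too. May I venture a guess that the KITTI dataset layout described in the official documentation is the cause? INSTALL.md refers to instance_val.json, but the default file name in the code is actually instances_val.json, so the JSON may fail to load.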
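A name mismatch like this is easy to spot by comparing the file names the code expects with the .json files actually on disk. The following is a minimal sketch: `find_missing_annotations` is a hypothetical helper, `testing/instances_val.json` comes from the comment above, and `training/instances_train.json` is only an assumption by analogy that should be checked against the dataset config.

```python
import glob
import os


def find_missing_annotations(root, expected):
    """Map each missing expected file to the .json files present in its folder."""
    report = {}
    for rel_path in expected:
        full_path = os.path.join(root, rel_path)
        if os.path.isfile(full_path):
            continue
        # List what is really there so that near-miss names stand out.
        folder = os.path.dirname(full_path)
        present = sorted(
            os.path.basename(p) for p in glob.glob(os.path.join(folder, "*.json"))
        )
        report[rel_path] = present
    return report


if __name__ == "__main__":
    # Both the root and the training file name are assumptions; adjust as needed.
    expected = ["testing/instances_val.json", "training/instances_train.json"]
    print(find_missing_annotations("data/kitti", expected))
```

An empty result means every expected file was found; otherwise each missing name is shown next to the candidates in the same folder, for example `instance_val.json` where `instances_val.json` was expected.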

pengsida commented 4 years ago

You are right, I made a typo here. Sorry.

pengsida commented 4 years ago

I have fixed this typo: [image]

Ivan-Fan commented 4 years ago

@pengsida Thank you for your hard work~

yyting111 commented 3 years ago

I built my own dataset in COCO 2017 format for training and then got the following error. Where might the problem be?

(snake) liuxing@liuxing-ThinkStation-P720:~/snake-master$ sh train.sh
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
WARNING: NO MODEL LOADED !!!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
Traceback (most recent call last):
  File "train_net.py", line 54, in <module>
    main()
  File "train_net.py", line 50, in main
    train(cfg, network)
  File "train_net.py", line 25, in train
    trainer.train(epoch, train_loader, optimizer, recorder)
  File "/home/liuxing/snake-master/lib/train/trainers/trainer.py", line 32, in train
    for iteration, batch in enumerate(data_loader):
  File "/home/liuxing/anaconda3/envs/snake/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/liuxing/anaconda3/envs/snake/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
  File "/home/liuxing/anaconda3/envs/snake/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/liuxing/anaconda3/envs/snake/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "lib/datasets/coco/snake.py", line 139, in __getitem__
    height, width = img.shape[0], img.shape[1]
AttributeError: 'NoneType' object has no attribute 'shape'
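The last frame shows that `img` is `None` when `img.shape` is read. Assuming the dataset class loads images with OpenCV's `cv2.imread`, which returns `None` instead of raising when a file cannot be read, one likely cause is a `file_name` in the annotation JSON that does not resolve to a file in the image directory. The sketch below lists such entries using only the standard library; `find_missing_images` and both paths are hypothetical and should be replaced with the ones from your dataset config.

```python
import json
import os
import sys


def find_missing_images(json_path, image_dir):
    """Return file_name entries of a COCO-style JSON that are absent from image_dir."""
    with open(json_path, "r", encoding="utf-8") as f:
        images = json.load(f).get("images", [])
    return [
        img["file_name"]
        for img in images
        if not os.path.isfile(os.path.join(image_dir, img["file_name"]))
    ]


if __name__ == "__main__":
    # Usage: python check_images.py <annotation.json> <image_dir>
    if len(sys.argv) == 3:
        for name in find_missing_images(sys.argv[1], sys.argv[2]):
            print("missing:", name)
```

If this prints nothing, the paths resolve, and the next things to check would be whether the files are readable images and whether the image directory in the config matches the one tested here.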

yyting111 commented 3 years ago

Running the following command gives the same error.

python -W ignore train_net.py --cfg_file ./configs/coco_snake.yaml

singing4you commented 3 years ago

Hello, I ran into the same error. Have you solved it?