open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0
3.52k stars 591 forks

When I try to run train.py, I get the error "RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory" #297

Open ilmoney opened 2 years ago

ilmoney commented 2 years ago

This problem has bothered me for a long time. The specific error is this:

Traceback (most recent call last):
  File "tools/train.py", line 192, in <module>
    main()
  File "tools/train.py", line 166, in main
    model.init_weights()
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
    m.init_weights()
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
    m.init_weights()
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/base_module.py", line 106, in init_weights
    initialize(self, self.init_cfg)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 612, in initialize
    _initialize(module, cp_cfg)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 517, in _initialize
    func(module)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/cnn/utils/weight_init.py", line 494, in __call__
    logger=logger)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 513, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location, logger)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 451, in _load_checkpoint
    return CheckpointLoader.load_checkpoint(filename, map_location, logger)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 244, in load_checkpoint
    return checkpoint_loader(filename, map_location)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 371, in load_from_torchvision
    return load_from_http(model_urls[model_name], map_location=map_location)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 284, in load_from_http
    filename, model_dir=model_dir, map_location=map_location)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/hub.py", line 575, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I think the problem is in the weights file, but I have already downloaded the .pth file into the checkpoints directory and still cannot solve it. Please help me.

I executed this command: python tools/train.py configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py

GT9505 commented 2 years ago

Hi, could you provide the environment? And which checkpoint did you download?

ilmoney commented 2 years ago

PyTorch 1.9 and CUDA 11.1; resnext101_64x4d-ee2c6f71.pth

ilmoney commented 2 years ago

I also ran into another problem recently: RuntimeError: cannot reshape tensor of 0 elements into shape [0, 16, -1] because the unspecified dimension size -1 can be any value and is ambiguous. Is this a dataset problem?

GT9505 commented 2 years ago

> PyTorch 1.9 and CUDA 11.1; resnext101_64x4d-ee2c6f71.pth

The error message indicates that torch.load() cannot load the resnext101 weights on your machine. However, I have successfully loaded the resnext101 weights with torch.load() under PyTorch 1.9 on our machine. Could you try downloading the resnext101 weights again and check whether torch.load() can load them?
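The "failed finding central directory" message generally means the file being read as a zip archive is cut off before its end, which is typical of an interrupted download. Below is a minimal sketch of a pre-check, assuming the checkpoint was saved in PyTorch's zip-based format (the default since PyTorch 1.6). The helper name `looks_like_complete_zip_checkpoint` and the stand-in files are hypothetical and only for illustration. A checkpoint saved in the older non-zip format would fail this check even when intact.

```python
import os
import tempfile
import zipfile


def looks_like_complete_zip_checkpoint(path):
    """Return True if `path` is a zip archive whose central directory is readable."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first corrupt member name, or None if all pass.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Self-contained demonstration with a stand-in archive (not a real checkpoint).
workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "good.pth")
with zipfile.ZipFile(good_path, "w") as archive:
    archive.writestr("archive/data.pkl", b"placeholder bytes" * 100)

# Simulate an interrupted download by keeping only the first half of the file.
truncated_path = os.path.join(workdir, "truncated.pth")
with open(good_path, "rb") as src, open(truncated_path, "wb") as dst:
    data = src.read()
    dst.write(data[: len(data) // 2])

good_ok = looks_like_complete_zip_checkpoint(good_path)
truncated_ok = looks_like_complete_zip_checkpoint(truncated_path)
print(good_ok, truncated_ok)
```

If the check fails for a downloaded checkpoint, deleting that file and downloading it again is the usual remedy, since a partially downloaded file left in the cache would otherwise be reused on the next run.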

GT9505 commented 2 years ago

> I also ran into another problem recently: RuntimeError: cannot reshape tensor of 0 elements into shape [0, 16, -1] because the unspecified dimension size -1 can be any value and is ambiguous. Is this a dataset problem?

Could you use the error report template to report the bug? It is hard to provide useful help without the full error information.

ilmoney commented 2 years ago

I downloaded the weights file to my local machine, and then it worked. Thank you.

ilmoney commented 2 years ago

The full error is:

  File "tools/train.py", line 192, in <module>
    main()
  File "tools/train.py", line 188, in main
    meta=meta)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/apis/train.py", line 137, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/vid/base.py", line 265, in train_step
    losses = self(**data)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/vid/base.py", line 194, in forward
    **kwargs)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/vid/selsa.py", line 166, in forward_train
    gt_labels, gt_bboxes_ignore, gt_masks, **kwargs)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/roi_heads/selsa_roi_head.py", line 66, in forward_train
    gt_bboxes, gt_labels)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/roi_heads/selsa_roi_head.py", line 104, in _bbox_forward_train
    bbox_results = self._bbox_forward(x, ref_x, rois, ref_rois)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/roi_heads/selsa_roi_head.py", line 93, in _bbox_forward
    cls_score, bbox_pred = self.bbox_head(bbox_feats, ref_bbox_feats)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/roi_heads/bbox_heads/selsa_bbox_head.py", line 57, in forward
    x = x + self.aggregator[i](x, ref_x)
  File "/environment/python/versions/miniconda3-4.7.12/envs/mymmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/featurize/Gao/mmtracking-master/mmtrack/models/aggregators/selsa_aggregator.py", line 62, in forward
    -1).permute(1, 2, 0)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, 16, -1] because the unspecified dimension size -1 can be any value and is ambiguous

GT9505 commented 2 years ago

The reason is that there are no proposals in reference images (ref_roi_n==0).

You can change -1 to C // self.num_attention_blocks (integer division, since a reshape dimension must be an integer) on lines 57, 62, and 71 of selsa_aggregator.py, and change -1 to C on line 76. With these changes, the selsa_aggregator forward pass also works when the number of proposals is 0.

It would be appreciated if you could create a PR to fix the bug.
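To see why an explicit dimension avoids the error, here is a minimal pure-Python sketch of how a reshape has to infer a -1 dimension. It is an illustration of the arithmetic only, not the PyTorch implementation. The helper `resolve_shape` is hypothetical, the channel count C = 1024 is an assumed value, and 16 is taken from the shape [0, 16, -1] in the error message.

```python
def resolve_shape(num_elements, shape):
    """Resolve at most one -1 in `shape`, mimicking how a reshape infers it."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    if known == 0:
        # 0 * anything == 0, so any size for the -1 dimension would fit.
        raise ValueError("-1 is ambiguous when the other dimensions multiply to 0")
    if num_elements % known != 0:
        raise ValueError("shape does not divide the number of elements")
    return tuple(num_elements // known if dim == -1 else dim for dim in shape)


C = 1024                   # assumed channel count, for illustration only
num_attention_blocks = 16  # matches the 16 in the reported shape [0, 16, -1]

# With 300 proposals, -1 can be inferred unambiguously.
normal = resolve_shape(300 * C, (300, num_attention_blocks, -1))

# With 0 proposals, -1 cannot be inferred.
try:
    resolve_shape(0, (0, num_attention_blocks, -1))
    empty_with_minus_one = "ok"
except ValueError:
    empty_with_minus_one = "error"

# Spelling out the last dimension removes the ambiguity.
explicit = resolve_shape(0, (0, num_attention_blocks, C // num_attention_blocks))
print(normal, empty_with_minus_one, explicit)
```

Spelling out the last dimension as C // num_attention_blocks gives the reshape nothing to infer, so an empty set of proposals simply yields an empty result.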

ilmoney commented 2 years ago

Could this be caused by the dataset? When I ran ImageNet VID with a small number of images, no error was reported.

I am not sure whether the large size of the dataset is causing calculation errors. I cannot tell whether this is a code error or a dataset error, and it is bothering me.

ilmoney commented 2 years ago

Excuse me, I also have a question: what is C?

ilmoney commented 2 years ago

I tried printing x. It is a tensor whose elements are NaN, so I suspect this is a dataset problem, but I am not sure.

I have been troubled by this problem for three days. If you can help me solve it, I will be very grateful.

GT9505 commented 2 years ago

C denotes the number of channels of the feature map. Since the elements of x are NaN, you need to check the input images and the intermediate results. Perhaps the input image already contains NaN whenever x is NaN?
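One way to follow this advice is to check each stage in order and report the first one that contains NaN. The sketch below uses plain Python lists as stand-ins for tensors. The helper `first_stage_with_nan`, the stage names, and the values are all hypothetical. With real tensors, the per-stage check would be `torch.isnan(x).any()`.

```python
import math


def first_stage_with_nan(stages):
    """Return the name of the first (name, values) stage containing NaN, else None."""
    for name, values in stages:
        if any(math.isnan(v) for v in values):
            return name
    return None


# Stand-in values listed in pipeline order; real code would pass flattened tensors.
stages = [
    ("input image", [0.1, 0.5, 0.9]),
    ("backbone features", [0.3, float("nan"), 1.2]),
    ("aggregator input x", [float("nan"), float("nan"), float("nan")]),
]
culprit = first_stage_with_nan(stages)
print(culprit)
```

If the first stage reported is the input image, the data is the likely cause. If NaN first appears in a later stage, the model or the training settings are the more likely place to look.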