open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Bug] Why were the RawframeDataset training configs removed in the new version 1.x? #2473

Closed KunLiam closed 1 year ago

KunLiam commented 1 year ago

Branch

main branch (1.x version, such as v1.0.0, or dev-1.x branch)

Prerequisite

Environment

The environment required for the updated version is installed.

Describe the bug

Because the updated version is not friendly to RGB frame datasets (the previous version handled them very smoothly), I now have to modify the configuration files myself to train on my own RGB frame data, and this causes many problems. I raised an issue about this before. I did solve the problem of running 'uniformerv2-base-p16-res224_clip-kinetics710-pre_u8_kinetics400-rgb.py', but I have to modify the other configs one by one, and each modification produces a different error. This is very frustrating. I would like to suggest that, for each model in configs, you also provide a training configuration for RGB frames, just like in version 0.x. That would be very helpful for users whose datasets consist of RGB frames. I would be very grateful if you could take my advice. Below are my modified configuration settings and several of the error messages:

Modified dataset settings in the configuration file:

```python
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/US_test/frames/train'
data_root_val = 'data/US_test/frames/val'
ann_file_train = 'data/US_test/train_class2_rawframes.txt'
ann_file_val = 'data/US_test/val_class2_rawframes.txt'
ann_file_test = 'data/US_test/val_class2_rawframes.txt'

file_client_args = dict(io_backend='disk')

sthv1_flip_label_map = {2: 4, 4: 2, 30: 41, 41: 30, 52: 66, 66: 52}

train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='RawFrameDecode', **file_client_args),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5, flip_label_map=sthv1_flip_label_map),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode', **file_client_args),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        twice_sample=True,
        test_mode=True),
    dict(type='RawFrameDecode', **file_client_args),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]

train_dataloader = dict(
    batch_size=14,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(img=data_root),
        filename_tmpl='img_{:05}.jpg',
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=14,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=dict(img=data_root_val),
        filename_tmpl='img_{:05}.jpg',
        pipeline=val_pipeline,
        test_mode=True))
test_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=dict(img=data_root_val),
        filename_tmpl='img_{:05}.jpg',
        pipeline=test_pipeline,
        test_mode=True))
```
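For reference, each line of a RawframeDataset annotation file lists the frame directory (relative to data_prefix), the total number of frames, and the label. A minimal sanity-check sketch for the annotation files and frame names assumed above; the helper check_rawframe_annotations below is illustrative, not part of MMAction2:

```python
import os.path as osp


def check_rawframe_annotations(ann_file, data_root, filename_tmpl='img_{:05}.jpg'):
    """Verify each '<frame_dir> <total_frames> <label>' line against the frames on disk."""
    with open(ann_file) as f:
        for line in f:
            frame_dir, total_frames, _label = line.strip().split()
            total_frames = int(total_frames)
            full_dir = osp.join(data_root, frame_dir)
            # Frames are numbered from 1 by default; check the first and last ones.
            for idx in (1, total_frames):
                frame_path = osp.join(full_dir, filename_tmpl.format(idx))
                if not osp.exists(frame_path):
                    print(f'missing frame: {frame_path}')


check_rawframe_annotations('data/US_test/train_class2_rawframes.txt',
                           'data/US_test/frames/train')
```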

Run #1: configs/recognition/tpn/tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb.py

Error #1:

```
Traceback (most recent call last):
  File "./tools/train.py", line 136, in main()
  File "./tools/train.py", line 132, in main runner.train()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1706, in train model = self.train_loop.run()  # type: ignore
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run self.run_epoch()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch self.run_iter(idx, data_batch)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter outputs = self.runner.model.train_step(
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step losses = self._run_forward(data, mode='loss')
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward results = self(data, mode=mode)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 223, in forward return self.loss(inputs, data_samples, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 132, in loss self.extract_feat(inputs,
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/recognizer3d.py", line 104, in extract_feat x, loss_aux = self.neck(x, data_samples=data_samples)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/necks/tpn.py", line 444, in forward loss_aux = self.aux_head.loss(x[-2], data_samples)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/necks/tpn.py", line 264, in loss losses['loss_aux'] = self.loss_weight * self.loss_cls(x, labels)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/losses/base.py", line 39, in forward ret = self._forward(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/losses/cross_entropy_loss.py", line 81, in _forward loss_cls = F.cross_entropy(cls_score, label, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (256) to match target batch_size (32)
```
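The 256 vs. 32 mismatch above is a factor of 8, which matches num_clips=8 in the pipeline, whereas the 8x8x1 in the TPN config name indicates one clip of 8 frames sampled with a frame interval of 8. A hedged sketch of SampleFrames settings aligned with that recipe, keeping the rest of the rawframe pipeline from the config above:

```python
# Sketch only: sampling aligned with the 8x8x1 naming of the TPN SlowOnly config.
train_pipeline = [
    dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
    dict(type='RawFrameDecode', **file_client_args),
    # ... keep the remaining transforms from the config above ...
]
```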

Run #2: configs/recognition/c3d/c3d_sports1m-pretrained_8xb30-16x1x1-45e_ucf101-rgb.py

Error #2:

```
Traceback (most recent call last):
  File "./tools/train.py", line 136, in main()
  File "./tools/train.py", line 132, in main runner.train()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1706, in train model = self.train_loop.run()  # type: ignore
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run self.run_epoch()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch self.run_iter(idx, data_batch)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter outputs = self.runner.model.train_step(
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step losses = self._run_forward(data, mode='loss')
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward results = self(data, mode=mode)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 223, in forward return self.loss(inputs, data_samples, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 132, in loss self.extract_feat(inputs,
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/recognizer3d.py", line 98, in extract_feat x = self.backbone(inputs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/backbones/c3d.py", line 137, in forward x = self.pool5(x)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 244, in forward return F.max_pool3d(input, self.kernel_size, self.stride,
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/_jit_internal.py", line 484, in fn return if_false(*args, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/functional.py", line 868, in _max_pool3d return torch.max_pool3d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (512x1x14x14). Calculated output size: (512x0x8x8). Output size is too small
```
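Here the C3D config name (16x1x1) implies clips of 16 consecutive frames, and the Sports-1M pretrained C3D works on 112x112 crops; with clip_len=1 the temporal dimension is 1, so pool5 computes a zero-sized output (512x0x8x8). A hedged sketch of sampling and crop settings closer to the original C3D recipe:

```python
# Sketch only: sampling and crop sizes in line with the C3D 16x1x1 recipe.
train_pipeline = [
    dict(type='SampleFrames', clip_len=16, frame_interval=1, num_clips=1),
    dict(type='RawFrameDecode', **file_client_args),
    dict(type='Resize', scale=(128, 171), keep_ratio=False),
    dict(type='RandomCrop', size=112),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
```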

Run #3: configs/recognition/uniformer/uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb.py

Error #3:

```
Traceback (most recent call last):
  File "./tools/train.py", line 136, in main()
  File "./tools/train.py", line 132, in main runner.train()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1706, in train model = self.train_loop.run()  # type: ignore
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run self.run_epoch()
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch self.run_iter(idx, data_batch)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter outputs = self.runner.model.train_step(
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step losses = self._run_forward(data, mode='loss')
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward results = self(data, mode=mode)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/home/LK/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 223, in forward return self.loss(inputs, data_samples, **kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/recognizers/base.py", line 137, in loss loss_cls = self.cls_head.loss(feats, data_samples, **loss_kwargs)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/heads/base.py", line 100, in loss return self.loss_by_feat(cls_scores, data_samples)
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/models/heads/base.py", line 130, in loss_by_feat top_k_acc = top_k_accuracy(cls_scores.detach().cpu().numpy(),
  File "/data/LK/Video_Processing/mmaction2-main/mmaction/evaluation/functional/accuracy.py", line 149, in top_k_accuracy match_array = np.logical_or.reduce(max_k_preds == labels, axis=1)
numpy.AxisError: axis 1 is out of bounds for array of dimension 0
```
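Again, the 16x4x1 in the UniFormer config name suggests one clip of 16 frames with frame_interval=4; the zero-dimensional array in top_k_accuracy is the usual symptom of prediction and label batches that no longer line up. A hedged sketch of matching SampleFrames settings:

```python
# Sketch only: sampling aligned with the 16x4x1 naming of the UniFormer config.
train_pipeline = [
    dict(type='SampleFrames', clip_len=16, frame_interval=4, num_clips=1),
    dict(type='RawFrameDecode', **file_client_args),
    # ... keep the remaining transforms from the config above ...
]
```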

Of course, I have only listed three typical problems; other configuration files fail with their own errors. Using an RGB frame dataset with the current code is a real headache. I sincerely hope you can provide configs corresponding to the previous 0.x version, able to train on both RGB frame data and video data, so that these problems no longer occur. Thank you!

Reproduces the problem - code sample

No response

Reproduces the problem - command or script

No response

Reproduces the problem - error message

No response

Additional information

No response

cir7 commented 1 year ago

I don't think these bugs are caused simply by switching from VideoDataset to RawframeDataset. We provide a quick tutorial on how to convert a video-based config into a frame-based config.
I strongly recommend taking a look at the MMAction2 quick guide, which may help you understand the codebase.

KunLiam commented 1 year ago

> I don't think these bugs are caused simply by switching from VideoDataset to RawframeDataset. We provide a quick tutorial on how to convert a video-based config into a frame-based config. I strongly recommend taking a look at the MMAction2 quick guide, which may help you understand the codebase.

Thank you, that worked! But you still need to fix the following parts of the tutorial: change data_prefix=dict(video=data_root) to data_prefix=dict(img=data_root) in train_dataloader/val_dataloader/test_dataloader, and add file_client_args = dict(io_backend='disk') to the configuration.
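For anyone making the same switch, a sketch of the typical differences between a video-based and a rawframe-based config, based on the tutorial and the corrections above (excerpts only; the annotation file also changes from a video list to '<frame_dir> <total_frames> <label>' lines):

```python
# Before: video-based config (excerpt)
dataset_type = 'VideoDataset'
train_pipeline = [
    dict(type='DecordInit', **file_client_args),
    dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
    dict(type='DecordDecode'),
    # ... spatial transforms, FormatShape, PackActionInputs ...
]
train_dataloader = dict(
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(video=data_root),
        pipeline=train_pipeline))

# After: rawframe-based config (excerpt)
dataset_type = 'RawframeDataset'
file_client_args = dict(io_backend='disk')  # add if the video config does not define it
train_pipeline = [
    dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
    dict(type='RawFrameDecode', **file_client_args),  # replaces DecordInit + DecordDecode
    # ... spatial transforms, FormatShape, PackActionInputs ...
]
train_dataloader = dict(
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(img=data_root),  # was data_prefix=dict(video=data_root)
        filename_tmpl='img_{:05}.jpg',    # match the extracted frame names
        pipeline=train_pipeline))
```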

cir7 commented 1 year ago

Thanks for your feedback, we will update the tutorial.