Fine Tune a SpatioTemporal Action Detection model on a custom dataset in AVA format

open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

https://mmaction2.readthedocs.io

Apache License 2.0


Fine Tune a SpatioTemporal Action Detection model on a custom dataset in AVA format #2578

Status: Open. damianozappia opened this issue 1 year ago.

damianozappia commented 1 year ago

The doc issue

Hi, can someone show me how to fine-tune a Spatio-Temporal Action Detection model on a custom dataset in AVA format with, in my case, 6 classes?

I modified the config file by changing the number of classes here:

bbox_head=dict(
    type='BBoxHeadAVA',
    in_channels=2304,
    num_classes=7,  # 6 custom classes + 1 background (AVA uses 80 + 1)
    multilabel=True,
    dropout_ratio=0.5)),

and specified the checkpoint to fine-tune from in the load_from parameter.

However, I get the following error when starting the train.py script:

The model and loaded state dict do not match exactly

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 2304]) from checkpoint, the shape in current model is torch.Size([7, 2304]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([7]).

Suggest a potential alternative/fix

No response

cir7 commented 1 year ago

This warning is expected; it is not an error. The cls_head-related weights (roi_head.bbox_head.fc_cls) for your custom dataset have a different shape from those in the original pretrained checkpoint, so they are not loaded. You can continue training as long as it is not interrupted.
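A minimal pure-Python sketch of what the warning means, assuming the loader behaves like PyTorch's non-strict loading: tensors whose shapes match are copied, while the two fc_cls tensors listed in the log are skipped and keep their fresh initialization. The fc_cls shapes come from the log; the key "backbone.example.weight" and the helper `split_by_shape` are hypothetical, used only for illustration.

```python
# Sketch: decide which checkpoint tensors a shape-checked, non-strict load
# would copy. The fc_cls keys and shapes come from the warning in this issue;
# "backbone.example.weight" is a hypothetical placeholder for any shared layer.

def split_by_shape(ckpt_shapes, model_shapes):
    """Return (loaded, skipped) key lists, comparing parameter shapes only."""
    loaded, skipped = [], []
    for key, shape in ckpt_shapes.items():
        if model_shapes.get(key) == shape:
            loaded.append(key)
        else:
            skipped.append(key)  # key missing in the model or shape mismatch
    return loaded, skipped


ckpt_shapes = {
    "backbone.example.weight": (8, 8),               # hypothetical shared layer
    "roi_head.bbox_head.fc_cls.weight": (81, 2304),  # AVA head: 80 + 1 classes
    "roi_head.bbox_head.fc_cls.bias": (81,),
}
model_shapes = {
    "backbone.example.weight": (8, 8),
    "roi_head.bbox_head.fc_cls.weight": (7, 2304),   # custom head: 6 + 1 classes
    "roi_head.bbox_head.fc_cls.bias": (7,),
}

loaded, skipped = split_by_shape(ckpt_shapes, model_shapes)
print("loaded: ", loaded)
print("skipped:", skipped)
```

Only the classifier layer starts from scratch under this assumption; everything whose shape is unchanged still benefits from the pretrained weights.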

damianozappia commented 1 year ago

Thanks @cir7 for your reply. Unfortunately, training is interrupted because of this class mismatch; I get the following error:

RuntimeError: The size of tensor a (6) must match the size of tensor b (80) at non-singleton dimension 1
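One plausible reading of the 6 vs 80 in this error, sketched in pure Python under two assumptions: targets are multi-hot vectors of length num_classes with index 0 reserved for background, and the loss drops column 0 from both the scores and the targets before comparing them. If the dataset still builds 81-long targets while the head outputs 7 scores, the remaining widths are 80 and 6. The helper `multi_hot` and the action ids are illustrative.

```python
# Sketch of why the shapes in the RuntimeError can be 6 vs 80.
# Assumptions: targets are multi-hot with index 0 reserved for background,
# and the loss compares scores[1:] with target[1:].

def multi_hot(action_ids, num_classes):
    """Build a multi-hot target; action ids start at 1, index 0 is background."""
    label = [0.0] * num_classes
    for action_id in action_ids:
        label[action_id] = 1.0
    return label


head_num_classes = 7      # bbox_head.num_classes in the custom config
dataset_num_classes = 81  # dataset still builds AVA-sized targets

scores = [0.0] * head_num_classes                # placeholder scores for one box
target = multi_hot([2, 5], dataset_num_classes)  # target for the same box

scores_fg, target_fg = scores[1:], target[1:]  # drop the background column
print(len(scores_fg), len(target_fg))  # 6 80

# Building the target with the head's class count makes the widths agree.
target_fixed = multi_hot([2, 5], head_num_classes)
print(len(scores_fg), len(target_fixed[1:]))  # 6 6
```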

Here is my config file if it can be helpful:

https://www.dropbox.com/s/l00u26jduwz9or6/slowfast_kinetics400-pretrained-r50_8xb6-8x8x1-cosine-10e_ava22-rgb%20%281%29.py?dl=0

The documentation is a bit unclear about how to set up fine-tuning for a Spatio-Temporal model. I thought it would work the same way as in the Action Recognition tutorial, where, as shown in the guide, you change num_classes in the cls_head dict, but this field doesn't exist in the Spatio-Temporal models.

Can you please explain how to set it up to fine-tune a pretrained SlowFast model on my dataset?

cir7 commented 1 year ago

A custom action detection dataset also requires specifying num_classes in AVADataset; please check it.
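A sketch of the config fields that would need to agree for a 6-class dataset, assuming this file inherits the original SlowFast AVA config through `_base_`, so that MMEngine merges these partial dicts into the base ones. The `_base_` path and the `load_from` path are placeholders to adjust, and passing num_classes to AVAMetric is an assumption to verify against the installed mmaction2 version.

```python
# Sketch of a child config for a 6-class custom AVA-format dataset.
# Assumption: MMEngine merges these partial dicts into the base config.
_base_ = ['./slowfast_kinetics400-pretrained-r50_8xb6-8x8x1-cosine-10e_ava22-rgb.py']  # adjust path

num_classes = 6 + 1  # 6 custom actions + background index 0 (AVA uses 80 + 1)

model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=num_classes)))  # classifier output size

# AVADataset builds targets of length num_classes; it must match the head.
train_dataloader = dict(dataset=dict(num_classes=num_classes))
val_dataloader = dict(dataset=dict(num_classes=num_classes))
test_dataloader = val_dataloader

# Assumption: AVAMetric accepts num_classes; check your installed version.
val_evaluator = dict(type='AVAMetric', num_classes=num_classes)
test_evaluator = val_evaluator

load_from = 'path/to/pretrained_ava_checkpoint.pth'  # placeholder
```

Keeping a single num_classes variable avoids the head and the dataset drifting apart, which is the mismatch behind the RuntimeError in this thread. The custom annotation file would also need to use action ids 1 to 6 for this to line up.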

Yizhao-AwakeAI commented 3 weeks ago

Change mmaction/models/roi_heads/bbox_heads/bbox_head.py: add these 2 lines after line 244. They truncate the ground-truth labels from 81 classes to 7:

for sampling_result in sampling_results:
    sampling_result.pos_gt_labels = sampling_result.pos_gt_labels[:, :self.num_classes]
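For a single box, the slice in that patch keeps the first 7 entries of each 81-long target row. A pure-Python sketch with lists standing in for the tensor, assuming custom action ids 1 to 6 and index 0 reserved for background; `multi_hot` is an illustrative helper, not part of mmaction2.

```python
# Sketch of the patch's effect on one box's target row.
# Assumptions: action ids start at 1 and index 0 is background.

def multi_hot(action_ids, num_classes):
    label = [0.0] * num_classes
    for action_id in action_ids:
        label[action_id] = 1.0
    return label


num_classes = 7
row_81 = multi_hot([2, 5], 81)        # target built with the AVA default size
row_truncated = row_81[:num_classes]  # what [:, :num_classes] keeps per row

# Same result as building the target with num_classes=7 in the first place.
assert row_truncated == multi_hot([2, 5], num_classes)

# An action id >= num_classes would be silently dropped by the slice.
dropped = multi_hot([2, 9], 81)[:num_classes]
print(sum(dropped))  # 1.0: only action 2 survives
```

Under these assumptions the patch gives the same targets as setting num_classes in AVADataset, as cir7 suggested, but without editing library code, and the dataset setting avoids silently discarding out-of-range ids.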