open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Using BMN on new data #263

Closed · orikorner closed 4 years ago

orikorner commented 4 years ago

I am trying to use a temporal action localization model on a new dataset to see whether I get output that makes sense. For example, the input could be a 1-minute clip of a tennis match; if the serves are localized and all classified as the same class, that would be sensible output.

My idea is to take a pretrained BMN model for proposal generation, a pretrained classifier head, and a pretrained backbone (single stream), and train them further on a small amount of my data.

I am not sure how to achieve this. I have read the existing docs, but I am still confused about how to compose the localization and recognition models into a single pipeline I can use.

kennymckormick commented 4 years ago

You may try the following steps to use BMN on new data (the 200 ActivityNet classes will be localized and classified):

  1. Use ActivityNet pre-trained models to extract features from the new untrimmed video.
  2. Post-process the extracted features to generate the 100 (temporal dim) x 400 (RGB: 200-d; flow: 200-d) descriptor for the untrimmed video (see the sketch below).
  3. Run inference with the pre-trained BMN model on the 100 x 400 descriptor to get action proposals.
  4. Use the action proposals to extract clips from the untrimmed video, then run inference on them with action recognition models pre-trained on ActivityNet.

Please refer to the ActivityNet data processing docs for more details.
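For step 2, here is a minimal sketch of the descriptor construction, assuming the per-video RGB and flow features are stored as (num_snippets, 200) NumPy arrays. The file names are hypothetical, and the simple per-channel linear interpolation stands in for the official post-processing script under tools/data/activitynet:

```python
import numpy as np


def resize_temporal(feat, num_bins=100):
    """Linearly interpolate a (T, C) feature array to (num_bins, C)."""
    t, c = feat.shape
    # normalized sample positions of the original snippets and the target bins
    src = np.linspace(0.0, 1.0, t)
    dst = np.linspace(0.0, 1.0, num_bins)
    out = np.empty((num_bins, c), dtype=feat.dtype)
    for ch in range(c):
        out[:, ch] = np.interp(dst, src, feat[:, ch])
    return out


# hypothetical per-video feature files, each of shape (num_snippets, 200)
rgb = np.load('my_video_rgb.npy')
flow = np.load('my_video_flow.npy')

# 100 (temporal dim) x 400 (RGB 200-d + flow 200-d) descriptor expected by BMN
descriptor = np.concatenate(
    [resize_temporal(rgb), resize_temporal(flow)], axis=1)
assert descriptor.shape == (100, 400)
np.save('my_video_feature.npy', descriptor)
```

The saved file has the 100 x 400 layout BMN expects; repeat this per video and point the BMN dataset config at the resulting feature files.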

makecent commented 4 years ago

I have a similar question. I want to apply BMN to a new dataset whose actions do not belong to the ActivityNet 200 classes. Do I need to fine-tune the pre-trained feature extraction model on my small dataset? My dataset has only 3 action classes, so I am worried that fine-tuning the pre-trained model for classification will destroy what the model has already learned.

kennymckormick commented 4 years ago

It depends on the size of your new dataset. If it is not large enough, you could merge your dataset with ActivityNet to form a 203-class dataset and prevent overfitting.
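If you take the merging route, here is a minimal sketch of combining annotation files, assuming both follow the ActivityNet-style JSON layout used by mmaction2 (video id mapped to a dict with an 'annotations' list of {'segment': [start, end], 'label': name} entries); all file names are placeholders:

```python
import json

# hypothetical paths; both files assumed to follow the ActivityNet-style
# annotation layout: {video_id: {'annotations': [{'segment': [s, e],
# 'label': <class name>}, ...], ...}, ...}
with open('anet_anno_train.json') as f:
    anet = json.load(f)
with open('my_anno_train.json') as f:
    mine = json.load(f)  # 3 custom classes

merged = {**anet, **mine}  # video ids assumed unique across the two sets
labels = sorted({ann['label']
                 for video in merged.values()
                 for ann in video['annotations']})
print(f'{len(labels)} classes in the merged set')  # expect 203

with open('merged_anno_train.json', 'w') as f:
    json.dump(merged, f)
```

Since ActivityNet labels are strings, the merged set simply contains 203 distinct label names; just make sure your 3 custom labels do not collide with existing ActivityNet class names.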

makecent commented 4 years ago

Thanks!

borijang commented 2 years ago

Hi everyone, thanks for the valuable discussion. Can you clarify: if I want to fine-tune on a custom dataset, is it only the feature extraction model that needs fine-tuning, and not the BMN model? And if so, which architectures can I use for the feature extraction model, apart from TSN?

makecent commented 2 years ago

@borijang I would say the opposite: fine-tuning the BMN and freezing the feature extraction model seems more reasonable. That is also what the BMN implementation on THUMOS14 does. In fact, many methods similar to BMN reuse the same pretrained feature extraction model.
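To make that concrete, here is a minimal fine-tuning config sketch, under the assumption that BMN is trained on pre-extracted features (so the feature extractor is frozen by construction). The base config name is the BMN config shipped with mmaction2 at the time, while the checkpoint path, data paths, and schedule values are placeholders you would adjust:

```python
# finetune_bmn_my_dataset.py -- hypothetical config name
_base_ = ['../localization/bmn/bmn_400x100_2x8_9e_activitynet_feature.py']

# start from ActivityNet-pretrained BMN weights
# (placeholder path; use a checkpoint from the mmaction2 model zoo)
load_from = 'checkpoints/bmn_activitynet_pretrained.pth'

# point the dataset at your own 100 x 400 features and annotations (placeholders)
data_root = 'data/my_dataset/features/'
data_root_val = 'data/my_dataset/features/'
ann_file_train = 'data/my_dataset/anno_train.json'
ann_file_val = 'data/my_dataset/anno_val.json'

data = dict(
    train=dict(ann_file=ann_file_train, data_prefix=data_root),
    val=dict(ann_file=ann_file_val, data_prefix=data_root_val))

# a shorter schedule and smaller lr are typical for fine-tuning (illustrative values)
optimizer = dict(type='Adam', lr=1e-4, weight_decay=0.0001)
total_epochs = 5
```

Because BMN consumes pre-extracted 100 x 400 features, only the BMN weights are updated here; the feature extraction model is never touched. Training is then launched with the usual tools/train.py entry point.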