jacklishufan / Mamba-ND

Official Implementation for Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Clarification on Pretrain Dataset for Video Classification #2

Closed HoBeom closed 7 months ago

HoBeom commented 7 months ago

First and foremost, I'd like to extend my sincerest gratitude for sharing the results of your experiments through this repository.

As I delved deeper into your implementation, especially for the HMDB-51 model, I became confused about which pretraining dataset was used, and I would appreciate your clarification so that I can better understand and apply your work.

The configuration file for the HMDB-51 model, linked below, indicates that a Kinetics-400 pretrained model is used:

https://github.com/jacklishufan/Mamba-ND/blob/350ce66db70fa1ffc4cc0c5303f1d74d154b6977/video_classification/config/hmdb-51.py#L22

However, the README, linked below, mentions an ImageNet pretrained model as the basis for training:

https://github.com/jacklishufan/Mamba-ND/blob/350ce66db70fa1ffc4cc0c5303f1d74d154b6977/video_classification/readme.MD?plain=1#L26

This discrepancy has caused some confusion on my part, and I kindly request your guidance on which pretrained model is recommended for HMDB-51 video classification in this project. Should we use the Kinetics-400 model specified in the config file, or the ImageNet pretrained model mentioned in the README?

Your input on this matter would be invaluable for those of us looking to accurately replicate your results and perhaps even build upon them. Thank you once again for your dedication to this project and for any clarification you can provide.

HoBeom commented 7 months ago

After reading the official documentation further, I've come to understand that although a model is specified in the load_from attribute, it will not be loaded if resume is set to False. This has helped me better understand the model loading behavior within the framework.

With this newfound understanding, I'm curious about the performance of the model pretrained on Kinetics-400 in your project. Specifically, were there any performance issues or notable observations when using the Kinetics-400 pretrained model in your experiments? Additionally, would it be possible for you to share the checkpoint for the Kinetics-400 pretrained model?

jacklishufan commented 7 months ago

Hi, we are sorry for the confusion.

  1. The name swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb-mamba-6.py is merely a naming artifact: locally, we created our HMDB-51 training config by modifying swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400 from mmaction.
  2. We never ran K400 experiments. The weight folder is automatically generated from the config name. So this /mnt/c/mmaction2/work_dirs/swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb-mamba-6.7/epoch_51.pth is actually the HMDB-51 checkpoint. Similarly, /mnt/c/mmaction2/work_dirs/swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb-mamba-6.7_ucf/best_acc_top1_epoch_48.pth in the UCF-101 config is nothing but the final UCF-101 checkpoint.
  3. We dump the eval script, so load_from='xxx' merely means that this checkpoint is used in the evaluation process. It will not be loaded for training if resume is False.

Overall, I do agree that the repo might be a bit confusing. We will clean up the configs.
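
To illustrate point 2, here is a minimal sketch of how a work directory can end up carrying a Kinetics-400 name even though the run used HMDB-51. It assumes the directory is derived from the config file name with the extension removed, which to my understanding is what the MMAction2 training script does when no work directory is passed explicitly. The function name default_work_dir and the configs/ prefix are hypothetical; only the config file name comes from this thread.

```python
import posixpath


def default_work_dir(config_path: str, root: str = "./work_dirs") -> str:
    """Return the directory a run would write to when none is given.

    Assumption: the directory is named after the config file with its
    extension removed, as described in point 2 of the reply above.
    """
    config_stem = posixpath.splitext(posixpath.basename(config_path))[0]
    return posixpath.join(root, config_stem)


# Hypothetical config location; only the file name is taken from the thread.
config = (
    "configs/"
    "swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb-mamba-6.py"
)

# The dataset actually used for training does not appear anywhere in the
# result: the directory name only reflects the name of the config file.
print(default_work_dir(config))
```

Because the directory name is taken from the file name alone, a config that was copied from a Kinetics-400 template and edited to point at HMDB-51 keeps "kinetics400" in its checkpoint path unless the file itself is renamed.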