masashi-hatano / MM-CDFSL

[ECCV 2024] Official code release for "Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition"
https://masashi-hatano.github.io/MM-CDFSL/
14 stars 1 forks source link
action-recognition cross-domain-few-shot-learning egocentric-vision multi-modal-distillation
# Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition (ECCV'24) PyTorch Lightning Config: Hydra **[[Paper](https://masashi-hatano.github.io/assets/pdf/mm-cdfsl.pdf)][[Supplementary](https://masashi-hatano.github.io/assets/pdf/mm-cdfsl_supp.pdf)][[Project Page](https://masashi-hatano.github.io/MM-CDFSL/)][[Poster](https://masashi-hatano.github.io/assets/pdf/mm-cdfsl_poster.pdf)][[Data](https://huggingface.co/datasets/masashi-hatano/MM-CDFSL/tree/main)]**

This is the official code release for our ECCV 2024 paper \ "Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition".

🔨 Installation

# Create a virtual environment
python3 -m venv mm-cdfsl
source mm-cdfsl/bin/activate

# Install the dependencies
pip install -r requirements.txt

📂 Data Preparation

Training Split

You can find the train/val split files for all three target datasets in cdfsl folders.

Data Structure

Please follow the data structure as detailed in DATA_STRUCTURE.md.

Pre-processed Data

You can download the pre-processed data from the hub.

📍 Model Zoo

You can brouse the checkpoints of pre-trained model, comparison methods, and our models in this folder or directly download from the following links.

Pre-Train

| Method | Source Dataset | Target Dataset | Modality | Ckpt | | :------: | :------------: | :------------: | :------: | :--: | | VideoMAE | Kinetics-400 | - | RGB | [checkpoint](https://keio.box.com/shared/static/k71pgayzc4kkbe3n98tc4atakj8d04f0.pth) | | VideoMAE | Ego4D | - | RGB | [checkpoint](https://keio.box.com/shared/static/svebiau84n32kl9cl4s1lov3zjipb0v1.pt) | | VideoMAE w/ classifier | Ego4D | EPIC | RGB | [checkpoint](https://keio.box.com/shared/static/9pr0o9jtxv7i6azwpjss2vjhhfgvtaco.pt) | | VideoMAE w/ classifier | Ego4D | EPIC | flow | [checkpoint](https://keio.box.com/shared/static/e20y6fx4pva1i0mcvv11q3lgqhoixmal.pt) | | VideoMAE w/ classifier | Ego4D | EPIC | pose | [checkpoint](https://keio.box.com/shared/static/8i7k76vimvnxo6r6pwtpjiai83qy8cx3.pt) | | VideoMAE w/ classifier | Ego4D | MECCANO | RGB | [checkpoint](https://keio.box.com/shared/static/l44k9dmebz5ft4pos6kznfbshpbnn306.pt) | | VideoMAE w/ classifier | Ego4D | MECCANO | flow | [checkpoint](https://keio.box.com/shared/static/c07ugtpkjcg5010dc9c12qjt7ylialic.pt) | | VideoMAE w/ classifier | Ego4D | MECCANO | pose | [checkpoint](https://keio.box.com/shared/static/54dno3qfm6brke5iidls94cmtk1asdb9.pt) | | VideoMAE w/ classifier | Ego4D | WEAR | RGB | [checkpoint](https://keio.box.com/shared/static/q2ckke6wmyufgay0o4t95z7bkovg86bz.pt) | | VideoMAE w/ classifier | Ego4D | WEAR | flow | [checkpoint](https://keio.box.com/shared/static/vzq5spkm0xeldkov73p8gbhj26yyctjj.pt) | | VideoMAE w/ classifier | Ego4D | WEAR | pose | [checkpoint](https://keio.box.com/shared/static/8w85fkc1nlwbgapuuxupzov3cd0v14dx.pt) |

2nd Stage

| Method | Source Dataset | Target Dataset | Modality | Ckpt | | :------: | :------------: | :------------: | :------: | :--: | | STARTUP++ | Ego4D | EPIC | RGB | [checkpoint](https://keio.box.com/shared/static/henl7rx9dknc28yty1d8veaee4la6axg.pt) | | STARTUP++ | Ego4D | MECCANO | RGB | [checkpoint](https://keio.box.com/shared/static/8v0aesw4adj5vgecykqj77korq5kqsc6.pt) | | STARTUP++ | Ego4D | WEAR | RGB | [checkpoint](https://keio.box.com/shared/static/sz8veo961xaorqlc9yjcf7h59pzyl3pg.pt) | | Dynamic Distill++ | Ego4D | EPIC | RGB | [checkpoint](https://keio.box.com/shared/static/n9djko04ckzrk45j3m06cabtgm6wvm8p.pt) | | Dynamic Distill++ | Ego4D | MECCANO | RGB | [checkpoint](https://keio.box.com/shared/static/ay40mcw5cr1i4rihqu8gaav2k2k2lad2.pt) | | Dynamic Distill++ | Ego4D | WEAR | RGB | [checkpoint](https://keio.box.com/shared/static/nvv0otmgjqfbuui2ce0hx8h1uo44ssko.pt) | | CDFSL-V | Ego4D | EPIC | RGB | [checkpoint](https://keio.box.com/shared/static/nx8lt1aghlqfp6ay15st5e31g8x9k7l7.pt) | | CDFSL-V | Ego4D | MECCANO | RGB | [checkpoint](https://keio.box.com/shared/static/u6t2lg092wvbwgri9zdie314ye0ujpdt.pt) | | CDFSL-V | Ego4D | WEAR | RGB | [checkpoint](https://keio.box.com/shared/static/84c262upfhmagln06k4znscoggt2dag2.pt) | | Ours | Ego4D | EPIC | RGB, flow, pose | [checkpoint](https://keio.box.com/shared/static/u2mig7f0lsyqjztljszema38oi09ub6t.pt) | | Ours | Ego4D | MECCANO | RGB, flow, pose | [checkpoint](https://keio.box.com/shared/static/39cs8ug82i2ufoncwx6dayds6bfrrp18.pt) | | Ours | Ego4D | WEAR | RGB, flow, pose | [checkpoint](https://keio.box.com/shared/static/yxzzrj1j2mno0e06p4l0gfcbti45kbb3.pt) |

🔥 Training

1. Pre-training

Please make sure that you set modality (e.g., rgb) and dataset (e.g., epic) in configs/trainer/pretrain_trainer.yaml and confings/data_module/pretrain_data_module.yaml.

python3 lit_main_pretrain.py train=True test=False

2. Multimodal Distillation

Please make sure that you set dataset (e.g., epic) in confings/data_module/mm_distill_data_module.yaml. Also, you need to set the ckpt path of all modalities in configs/trainer/mm_distill_trainer.yaml.

python3 lit_main_mmdistill.py train=True test=False 

🔍 Evaluation

To evaluate the model in 5-way 5-shot setting with 600 runs, please run the following command.

python3 lit_main_mmdistill.py train=False test=True data_module.n_way=5 data_module.k_shot=5 data_module.episodes=600

✍️ Citation

If you use this code for your research, please cite our paper.

@inproceedings{Hatano2024MMCDFSL,
  author = {Hatano, Masashi and Hachiuma, Ryo and Fujii, Ryo and Saito, Hideo},
  title = {Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2024},
}