zhoubenjia / MotionRGBD-PAMI

Reusing the fusion part of code #4

SadeghRahmaniB opened this issue 8 months ago

Hi there,

Thanks for sharing this repository. As you may know, I have been using it, and one of my aims is to reuse only the fusion part of this code and apply it in another repository. Let's say I have model A, which does video classification based on RGB frames. I want to add another modality to model A, so I need a fusion method that works well (at this stage, decoupling and recoupling are not important for my work). That is why I chose this repo.

Now, my questions are: 1) Is there a way to extract just the fusion part, or is it too complicated to do so? 2) If it is feasible, how can I do it, and where should I start looking?

Best,

zhoubenjia commented 8 months ago

Greetings, I understand your point. If you only want to use a multi-modal fusion network, you might consider the approach presented in this work. It has proven effective, particularly when the two modal features share notable semantic similarity. Here is an example scenario: suppose you have already trained an RGB model and a depth model. You then need to specify the checkpoint paths of these two models in the 'my_dataset.yaml' file:

fusion:
  #-------Used for fusion network----------
  rgb_checkpoint:  
    cs16: ''
    cs32: '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-M/model_best.pth.tar'
    cs64: ''
  depth_checkpoint:
    cs16: ''
    cs32: '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-K/model_best.pth.tar'
    cs64: ''
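(Optional) Before launching the fusion training, you can quickly check that both checkpoint paths resolve and deserialize. The snippet below is only a suggested sanity check and assumes the checkpoints were saved with torch.save; the printed keys will show the actual layout (e.g. whether the weights sit under a 'state_dict' entry):

import torch

# Hypothetical sanity check: confirm both checkpoints referenced in my_dataset.yaml load.
paths = [
    '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-M/model_best.pth.tar',
    '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-K/model_best.pth.tar',
]
for path in paths:
    ckpt = torch.load(path, map_location='cpu')
    # Print the top-level keys so you can see how each checkpoint is organized.
    print(path, '->', list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))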

After that, run the following command directly to start fusion training:

# scc-depth: number of CFCer used in spatial domain. tcc-depth: number of CFCer used in temporal domain.
python -m torch.distributed.launch --nproc_per_node=2 --master_port=1234 --use_env train_fusion.py --config config/my_dataset.yaml --data ./my_dataset/ --splits ./my_dataset/dataset_splits/ --num-classes 25 --save ./output_dir/fusion --batch-size 16 --sample-duration 32 \
--smprob 0.2 --mixup 0.8 --shufflemix 0.3 --epochs 100 --distill 0.0 --intar-fatcer 2 \
--FusionNet cs32 --lr 0.01 --sched step --opt sgd --decay-epochs 10 --scc-depth 2 --tcc-depth 4 --type rgbd 
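If you only have a single GPU, the same entry point should also work with one process; this is an untested assumption based on standard torch.distributed.launch behavior, so reduce the batch size as needed for your memory budget:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 --use_env train_fusion.py --config config/my_dataset.yaml --data ./my_dataset/ --splits ./my_dataset/dataset_splits/ --num-classes 25 --save ./output_dir/fusion --batch-size 8 --sample-duration 32 \
--smprob 0.2 --mixup 0.8 --shufflemix 0.3 --epochs 100 --distill 0.0 --intar-fatcer 2 \
--FusionNet cs32 --lr 0.01 --sched step --opt sgd --decay-epochs 10 --scc-depth 2 --tcc-depth 4 --type rgbd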

Please note that I have updated the corresponding code files: README.md, config/my_dataset.yaml, lib/datasets/base.py, and lib/datasets/build.py.
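Coming back to your original question about reusing only the fusion part in your own repository: if extracting this repo's fusion modules turns out to be too entangled with the rest of the codebase, a low-effort way to prototype multi-modal fusion around your model A is to freeze the two unimodal backbones and train a small fusion head on top of their features. The sketch below is a generic, hypothetical PyTorch illustration (simple concatenation fusion); it is NOT the fusion method of this repository, and the backbone stand-ins, feature dimensions, and class count are placeholders:

import torch
import torch.nn as nn

class SimpleFusionHead(nn.Module):
    """Toy late-fusion head: concatenate per-modality clip features and classify.
    Generic placeholder only, not the fusion module implemented in this repo."""
    def __init__(self, rgb_dim=512, depth_dim=512, num_classes=25, hidden=512):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(rgb_dim + depth_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, rgb_feat, depth_feat):
        return self.classifier(torch.cat([rgb_feat, depth_feat], dim=-1))

# Stand-ins for your own unimodal models ("model A" plus a second-modality branch).
# In practice you would load their weights from checkpoints like the ones listed in
# my_dataset.yaml, freeze them, and train only the fusion head.
rgb_backbone = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.LazyLinear(512))
depth_backbone = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.LazyLinear(512))
fusion_head = SimpleFusionHead()

rgb_clip = torch.randn(2, 3, 32, 112, 112)    # (batch, channels, frames, height, width)
depth_clip = torch.randn(2, 1, 32, 112, 112)

with torch.no_grad():                          # unimodal branches kept frozen
    rgb_feat = rgb_backbone(rgb_clip)
    depth_feat = depth_backbone(depth_clip)

logits = fusion_head(rgb_feat, depth_feat)     # only the fusion head receives gradients
print(logits.shape)                            # torch.Size([2, 25])

Once a simple head like this runs end to end with your data pipeline, you can swap the concatenation for a stronger fusion scheme (such as the one trained by the command above) if you need the extra accuracy.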

Hope it can help you!