SadeghRahmaniB opened 8 months ago
Hi there,
Thanks for sharing this repository. I have been using it, and one of my aims is to reuse only the fusion part of this code in another repository. Say I have a model A that performs video classification from RGB frames. I want to add another modality to model A, so I need a fusion method that works well (at this stage, decoupling and recoupling are not important to my work). That is why I chose this repo.
Now, my questions are: (1) Is there a way to extract just the fusion part, or is that too complicated? (2) If it is feasible, how can I do it, or where should I start looking?
Best,
Hi, I understand your point. If you only want to use a multi-modal fusion network, you could try the approach presented in this work. It performs well, particularly when the features of the two modalities are semantically similar. Here is an example scenario: suppose you have already trained an RGB model and a depth model. You then need to specify the checkpoint paths for these two models in the 'my_dataset.yaml' file:
```yaml
fusion:
  #-------Used for fusion network----------
  rgb_checkpoint:
    cs16: ''
    cs32: '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-M/model_best.pth.tar'
    cs64: ''
  depth_checkpoint:
    cs16: ''
    cs32: '/homedata/bjzhou/codes/MotionRGBD-PAMI/output_dir/NV-TSM-K/model_best.pth.tar'
    cs64: ''
```
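As a rough illustration of what those checkpoint entries are for, here is a minimal sketch of restoring pretrained single-modality weights and freezing them before fusion training. The function name, the checkpoint layout, and the toy backbone are my assumptions for illustration, not this repository's actual loading code:

```python
import tempfile

import torch
import torch.nn as nn


def load_backbone(model: nn.Module, checkpoint_path: str) -> nn.Module:
    """Load pretrained single-modality weights and freeze the backbone,
    so that fusion training only updates the fusion layers.
    (Hypothetical helper, not part of this repository.)"""
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    # Checkpoints are often wrapped in a 'state_dict' key; fall back to the raw dict.
    state = ckpt.get("state_dict", ckpt)
    model.load_state_dict(state, strict=False)
    for p in model.parameters():
        p.requires_grad = False
    return model


# Demo with a toy backbone and a temporary checkpoint file.
with tempfile.NamedTemporaryFile(suffix=".pth.tar") as f:
    torch.save({"state_dict": nn.Linear(8, 4).state_dict()}, f.name)
    frozen = load_backbone(nn.Linear(8, 4), f.name)
```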
Then run the following command directly to start fusion training:
```shell
# scc-depth: number of CFCers used in the spatial domain; tcc-depth: number of CFCers used in the temporal domain.
python -m torch.distributed.launch --nproc_per_node=2 --master_port=1234 --use_env train_fusion.py \
    --config config/my_dataset.yaml --data ./my_dataset/ --splits ./my_dataset/dataset_splits/ \
    --num-classes 25 --save ./output_dir/fusion --batch-size 16 --sample-duration 32 \
    --smprob 0.2 --mixup 0.8 --shufflemix 0.3 --epochs 100 --distill 0.0 --intar-fatcer 2 \
    --FusionNet cs32 --lr 0.01 --sched step --opt sgd --decay-epochs 10 --scc-depth 2 --tcc-depth 4 --type rgbd
```
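If you only need the fusion idea rather than the full training pipeline, a generic late-fusion baseline can be sketched as follows. This is a minimal PyTorch illustration: the `TwoStreamFusion` class, feature dimension, and class count are all hypothetical, and simple concatenation is deliberately used in place of this repository's CFCer-based fusion network:

```python
import torch
import torch.nn as nn


class TwoStreamFusion(nn.Module):
    """Concatenate per-modality features and classify them jointly.
    A generic late-fusion baseline, not the CFCer fusion network from this repo."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # The fused feature is the concatenation of both streams, hence 2 * feat_dim.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([rgb_feat, depth_feat], dim=-1)
        return self.classifier(fused)


# Toy usage: batch of 4 clips, 512-dim features per modality, 25 classes.
rgb = torch.randn(4, 512)
depth = torch.randn(4, 512)
logits = TwoStreamFusion(512, 25)(rgb, depth)
```

Starting from a baseline like this makes it easy to swap in a stronger fusion module later while keeping the rest of your model A unchanged.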
Please note that I have updated the corresponding code files:
- `README.md`
- `config/my_dataset.yaml`
- `lib/datasets/base.py`
- `lib/datasets/build.py`
Hope it can help you!