yikaiw / TokenFusion

[CVPR 2022] Code release for "Multimodal Token Fusion for Vision Transformers"
https://arxiv.org/pdf/2204.08721.pdf
MIT License

l1_lamda value for SUN RGBD #7

Open harshm121 opened 2 years ago

harshm121 commented 2 years ago

Hi, I am trying to reproduce the results for SUN RGBD. The paper mentions an l1_lamda of 1e-3, whereas the README mentions 1e-6. Which value should I use? Thanks!

yikaiw commented 2 years ago

Hi, do you mean SUN RGBD for semantic segmentation or SUN RGBD for 3D object detection?

harshm121 commented 2 years ago

Hi, sorry for not being clear. I mean SUN RGBD for the segmentation task.

yikaiw commented 2 years ago

1e-6 empirically works better. In fact, the choice of this hyper-parameter is not strict, as long as the final exchanged ratio stays at around 30% to 50%.
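As a rough way to check that criterion during training, the sketch below computes the fraction of tokens that would be exchanged and tests whether it falls in the 30% to 50% band. It assumes that a token counts as exchanged when its predicted score falls below a threshold. The function names, the default threshold of 0.02, and the sample scores are illustrative assumptions, not taken from the repository.

```python
def exchanged_ratio(scores, threshold=0.02):
    """Return the fraction of tokens whose score is below `threshold`.

    `scores` is a flat list of per-token scores. Both the threshold value
    and the "below threshold means exchanged" rule are assumptions made
    for illustration.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    exchanged = sum(1 for s in scores if s < threshold)
    return exchanged / len(scores)


def in_target_range(ratio, low=0.30, high=0.50):
    """Check whether the ratio lies in the 30% to 50% band from the thread."""
    return low <= ratio <= high


# Hypothetical scores for ten tokens; four of them are below the threshold.
sample_scores = [0.01, 0.50, 0.90, 0.015, 0.30, 0.70, 0.005, 0.80, 0.001, 0.40]
ratio = exchanged_ratio(sample_scores)
print(f"exchanged ratio: {ratio:.2f}, in range: {in_target_range(ratio)}")
```

If the ratio ends up well outside the band, that suggests trying a different l1_lamda rather than treating either 1e-3 or 1e-6 as fixed.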

harshm121 commented 2 years ago

Thanks. I did try reproducing the results with 1e-6 but was not able to get even close to the reported 51.4 mIoU with SegFormer B2. Can you share the config.py file for SUN RGBD, along with any other specific details I should note when training on SUN RGBD?