zifuwan / Sigma

[WACV 2025] Python implementation of Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
https://zifuwan.github.io/Sigma/
MIT License

Two Questions About CroMB #21

Closed by ZHUXIUJINChris 4 months ago

ZHUXIUJINChris commented 5 months ago

Thank you for the excellent work!

  1. Why did you swap the C matrix? Have you tried swapping the B matrix?
  2. Other Mamba fusion models usually adopt a gating mechanism. Compared to them, what are the advantages of this model?

zifuwan commented 5 months ago

Hi, thanks for your interest.

  1. Please refer to this response for the explanation.
  2. It would be great if you could elaborate on the gating mechanism you refer to. I'm not sure if the scale operation in ConMB is similar to the gating mechanism.

ZHUXIUJINChris commented 5 months ago

Thank you for your response.

  1. I noticed that the results for swapping the C matrix are the best. Do you know the specific reason for this?
  2. In models such as "Pan-Mamba," "Fusion Mamba," "MambaDFuse," and "SurvMamba," an activation function is typically used to gate the two pathways, so that useful information is passed on and useless information is filtered out.

Pan-Mamba: https://arxiv.org/pdf/2402.12192
Fusion Mamba: https://arxiv.org/pdf/2404.09146
MambaDFuse: https://arxiv.org/pdf/2404.08406
SurvMamba: https://arxiv.org/pdf/2404.08027
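
For readers unfamiliar with the gating idea described above, here is a minimal sketch in plain Python. It is a generic sigmoid gate, not the exact formulation of any of the cited papers, and the names `gated_fusion`, `rgb`, `thermal`, and `logits` are hypothetical. In a real model the gate logits would be predicted from the features by a learned layer; here they are fixed by hand to show the effect.

```python
import math


def sigmoid(v):
    """Map a real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def gated_fusion(feat_a, feat_b, gate_logits):
    """Blend two per-channel feature vectors with a sigmoid gate.

    A gate near 1 keeps pathway A for that channel, a gate near 0 keeps
    pathway B, and values in between mix the two.
    """
    fused = []
    for a, b, z in zip(feat_a, feat_b, gate_logits):
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Hypothetical toy features for two modalities (three channels each).
rgb = [1.0, 2.0, 3.0]
thermal = [10.0, 20.0, 30.0]
# Hand-picked logits: trust RGB, mix evenly, trust thermal.
logits = [8.0, 0.0, -8.0]

fused = gated_fusion(rgb, thermal, logits)
print(fused)
```

The first channel stays close to the RGB value, the last stays close to the thermal value, and the middle channel is an even mix of the two.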

zifuwan commented 4 months ago

Sorry for the late response.

  1. Our hypothesis is that matrix C decodes the information from the hidden state, so swapping it can guide the reconstruction of the complementary modality. That said, we always recommend experimenting to find the optimal choice.
  2. An activation function can indeed be useful. In our method, however, we focus on a simple implementation to explore the potential of Mamba in multimodal learning. Hope this helps.
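
To make the C-swap discussed in this thread concrete, here is a toy sketch of a state-space scan in which each modality keeps its own B (which writes into the hidden state) but is read out through the other modality's C. This is an illustration under simplifying assumptions, not the Sigma implementation: the names `scan` and `cross_scan` are hypothetical, the decay `a` is a fixed per-state scalar, and B and C are passed in as precomputed per-step vectors, whereas Mamba derives them from the input.

```python
def scan(xs, a, bs, cs):
    """Run a toy diagonal SSM over a scalar sequence.

    h_t = a * h_{t-1} + b_t * x_t   (element-wise state update)
    y_t = sum(c_t * h_t)            (readout of the state through C)
    """
    h = [0.0] * len(a)
    ys = []
    for x, b, c in zip(xs, bs, cs):
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def cross_scan(x1, x2, a, b1, b2, c1, c2):
    """Scan each modality with its own B but the other modality's C."""
    y1 = scan(x1, a, b1, c2)  # modality 1 state, decoded with C of modality 2
    y2 = scan(x2, a, b2, c1)  # modality 2 state, decoded with C of modality 1
    return y1, y2


# Hypothetical two-step sequences with a two-dimensional hidden state.
x1, x2 = [1.0, 2.0], [3.0, 4.0]
a = [0.5, 0.5]
b1 = [[1.0, 0.0], [1.0, 0.0]]
b2 = [[0.0, 1.0], [0.0, 1.0]]
c1 = [[1.0, 1.0], [1.0, 1.0]]
c2 = [[2.0, 0.0], [2.0, 0.0]]

y1, y2 = cross_scan(x1, x2, a, b1, b2, c1, c2)
print(y1, y2)                   # swapped readout
print(scan(x1, a, b1, c1))      # unswapped readout of modality 1, for comparison
```

Swapping B instead would amount to calling `scan(x1, a, b2, c1)` and `scan(x2, a, b1, c2)`, which changes how each input is written into the state rather than how the state is decoded.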