yikaiw / CEN

[TPAMI 2023, NeurIPS 2020] Code release for "Deep Multimodal Fusion by Channel Exchanging"
MIT License

Query regarding applicability to other tasks #10

Closed: AntiLibrary5 closed this issue 3 years ago

AntiLibrary5 commented 3 years ago

Hi, thank you for your great work. I have a specific question that was not addressed in the paper:

Consider homogeneous data (images) where the different modalities (image streams) are not different versions of the same view. For example, in pose regression methods like MapNet, the multimodal input consists of two completely different images: not two views of the same scene (like RGB+D), but two images of two different scenes (one taken at time t, the other at time t+1).

Do you feel that there could still be a gain with channel exchanging?

Thank you for your answer.

yikaiw commented 3 years ago

Hi, thanks for your interest.

Currently, this work and the follow-up works deal with homogeneous data where the multimodal features are easily aligned. How channel exchanging can be applied to unaligned data, such as images of different scenes, remains an open problem. Perhaps adding a lightweight module to match the scenes could help. I will think further about designs for other tasks.
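To make the alignment requirement concrete: in the paper, a channel is exchanged when its batch-norm scaling factor falls below a threshold, and it is replaced by the corresponding channel of the other modality at the same spatial position. A minimal NumPy sketch of that criterion (the function name and threshold value here are illustrative, not the repo's API):

```python
import numpy as np

def channel_exchange(feat_a, feat_b, gamma_a, gamma_b, threshold=0.02):
    """Exchange channels between two spatially aligned feature maps.

    feat_a, feat_b: feature maps of shape (C, H, W), one per modality.
    gamma_a, gamma_b: per-channel BN scaling factors, shape (C,).
    A channel whose scaling factor is below `threshold` is treated as
    uninformative and replaced by the other modality's channel.
    """
    swap_a = (gamma_a < threshold)[:, None, None]  # broadcast over H, W
    swap_b = (gamma_b < threshold)[:, None, None]
    out_a = np.where(swap_a, feat_b, feat_a)
    out_b = np.where(swap_b, feat_a, feat_b)
    return out_a, out_b
```

Because `out_a[c, h, w]` directly takes `feat_b[c, h, w]`, any gain hinges on position (h, w) referring to the same scene content in both streams, which is exactly what breaks when the two inputs depict different scenes; hence the suggestion of a matching module before exchanging.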

AntiLibrary5 commented 3 years ago

I see. Thank you for your response. Fusing images from multi-view cameras (different angles) through channel exchanging could be an interesting direction for tasks like location retrieval. I will think about it too. Thank you again.