Closed. AntiLibrary5 closed this issue 3 years ago.
Hi, thanks for your interest.
Currently, this work and the follow-up works deal with homogeneous data, where multimodal features are easily aligned. How channel exchanging can be applied to unaligned data, such as images of different scenes, remains an open problem. Perhaps adding a lightweight module to match the scenes could help. I will think further about the design for other tasks.
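For context, the operation under discussion can be sketched as follows. This is a simplified NumPy illustration (the function name, shapes, and threshold value are my own, not the paper's implementation): channels of one modality whose batch-norm scaling factors fall below a threshold are replaced by the corresponding channels of the other modality, which is why spatial alignment between the two streams matters — an exchanged channel only makes sense if both streams describe the same spatial locations.

```python
import numpy as np

def channel_exchange(feat_a, feat_b, gamma_a, gamma_b, threshold=0.02):
    """Sketch of channel exchanging between two modality streams.

    Channels whose BN scaling factor |gamma| is below `threshold` are
    treated as uninformative and replaced by the corresponding channel
    from the other modality.
    Shapes: feat_* is (C, H, W); gamma_* is (C,).
    """
    out_a, out_b = feat_a.copy(), feat_b.copy()
    swap_a = np.abs(gamma_a) < threshold  # channels of A replaced by B's
    swap_b = np.abs(gamma_b) < threshold  # channels of B replaced by A's
    out_a[swap_a] = feat_b[swap_a]
    out_b[swap_b] = feat_a[swap_b]
    return out_a, out_b
```

If the two inputs show different scenes (or unaligned views), the exchanged channels would mix features from unrelated spatial positions, which is exactly the difficulty raised above.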
I see. Thank you for your response. Fusing images from multi-view cameras (different angles) via channel exchanging could be an interesting direction for tasks like location retrieval. I will think about it too. Thank you again.
Hi, thank you for your great work. I have a specific question whose answer is not mentioned in the paper:
Given homogeneous data (images), suppose the different modalities (image streams) are not different versions of the same view. For example, in pose-regression methods like MapNet, the multimodal input would be two completely different images: not two views of the same scene (as with RGB + depth), but two images of two different scenes (one taken at time t, the other at time t+1).
Do you feel that there could still be a gain with channel exchanging?
Thank you for your answer.