Closed Haerxu closed 4 months ago
To build a comprehensive network, it is essential to supplement the oracle network with information from the 3D network. Therefore, we integrate the 3D network’s information into the oracle network as values. If the 3D network acts as the query, then the oracle is essentially the 3D network. Our goal is to avoid adding extra information to the 3D network.
Hi,
Thanks for releasing code! I have a question about section "Multi-modal Prediction Model as the Oracle". It says "To be specific, these collaboration operations are implemented by multi-head cross-attention (MHCA) modules [33], where the keys and values are node/edge features from the 3D model, and the queries are their counterparts from the multi-modal model."
Why 3D model is the key and value? I thought it should be query since the 3D model is weaker than the oracle model.