Confusion about the calculation of image_pred

Hi, thanks for your great work.

m_pred = tuple(pairing_images.T.long())
image_global_allpoints = image_global.permute(0, 2, 3, 1)[m_pred]
image_pred = image_pred[m_pred]

However, I am confused by the calculation above (L213-L215 in lightning_trainer.py).

What is the specific meaning of m_pred?

The shapes of image_global and image_pred differ from the shape of m_pred, so why does indexing them with m_pred work? This part of the operation is puzzling to me.
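To make the question concrete, here is a minimal sketch of what I think is happening. It assumes that pairing_images has shape (M, 3) with columns (image index, row, col); that layout is my guess and is not confirmed by the code I quoted. All tensor sizes and values are made up for illustration.

```python
import torch

# Toy sizes (all hypothetical): 2 images, 4 channels, 3x5 feature map.
B, C, H, W = 2, 4, 3, 5
image_global = torch.arange(B * C * H * W, dtype=torch.float32).reshape(B, C, H, W)

# Assumed layout: one row per paired pixel, columns = (image index, row, col).
pairing_images = torch.tensor([[0, 1, 2],
                               [1, 0, 4],
                               [1, 2, 3]])

# .T has shape (3, M); tuple() splits it into three 1-D index tensors of length M.
m_pred = tuple(pairing_images.T.long())

# Move channels last so the three leading dims match the three index tensors.
feats = image_global.permute(0, 2, 3, 1)  # (B, H, W, C)

# Integer-tensor indexing: one (C,) feature vector per paired pixel.
picked = feats[m_pred]  # (M, C)

# The same selection written as an explicit loop over the pairs.
expected = torch.stack([feats[b, y, x] for b, y, x in pairing_images.tolist()])

print(picked.shape, torch.equal(picked, expected))
```

If this reading is right, m_pred does not need the same shape as the tensor it indexes: it is a tuple of coordinates, one index tensor per leading dimension, and the result has one entry per pair. Under the same assumption, if image_pred has shape (B, H, W), then image_pred[m_pred] would have shape (M,).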

Any explanation would be much appreciated. Thank you!

runnanchen / CLIP2Scene