Hello, and first of all thanks for RT-DETR! I'm curious about how and where query selection is done, and I believe it happens in the VFL loss. My intuition is that query selection has to be performed on the encoder outputs $\hat{X}$, but the VFL loss only uses the decoder outputs, i.e. `pred_boxes` and `pred_logits`. Could you please tell me what I'm missing in the code?
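For reference, here is a minimal sketch of what I would expect query selection on $\hat{X}$ to look like. It assumes (my assumption, not taken from the RT-DETR code) that each encoder token gets class logits and a box from some prediction head, and that the top-K tokens by their highest class score become the initial decoder queries. The names `select_queries`, `enc_logits`, `enc_boxes` and `num_queries` are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def select_queries(
    enc_logits: List[List[float]],  # hypothetical per-token class logits, shape [N][C]
    enc_boxes: List[List[float]],   # hypothetical per-token boxes [cx, cy, w, h], shape [N][4]
    num_queries: int,               # K: how many encoder tokens to keep as queries
) -> Tuple[List[int], List[List[float]]]:
    # Score each encoder token by its most confident class (sigmoid, as in focal/VFL-style heads).
    scores = [max(sigmoid(l) for l in token_logits) for token_logits in enc_logits]
    # Rank tokens by score, highest first, and keep the top-K indices.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    topk = ranked[:num_queries]
    # The boxes of the selected tokens would serve as the initial reference boxes for the decoder.
    return topk, [enc_boxes[i] for i in topk]


enc_logits = [[0.1, -2.0], [3.0, 0.5], [-1.0, 1.5], [0.0, 0.0]]
enc_boxes = [[0.1, 0.1, 0.2, 0.2], [0.5, 0.5, 0.3, 0.3], [0.7, 0.2, 0.1, 0.4], [0.9, 0.9, 0.1, 0.1]]
idx, boxes = select_queries(enc_logits, enc_boxes, num_queries=2)
```

In this sketch the selection itself is just a top-K over encoder scores, so my question is really where in the code the loss supervises those encoder-side scores rather than only the decoder's `pred_logits`.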