Closed: soham-joshi closed this 1 year ago
I have a question: what does end_points['last_sem_cls_scores'] (shape (B, 256, 256)) in the same function represent?
We contrast each token in the sentence with each query. The predictions for one piece of text are the predictions of the queries, with confidence equal to the similarity of the projected queries and respective tokens.
For object detection prompts, we collect class scores and predictions by looping over the classes (word_idx) and aggregating the scores of the corresponding tokens (token_idx).
Then we use these to fill the semantic scores for the last prediction head, which are fed to the evaluator.
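To make the steps above concrete, here is a minimal NumPy sketch for a single sample. It is not the repository's code: the function names, the cosine normalization, and the choice of summing token scores per class are all assumptions made for illustration; only word_idx and token_idx are names taken from the thread.

```python
import numpy as np

def query_token_similarity(proj_queries, proj_tokens):
    """Contrast every query with every token (hypothetical helper).

    proj_queries: (Q, D) projected query features
    proj_tokens:  (T, D) projected token features
    Returns a (Q, T) matrix of cosine similarities.
    """
    q = proj_queries / np.linalg.norm(proj_queries, axis=-1, keepdims=True)
    t = proj_tokens / np.linalg.norm(proj_tokens, axis=-1, keepdims=True)
    return q @ t.T

def aggregate_class_scores(token_scores, word_idx, token_idx, num_classes):
    """Collapse per-token scores into per-class scores (hypothetical helper).

    token_scores: (Q, T) score of each query for each token
    word_idx:     class id for each class named in the prompt
    token_idx:    for each of those classes, the token positions that spell it
    Assumption: a class score is the SUM of its tokens' scores; the real
    code may aggregate differently.
    """
    scores = np.zeros((token_scores.shape[0], num_classes))
    for cls, toks in zip(word_idx, token_idx):
        scores[:, cls] = token_scores[:, toks].sum(axis=1)
    return scores

# Toy example: 2 queries, 6 tokens, 3 classes.
# Class 0 is spelled by token 1, class 2 by tokens 3 and 4.
token_scores = np.array([
    [0.1, 0.6, 0.0, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.5, 0.4, 0.0],
])
word_idx = [0, 2]
token_idx = [[1], [3, 4]]
class_scores = aggregate_class_scores(token_scores, word_idx, token_idx, 3)
```

In this toy case the first query scores highest for class 0 and the second for class 2, while class 1 (absent from the prompt) stays at zero.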
Okay. Moreover, what do the second and third dimensions of end_points['last_sem_cls_scores'] represent? Are they logits over a set of (256?) classes?
The second dimension is the number of queries, and the third dimension holds logits over the 256 tokens in a sentence span. Each query predicts a distribution over the sentence span.
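As a rough illustration of that layout, the sketch below builds a random (B, 256, 256) array and turns each query's logits into a distribution over tokens. The random logits, the batch size, and the softmax helper are placeholders assumed for the example, not values or code from the repository.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

B, Q, T = 2, 256, 256                 # batch, queries, tokens in the span
rng = np.random.default_rng(0)
logits = rng.normal(size=(B, Q, T))   # stand-in for last_sem_cls_scores

probs = softmax(logits, axis=-1)      # each query: distribution over tokens
best_token = probs.argmax(axis=-1)    # (B, Q): most likely token per query
```

Each row probs[b, q, :] sums to 1, so the last axis is read as tokens rather than object classes.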
Thanks for the responses @ayushjain1144 @nickgkan !
Hi @ayushjain1144, could you clarify what the numpy arrays word_idx and token_idx in train_dist_mod.py are created for? Reference: line 205.
Thanks!