wwiwush closed this issue 1 year ago
Hi, thanks for your interest! This is to alleviate overfitting when training on COCO. We use the output of the last (12th) attention layer for classification. Details are here. You can also refer to the last paragraph of Section S1 in the supplementary material.
Thanks for this nice work! I'd like to know how coco_clip_hand_craft_attn12.npy was generated, and what the difference is between coco_clip_hand_craft_attn12.npy and coco_clip_hand_craft.npy. Thank you!
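One way to see empirically how the two files differ is to load both and compare them row by row. The snippet below is only a minimal sketch: it assumes each file holds a 2-D array of shape (num_classes, dim) and that both arrays have the same shape, which this thread does not confirm. The helper name `compare_embeddings` is hypothetical and is not part of the repository.

```python
import numpy as np


def compare_embeddings(a, b, eps=1e-12):
    """Mean row-wise cosine similarity between two (n, d) arrays.

    Hypothetical helper for illustration; assumes both inputs are 2-D
    and have identical shapes.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError(f"expected matching 2-D shapes, got {a.shape} and {b.shape}")
    # Normalize each row; the eps floor guards against all-zero rows.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return float(np.mean(np.sum(a_unit * b_unit, axis=1)))


# Usage with the files from this thread (paths are assumptions, adjust as needed):
# attn12 = np.load("coco_clip_hand_craft_attn12.npy")
# plain = np.load("coco_clip_hand_craft.npy")
# print(attn12.shape, plain.shape)
# print(compare_embeddings(attn12, plain))
```

A value close to 1 would suggest the two sets of embeddings point in nearly the same directions, while a lower value would suggest that taking the output of the 12th attention layer changes them noticeably.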