zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
BSD 3-Clause "New" or "Revised" License

Will data leakage happen for bounding box prediction? #31

Open 1049451037 opened 1 year ago

1049451037 commented 1 year ago

Since the output of the ViT also contains position information, if we directly feed the embeddings of a visual concept's region into an MLP to predict its bounding box, will the model just learn to output a trivial transformation of that position information?
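To make the concern concrete, here is a toy sketch of the kind of shortcut I mean. It is not X-VLM's code: the names `patch_position_features` and `box_from_region_features` are hypothetical, and it assumes the extreme case where each patch feature is nothing but its normalized center coordinate and the box head sees exactly the patches inside the target region. Under those assumptions the box can be read off without looking at any visual content.

```python
# Toy illustration only: hypothetical names, not X-VLM's implementation.
GRID = 4  # assume a 4x4 grid of image patches


def patch_position_features(grid):
    """Stand-in for patch embeddings that carry position information.

    Each patch's "feature" is just its normalized (x, y) center.
    """
    return {
        (row, col): ((col + 0.5) / grid, (row + 0.5) / grid)
        for row in range(grid)
        for col in range(grid)
    }


def box_from_region_features(region_feats, grid):
    """Recover (cx, cy, w, h) from the region's patch features alone.

    Only min/max over the positional values is used, so no visual
    content is needed to produce the box.
    """
    xs = [f[0] for f in region_feats]
    ys = [f[1] for f in region_feats]
    half = 0.5 / grid  # half of one patch's width/height
    x0, x1 = min(xs) - half, max(xs) + half
    y0, y1 = min(ys) - half, max(ys) + half
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


feats = patch_position_features(GRID)
# Region covering rows 1-2 and columns 0-1 of the patch grid.
region = [feats[(r, c)] for r in (1, 2) for c in (0, 1)]
box = box_from_region_features(region, GRID)
print(box)
```

Real ViT outputs mix position with content, so the shortcut would not be this clean, but an MLP could plausibly approximate the same mapping if the region's own embeddings are its input.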