zengyan-97 / X2-VLM

All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
BSD 3-Clause "New" or "Revised" License

Patch features to region feature #2

Closed: kimihailv closed this issue 1 year ago

kimihailv commented 1 year ago

Hello. Could you please explain how patch features from the ViT are aggregated into a single region feature? This is confusing to me because a region does not necessarily consist of one or more whole patches.

zengyan-97 commented 1 year ago

Hi, sorry for my late reply. I use image_atts to indicate a region: https://github.com/zengyan-97/X-VLM/blob/master/models/model_pretrain.py#L14
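The reply is brief, so the following is a minimal sketch of one way a patch mask can stand in for a region. It assumes two things that the thread does not confirm. First, a patch is marked 1 if the bounding box overlaps it even partially, which addresses the "not whole patches" concern by rounding the box outward to the patch grid. Second, the region feature is the mean of the marked patch features. `region_patch_mask` and `region_feature` are hypothetical names. Check the linked `model_pretrain.py` for what the repository actually does.

```python
import math
from typing import List, Sequence


def region_patch_mask(x: float, y: float, w: float, h: float,
                      patch_size: int, num_patch: int) -> List[int]:
    """Hypothetical helper: mark every patch that the box (x, y, w, h) overlaps.

    Returns a flat 0/1 list of length 1 + num_patch**2. Index 0 is the [CLS]
    token (always kept); patch (row i, col j) sits at index 1 + i*num_patch + j.
    """
    # floor() for the first covered patch and ceil() for one past the last:
    # partially covered patches are included, not dropped (assumption).
    x_min = min(math.floor(x / patch_size), num_patch - 1)
    x_max = max(x_min + 1, min(math.ceil((x + w) / patch_size), num_patch))
    y_min = min(math.floor(y / patch_size), num_patch - 1)
    y_max = max(y_min + 1, min(math.ceil((y + h) / patch_size), num_patch))

    mask = [0] * (1 + num_patch ** 2)
    mask[0] = 1  # keep [CLS]
    for i in range(y_min, y_max):        # rows covered by the box
        for j in range(x_min, x_max):    # columns covered by the box
            mask[1 + i * num_patch + j] = 1
    return mask


def region_feature(patch_feats: Sequence[Sequence[float]],
                   mask: Sequence[int]) -> List[float]:
    """Mean-pool the patch features selected by mask (mask[0] is [CLS], skipped).

    Mean pooling is an assumed aggregation for illustration only.
    """
    selected = [f for f, m in zip(patch_feats, mask[1:]) if m]
    dim = len(patch_feats[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


# Usage: a 32x32 image split into a 2x2 grid of 16x16 patches.
mask = region_patch_mask(8, 8, 10, 4, patch_size=16, num_patch=2)
print(mask)  # [1, 1, 1, 0, 0]
feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
print(region_feature(feats, mask))  # [2.0, 1.0]
```

With this scheme a region never needs to align with patch boundaries: it is approximated by the smallest block of patches that covers it. In a cross-attention encoder the same 0/1 mask could also be passed as the image attention mask, so that text tokens attend only to the region's patches. This too is an assumption about usage, not something stated in the reply.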