zengyan-97 / X2-VLM

All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
BSD 3-Clause "New" or "Revised" License

Dataset preparation #9

Open enimsay721 opened 1 year ago

enimsay721 commented 1 year ago

Hello! First and foremost, I'd like to express my heartfelt gratitude for sharing this invaluable project and for your contribution to the development of this code.

We are currently trying to replicate the results of a published paper, focusing on a specific type of label. In the section on public datasets, we noticed that a custom dataset has to be prepared for pretraining. We would like to learn more about the guidelines used to create these datasets and what we should take into account to replicate the results successfully.

Specifically, we have some questions about how captions are used for the images. In the code, the getcaption function appears to pick a label at random. We would like to understand the rationale behind this approach and how we should proceed if we intend to train with custom objects. Should we include only a single label per image (given that one is chosen at random), or should we include all relevant labels so that the weights can later be distributed during training, as illustrated in the following example:

"the first time : bar and businessperson are seen on vacation https://i1.wp.com/i.dailymail.co.uk/i/pix/2017/09/27/22/44CC9C0300000578-4927168-image-a-114_1506548238241.jpg?resize=634%2C954 fashion,fedora,sunglasses,hat,headgear,walking,t-shirt,footwear,leg,street fashion,sun hat,tourism,muscle,vacation,fashion accessory /m/032tl,/m/02fq6,/m/017ftj,/m/02dl1y,/m/01443y,/m/083mg,/m/013s93,/m/09j5n,/m/035r7c,/m/0408t8,/m/02wbtzl,/m/07bxq,/m/04_fs,/m/02jwqh,/m/0463sg 0.8364580273628235,0.8327218890190125,0.8078386187553406,0.7886446714401245,0.7307360768318176,0.7143886685371399,0.7064474821090698,0.7008925676345825,0.6920332312583923,0.6744363307952881,0.6647526025772095,0.6617092490196228,0.6424193978309631,0.6369329690933228,0.6162541508674622"
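To make the two options concrete, here is a minimal Python sketch of how we currently read such a record. It is not the repository's loader. The field order (caption, image URL, comma-separated labels, `/m/...` identifiers, confidence scores) is our assumption from the example above, and `parse_record` and `sample_label` are hypothetical helper names of our own.

```python
import random
import re

# Assumed layout of one record (our reading of the example above):
# caption, image URL, labels, /m/... identifiers, confidence scores.
RECORD_RE = re.compile(
    r"^(?P<caption>.*?)\s+(?P<url>https?://\S+)\s+(?P<labels>.*?)"
    r"\s+(?P<mids>/m/\S+)\s+(?P<scores>[\d.,]+)$"
)


def parse_record(line):
    """Split one record into its fields (hypothetical helper)."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        raise ValueError("line does not match the assumed layout")
    labels = m["labels"].split(",")
    mids = m["mids"].split(",")
    scores = [float(s) for s in m["scores"].split(",")]
    if not (len(labels) == len(mids) == len(scores)):
        raise ValueError("labels, identifiers and scores differ in length")
    return {
        "caption": m["caption"],
        "url": m["url"],
        "labels": labels,
        "mids": mids,
        "scores": scores,
    }


def sample_label(record, rng, weighted=False):
    """Pick one label: uniformly, or in proportion to its confidence."""
    if weighted:
        return rng.choices(record["labels"], weights=record["scores"], k=1)[0]
    return rng.choice(record["labels"])
```

With uniform sampling, storing every label means each one is seen over many epochs. Storing a single label fixes the choice in advance. The `weighted=True` branch is only our guess at what "distributing the weights" could look like.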

We are excited to learn more and enhance our understanding in this thrilling field. Any guidance you can provide would be greatly appreciated.

Thank you once again for your contribution, and we eagerly look forward to the opportunity to learn from you.

Best regards, Yas