penghao-wu / vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
https://vstar-seal.github.io/
MIT License

How to fine-tune it with my own dataset #9

Open. cana-jianbin opened this issue 7 months ago

cana-jianbin commented 7 months ago

As the title says, I have my own data with many YOLO-style rectangle (bounding-box) labels.

Is there any important information I have overlooked?

Best wishes.

penghao-wu commented 7 months ago

Hi, do you want to train the visual search model with your own data? That part of the training data contains detection and segmentation labels. You can try converting your own data to COCO format, pre-processing it with preprocess_data.py under the VisualSearch folder, and then modifying the dataset code accordingly.
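For the YOLO-to-COCO step, a minimal sketch follows. It is not part of the vstar repo, and the thread does not say exactly which fields preprocess_data.py reads, so treat it as a generic COCO detection JSON builder under these assumptions: each image has one YOLO `.txt` file whose lines look like `class x_center y_center width height`, with the coordinates normalized to [0, 1]. Because those values are normalized, you have to supply each image's pixel width and height yourself (for example, read them with Pillow). The function names `yolo_line_to_coco_bbox`, `build_coco` and `load_yolo_dir` are hypothetical. YOLO boxes carry no masks, so the `segmentation` field is filled with the box rectangle as a placeholder polygon. That is also an assumption, and you should check it against what the visual search model expects.

```python
import json
from pathlib import Path


def yolo_line_to_coco_bbox(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, [x_min, y_min, width, height]) in pixels.

    Assumes the common YOLO layout "class x_center y_center width height",
    with the four coordinates normalized to [0, 1].
    """
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    box_w = w * img_w
    box_h = h * img_h
    # COCO bboxes are [top-left x, top-left y, width, height] in absolute pixels.
    x_min = xc * img_w - box_w / 2
    y_min = yc * img_h - box_h / 2
    return cls, [x_min, y_min, box_w, box_h]


def build_coco(images, class_names):
    """Assemble a COCO-style detection dict (hypothetical helper).

    images: list of dicts with keys "file_name", "width", "height", "label_lines".
    class_names: list indexed by YOLO class id.
    """
    coco = {
        "images": [],
        "annotations": [],
        # COCO category ids conventionally start at 1, YOLO class ids at 0.
        "categories": [{"id": i + 1, "name": n} for i, n in enumerate(class_names)],
    }
    ann_id = 1
    for img_id, img in enumerate(images, start=1):
        coco["images"].append({
            "id": img_id,
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
        })
        for line in img["label_lines"]:
            if not line.strip():
                continue  # skip blank lines in the label file
            cls, bbox = yolo_line_to_coco_bbox(line, img["width"], img["height"])
            x, y, w, h = bbox
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cls + 1,
                "bbox": bbox,
                "area": w * h,
                "iscrowd": 0,
                # Assumption: no masks are available, so the box rectangle
                # is stored as a placeholder polygon.
                "segmentation": [[x, y, x + w, y, x + w, y + h, x, y + h]],
            })
            ann_id += 1
    return coco


def load_yolo_dir(label_dir, image_sizes, image_ext=".jpg"):
    """Read every *.txt in label_dir; image_sizes maps file stem -> (width, height)."""
    images = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        w, h = image_sizes[txt.stem]
        images.append({
            "file_name": txt.stem + image_ext,
            "width": w,
            "height": h,
            "label_lines": txt.read_text().splitlines(),
        })
    return images


if __name__ == "__main__":
    example = [{"file_name": "0001.jpg", "width": 640, "height": 480,
                "label_lines": ["0 0.5 0.5 0.25 0.5"]}]
    coco = build_coco(example, ["cat"])
    print(json.dumps(coco["annotations"][0]["bbox"]))  # [240.0, 120.0, 160.0, 240.0]
```

To use it on real data, call `load_yolo_dir(...)`, pass the result to `build_coco(...)`, and write the dict out with `json.dump`. Then run preprocess_data.py on that file, adapting field names if the script expects a different layout.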