wusize / CLIPSelf

[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
https://arxiv.org/abs/2310.01403

How to train on custom dataset with ground truth masks? #22

Open Irennnne opened 1 month ago

Irennnne commented 1 month ago

I have my own dataset, which contains ground-truth semantic segmentation masks for each image. How should I prepare the embeddings and region proposals for fine-tuning? Can I use the ground-truth bounding boxes directly as the region proposals? Thank you!
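
To make the second question concrete, here is a minimal sketch of how ground-truth boxes could be derived from a semantic mask. It is an illustration only: the function name `mask_to_class_boxes`, the `(x_min, y_min, x_max, y_max)` box layout, and the choice of 0 and 255 as background/ignore labels are assumptions, not anything taken from the CLIPSelf code. It yields one box per class, so separate objects of the same class are merged into a single box, and it says nothing about whether such boxes are accepted in place of region proposals.

```python
from typing import Dict, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def mask_to_class_boxes(
    mask: Sequence[Sequence[int]],
    ignore_labels: Tuple[int, ...] = (0, 255),  # assumed background / ignore ids
) -> Dict[int, Box]:
    """Return one tight bounding box per class id found in a 2D label mask.

    `mask` is a 2D grid of integer class ids (rows of pixels), e.g. nested
    lists, or a NumPy array converted with `.tolist()`.
    """
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label in ignore_labels:
                continue
            if label not in boxes:
                # First pixel of this class: start a 1x1 box.
                boxes[label] = (x, y, x, y)
            else:
                # Grow the existing box to include this pixel.
                x0, y0, x1, y1 = boxes[label]
                boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 2, 2],
        [0, 0, 2, 2],
    ]
    print(mask_to_class_boxes(toy_mask))
```

If the dataset needs one box per object rather than per class, the mask would first have to be split into connected components (or instance masks used instead), which this sketch does not do.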