wusize / CLIPSelf

[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
https://arxiv.org/abs/2310.01403

How to train on custom dataset with ground truth masks? #22

Open Irennnne opened 6 months ago

Irennnne commented 6 months ago

I have a custom dataset with ground-truth semantic segmentation masks for each image. How should I prepare the embeddings and region proposals for fine-tuning? Can I use the ground-truth bounding boxes directly as the region proposals? Thank you!
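
For context on the second question, here is a minimal sketch of how boxes could be derived from semantic masks with NumPy. It is only an illustration, not code from the CLIPSelf repository: the function name `masks_to_boxes`, the `ignore_ids` default, and the `[x, y, w, h]` output layout are all assumptions, and the format the training scripts actually expect should be checked against the repo. Because a semantic mask does not separate instances, this yields one box per class, so distant regions of the same class end up merged into a single box.

```python
import numpy as np


def masks_to_boxes(sem_mask, ignore_ids=(0, 255)):
    """Hypothetical helper: one [x, y, w, h] box per class in a 2-D label mask.

    `ignore_ids` (background / void labels) is an assumed convention;
    adjust it to match the dataset's label encoding.
    """
    boxes = {}
    for class_id in np.unique(sem_mask):
        if int(class_id) in ignore_ids:
            continue
        # Row (y) and column (x) indices of every pixel with this label.
        ys, xs = np.nonzero(sem_mask == class_id)
        x0, x1 = xs.min(), xs.max()
        y0, y1 = ys.min(), ys.max()
        # Width and height are inclusive of both border pixels.
        boxes[int(class_id)] = [int(x0), int(y0),
                                int(x1 - x0 + 1), int(y1 - y0 + 1)]
    return boxes


# Toy 6x8 mask with two labelled regions (class ids 3 and 7).
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:4, 2:5] = 3
mask[4:6, 6:8] = 7
boxes = masks_to_boxes(mask)
print(boxes)
```

If per-instance boxes are needed, the mask for each class would first have to be split into connected components before taking the extents.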