wuyefa opened this issue 3 months ago
Sorry! This is amazing work! I had assumed the visual parameters of the CLIP encoder used here were fine-tuned.
Hello! This is really a good piece of work. Will the related training and fine-tuning code be open-sourced in the future? The data in our usage scenario is rather specialized, and we would like to try fine-tuning. Thank you very much.
As a zero-shot method, our MuSc uses a pre-trained CLIP image encoder as the feature extractor, so no additional training or fine-tuning is required. Our code downloads the pre-trained model automatically, or you can download the pre-trained ViT weights of CLIP and DINO from the official projects below:

CLIP: https://github.com/mlfoundations/open_clip
DINO: https://github.com/facebookresearch/dino
DINOv2: https://github.com/facebookresearch/dinov2
This is really amazing. I initially thought it would require at least fine-tuning on industrial datasets. Thank you for your explanation.