Closed andrehuang closed 1 month ago
Hi Mark,
I'd like to ask what the difference is between the pretrained upsamplers for CLIP and MaskCLIP. Are they trained with different losses?
If they both use the same FeatUp framework described in the paper, why should there be a difference? Can I also use the CLIP upsampler for vision-language tasks?
Thanks in advance.
Best, Haiwen