Open youngtboy opened 2 weeks ago
This is a comprehensive experimental study, but I have a question: if we drop the task loss (the CLIP pretraining loss) and use only a distillation loss (such as FD), how does the performance compare to the reported results?
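To make the question concrete, here is a minimal sketch of the two setups being compared. It assumes FD means a mean squared error between student and teacher features, and that the task loss is the usual symmetric contrastive CLIP loss. The function names, the `use_task_loss` flag, and `distill_weight` are all hypothetical and are not taken from the repository.

```python
import math


def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired features.

    Assumes the features are already L2-normalized, so the dot product
    is the cosine similarity.
    """
    n = len(image_feats)
    # logits[i][j] = similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_feats]
        for img in image_feats
    ]

    def cross_entropy(rows):
        # The matching pair for row i sits on the diagonal (column i).
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]
        return total / n

    text_to_image = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(text_to_image))


def feature_distillation_loss(student_feats, teacher_feats):
    """Assumed form of FD: mean squared error between feature vectors."""
    total, count = 0.0, 0
    for s, t in zip(student_feats, teacher_feats):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    return total / count


def total_loss(task_loss, distill_loss, use_task_loss=True, distill_weight=1.0):
    """Combine the losses; use_task_loss=False is the setup asked about."""
    task_term = task_loss if use_task_loss else 0.0
    return task_term + distill_weight * distill_loss
```

With `use_task_loss=True` the student is trained on both terms. With `use_task_loss=False` only the distillation term remains, which is the configuration I am asking about.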