Nice work! I'd like to do some research based on your code. Could you provide more details about training — for example, the text encoder (tokenizer, max_token_length)?
I've set the parameters as you specified in Tab. 8 of your paper, but I get very poor results. I suspect this is due to the text-encoder setup, which defaults to CLIP's text encoder with a vocabulary of ~48,000 tokens and a max length of 77. That setup does not suit medical images, whose reports are often much longer. It would help the community reproduce your results if you could provide more details about the training process. Thank you very much!
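For context, here is a minimal sketch of the kind of check that suggests the 77-token limit is the problem. It uses whitespace splitting as a rough stand-in for CLIP's BPE tokenizer (the real tokenizer usually produces *more* tokens per word, so this undercounts), and the report text is illustrative:

```python
# Rough check of whether a report fits CLIP's default 77-token context window.
# Whitespace tokens stand in for CLIP's BPE tokens (an undercount in practice).
MAX_TOKEN_LENGTH = 77  # CLIP's default context length

def fits_clip_context(text: str, max_len: int = MAX_TOKEN_LENGTH) -> bool:
    # Reserve 2 slots for the start-of-text / end-of-text special tokens.
    return len(text.split()) + 2 <= max_len

short_caption = "chest x-ray with no acute findings"
long_report = " ".join(["finding"] * 100)  # a long radiology-style report

print(fits_clip_context(short_caption))  # True
print(fits_clip_context(long_report))    # False: truncated at 77 tokens
```

With medical reports routinely exceeding this budget, everything past token 77 is silently truncated, which could explain the poor results.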