Hi, thank you for your great work. I have a question about the text preprocessing for CLIP. The maximum input length for CLIP is 77 tokens, but most of the texts in the dataset are longer than 77 tokens. How do you preprocess these texts before extracting features with CLIP?