openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

RuntimeError: Input is too long for context length 77 #212

Open ancordovag opened 2 years ago

ancordovag commented 2 years ago

This happens when trying to tokenize ( clip.tokenize(train_sentences).to(device) ) sentences that have fewer than 77 words (for example 44) but contain words the tokenizer does not know; the BPE encoding splits such words into several tokens, so the sequence ends up longer than 77 tokens.

I have tried changing the tokenize function's context_length argument from its default (for example context_length=100), but then the encode function ( clip_model.encode_text ) complains.

I have tried replacing all the atypical expressions and special characters, but it is still an issue, even with a simple 44-word sentence like: "But the Commission cannot accept Amendments Nos 1 4 9 11 13 15 19 23 39 40 41 49 51 59 61 64 66 70 71 72 74 77 83 87 90 91 92 95 97 98 81 rev 100 101 103 and 107".

I'm getting the data from the European Parliament, which admittedly was not created to describe images, but still. I wanted to know if there is a possible solution, even something like skipping the sentences that cannot be tokenized.

Thanks.
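A minimal sketch of the skipping approach asked about above, assuming only that the clip package is installed: clip.tokenize raises a RuntimeError for over-long inputs, so catching it filters them out (the sample sentences below are stand-ins for the Europarl data):

import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-ins for the Europarl sentences described above.
train_sentences = [
    "A short sentence that tokenizes fine.",
    "But the Commission cannot accept Amendments Nos 1 4 9 11 13 15 19 23 "
    "39 40 41 49 51 59 61 64 66 70 71 72 74 77 83 87 90 91 92 95 97 98 "
    "81 rev 100 101 103 and 107",
]

kept = []
for sentence in train_sentences:
    try:
        # tokenize raises RuntimeError when the BPE encoding of a
        # sentence exceeds the 77-token context length.
        clip.tokenize(sentence)
        kept.append(sentence)
    except RuntimeError:
        pass  # skip sentences that cannot fit

tokens = clip.tokenize(kept).to(device)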

dazcona commented 2 years ago

I am experiencing the same issue. There is no way to pass a keyword argument such as truncate to clip.tokenize here (https://github.com/openai/CLIP/blob/40f5484c1c74edd83cb9cf687c6ab92b28d8b656/clip/clip.py#L195) so that it does not raise an exception.

hannahlyon commented 1 year ago

Not sure if this was updated, but you can pass truncate as a kwarg:

self.x = clip.tokenize(text, context_length=77, truncate=True)
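For context, a minimal end-to-end sketch of that kwarg; the ViT-B/32 checkpoint here is just an example:

import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# A text whose BPE encoding is longer than 77 tokens.
text = "a photo of a dog " * 30

# truncate=True keeps the first 77 tokens and overwrites the last one
# with the end-of-text token instead of raising a RuntimeError.
tokens = clip.tokenize(text, context_length=77, truncate=True).to(device)

with torch.no_grad():
    features = model.encode_text(tokens)
print(features.shape)  # torch.Size([1, 512]) for ViT-B/32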

thejiangcj commented 1 year ago

So why is the signature designed as " def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> torch.LongTensor: "? To what should I change context_length?
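For what it's worth, context_length cannot simply be raised for the released checkpoints: they were trained with a positional embedding of length 77, so encode_text rejects longer sequences, which is why the context_length=100 attempt above failed. A short illustration, assuming the standard ViT-B/32 checkpoint:

import clip
import torch

model, preprocess = clip.load("ViT-B/32", device="cpu")

# The usable context length is baked into the checkpoint.
print(model.context_length)              # 77
print(model.positional_embedding.shape)  # torch.Size([77, 512])

# Tokenizing with a larger context_length succeeds...
tokens = clip.tokenize("hello world", context_length=100)
print(tokens.shape)                      # torch.Size([1, 100])

# ...but encode_text fails: a length-100 sequence cannot be added
# to the length-77 positional embedding.
try:
    model.encode_text(tokens)
except RuntimeError as err:
    print("encode_text failed:", err)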