openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

Context length understanding #185

Open Tortoise17 opened 2 years ago

Tortoise17 commented 2 years ago

I have an important question. The context length is 77. Does that mean the query search limit is 77 characters, 77 words, or something else I have misunderstood? Please clarify.

Since the model was trained on captions, I want to know its practical limits: how the text is taken into consideration, and what maximum length is ideal when computing embeddings for search.

aztecman commented 2 years ago

I just watched an explainer video from the YouTuber The AI Epiphany.

Near the beginning they do a 'hello world!' example. Experimenting, I found that each word counts as one token, as does each punctuation mark. The start and end of the string also count as one token each.

I recommend opening the Colab and trying different strings to get a sense of how the tokens are counted.
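The counting rule described above can be sketched without installing CLIP. This is only a rough approximation: the real tokenizer (`clip.tokenize`, backed by a BPE vocabulary in `clip/simple_tokenizer.py`) can split rare or long words into several sub-word tokens, but for short everyday English words one word generally maps to one token, plus the start-of-text and end-of-text markers.

```python
import re

# Simplified stand-in for CLIP's BPE pre-tokenization (assumption: the
# real tokenizer may split uncommon words into multiple sub-word tokens;
# this sketch treats each word, digit, or punctuation run as one token).
PAT = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d|[a-zA-Z]+|[0-9]|[^\sa-zA-Z0-9]+")

def count_tokens(text: str) -> int:
    """Approximate CLIP token count for a short English string."""
    # +2 accounts for the <|startoftext|> and <|endoftext|> markers.
    return len(PAT.findall(text.lower())) + 2

# 'hello' + 'world' + '!' = 3 tokens, plus the two markers = 5.
print(count_tokens("hello world!"))  # -> 5
```

So the 77-token context length budgets for roughly 75 words' worth of simple text, less if the caption contains punctuation or words the BPE vocabulary has to break apart.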