Open Tortoise17 opened 2 years ago
Just watched an explainer video from the YouTuber The AI Epiphany.
Near the beginning they do a 'hello world!' example. Experimenting, I found that each word counts for 1 token, as does the punctuation. The start and end of the string also count for 1 token each.
I recommend opening the colab and trying out different strings to get a sense of how the tokens are counted.
I have to ask an important question. The context length is 77: does that mean the query is limited to 77 characters, 77 words, or something else I have misunderstood? Please clarify if you can.
Since the model was trained on captions, I want to know its optimal use boundaries: how the text is taken into consideration, and what maximum size is ideal for searching the embeddings.