Closed 94bb494nd41f closed 1 year ago
As long as the context size of the model is larger than the number of tokens in a chunk, it should be fine.
Since tokenization depends on the model, RecursiveCharacterTextSplitter
and other preprocessors know nothing about tokens.
AFAIK the default length measure of RecursiveCharacterTextSplitter is `len` (characters), while the instructor embeddings measure length in tokens. The program still works, but the chunks inserted into the database are smaller than one would expect.