Packages and Versions
python 3.11.1
django==4.2.14
wagtail==5.2.6
wagtail-vector-index==0.10.0
openai==1.47.1
litellm==1.40.15
pgvector==0.3.5
Configuration
Issue
When creating an embedding for the attached HTML page, we receive the error "ContextWindowExceededError". The complete stack trace with litellm debug info is below.
Workaround that Worked
Set TOKEN_LIMIT to 5000, lower than the default of 8192 tokens for text-embedding-3-small.
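For reference, a minimal sketch of the kind of settings override involved is below. The WAGTAIL_VECTOR_INDEX / EMBEDDING_BACKENDS / CLASS / CONFIG key layout and the litellm backend class path are assumptions modelled on the wagtail-ai-style backend configuration, not copied from our project; the actual change was only lowering TOKEN_LIMIT.

```python
# settings.py -- illustrative sketch only; the key layout and backend CLASS
# path below are assumed, not taken from our real configuration.
# The actual workaround is just lowering TOKEN_LIMIT below the 8192-token
# default that text-embedding-3-small would otherwise get.
WAGTAIL_VECTOR_INDEX = {
    "EMBEDDING_BACKENDS": {
        "default": {
            # Assumed class path for a litellm-based embedding backend.
            "CLASS": "wagtail_vector_index.ai_utils.backends.litellm.LiteLLMBackend",
            "CONFIG": {
                "MODEL_ID": "text-embedding-3-small",
                "TOKEN_LIMIT": 5000,  # workaround: below the 8192 default
            },
        },
    },
}
```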
Possible Causes and Discovery
chunk_overlap seems to be hardcoded and doesn't appear to be something the user can override in their settings. Further, shouldn't this be calculated as 10-20% of the chunk_size? I suspect the NaiveTextSplitterCalculator has something to do with the error, but I haven't had a chance to investigate further.
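To illustrate what I mean by deriving the overlap from the chunk size, here is a standalone sketch (not the library's NaiveTextSplitterCalculator):

```python
# Standalone illustration -- not wagtail-vector-index code. It shows a
# splitter whose overlap is a fraction of chunk_size (here 15%, i.e. within
# the suggested 10-20% range) rather than a hardcoded constant.
def split_text(text: str, chunk_size: int = 800, overlap_ratio: float = 0.15) -> list[str]:
    """Split text into word-based chunks whose overlap scales with chunk_size."""
    chunk_overlap = int(chunk_size * overlap_ratio)  # 120 words for an 800-word chunk
    step = max(chunk_size - chunk_overlap, 1)
    words = text.split()
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), step)]


# Example: ~10,000 words become 800-word chunks that overlap by 120 words.
chunks = split_text("lorem ipsum " * 5_000, chunk_size=800)
```

The library's splitter presumably counts tokens rather than whitespace-separated words, but the same proportional rule would apply.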
embedding-fail.html.txt