Open mvirag2000 opened 2 months ago
@mvirag2000 What do you think about the linked PR? Re your idea/request: I only introduced min_chunk_size, because the maximum chunk size can be adjusted by tuning breakpoint_threshold_amount to a reasonable value.
URL
https://python.langchain.com/v0.2/docs/how_to/semantic-chunker/
Issue with current documentation:
It seems that the units for breakpoint_threshold_type = "percentile" are out of a hundred, i.e., 85.0, not 0.85. The docs do not say so, and the units are equally unclear for the other threshold types, "gradient" and "interquartile."
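To illustrate why the scale matters, here is a minimal sketch. It assumes the percentile-style threshold is computed with a function that takes its rank on a 0-100 scale (as `numpy.percentile` does); the `percentile` helper and the distance values below are made up for illustration and are not taken from the LangChain source.

```python
def percentile(values, q):
    """Linear-interpolation percentile, with q on a 0-100 scale (illustrative helper)."""
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Made-up distances between consecutive sentences (not real embedding output).
distances = [0.05, 0.10, 0.12, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

high = percentile(distances, 85.0)  # the intended "85th percentile"
low = percentile(distances, 0.85)   # what passing 0.85 means on a 0-100 scale

# A breakpoint is placed wherever the distance exceeds the threshold.
breaks_high = [i for i, d in enumerate(distances) if d > high]
breaks_low = [i for i, d in enumerate(distances) if d > low]

print(len(breaks_high), len(breaks_low))
```

Under that assumption, passing 0.85 puts the threshold just above the smallest distance, so nearly every sentence boundary becomes a breakpoint. That would be consistent with getting very small chunks.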
Idea or request for content:
Also, Semantic Chunker really needs a min and max chunk size. I am getting chunks of a single word, and chunks that exceed the OpenAI limit. Thanks for all the great work on LangChain.
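Until something like this is built in, a workaround I can imagine is post-processing the splitter's output. The sketch below is a hypothetical helper, not part of LangChain: `enforce_chunk_bounds`, `min_chars`, and `max_chars` are names I made up, and it counts characters as a stand-in for tokens, so the real OpenAI limit would need a tokenizer instead.

```python
def enforce_chunk_bounds(chunks, min_chars, max_chars):
    """Merge undersized chunks into their predecessor, then hard-split oversized ones.

    Hypothetical post-processing step; sizes are in characters, not tokens.
    """
    if min_chars > max_chars:
        raise ValueError("min_chars must not exceed max_chars")

    # Pass 1: glue a chunk onto the previous one when either side is too short.
    merged = []
    for chunk in chunks:
        if merged and (len(merged[-1]) < min_chars or len(chunk) < min_chars):
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)

    # Pass 2: cut anything longer than max_chars into fixed-size pieces.
    bounded = []
    for chunk in merged:
        for start in range(0, len(chunk), max_chars):
            bounded.append(chunk[start:start + max_chars])
    return bounded


chunks = ["Hi", "This is a reasonably long sentence.", "ok", "x" * 25]
result = enforce_chunk_bounds(chunks, min_chars=5, max_chars=20)
print([len(c) for c in result])
```

The hard split in pass 2 ignores word and sentence boundaries, and the last piece of a split chunk can still end up shorter than `min_chars`, so this only guarantees the upper bound. Splitting on sentence boundaries would be nicer but needs access to the splitter's internals.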