Closed lhr1998 closed 5 months ago
🤖
Based on the information you've provided, it seems like the issue might be related to how the segment overlap length, or chunk_overlap value, is being passed to the create_segment_to_index_task method.
In the create_segment_to_index_task method, the segment is created and indexed based on the content and metadata of the document. However, the chunk_overlap value doesn't appear to be used in this method. This could be why the segment overlap length specified during the creation of the knowledge base is not taking effect.
The chunk_overlap value is used in the FixedRecursiveCharacterTextSplitter or EnhanceRecursiveCharacterTextSplitter instance during the text extraction and transformation process in the indexing runner. This is where the document is split into smaller segments based on the specified chunk_size and chunk_overlap.
character_splitter = FixedRecursiveCharacterTextSplitter.from_encoder(
    chunk_size=segmentation["max_tokens"],
    chunk_overlap=segmentation.get('chunk_overlap', 0),
    fixed_separator=separator,
    separators=["\n\n", "。", ".", " ", ""],
    embedding_model_instance=embedding_model_instance
)
In this code, chunk_size is the maximum number of tokens in each segment, and chunk_overlap is the number of tokens that consecutive segments should overlap. This overlap can help ensure that no important information is lost at the boundaries between segments.
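To make the effect of chunk_overlap concrete, here is a minimal sketch of a sliding-window splitter. It is not the repository's FixedRecursiveCharacterTextSplitter: the function name split_with_overlap is hypothetical, and whitespace-separated words stand in for model tokens. It only illustrates what a non-zero overlap should look like in the resulting segments.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Each window repeats the last chunk_overlap tokens of the previous one.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Whitespace-separated words stand in for tokens (an assumption of this sketch).
tokens = "a b c d e f g h i j".split()
no_overlap = split_with_overlap(tokens, chunk_size=4)
with_overlap = split_with_overlap(tokens, chunk_size=4, chunk_overlap=2)

for chunk in with_overlap:
    print(" ".join(chunk))
```

With chunk_overlap=2, every segment after the first starts with the last two tokens of the segment before it. If the segments in the preview show no such repeated text, the overlap value is probably not reaching the splitter.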
Unfortunately, I couldn't find how the chunk_overlap value is being passed to the create_segment_to_index_task method in the repository. It's possible that this information is available elsewhere or I might have missed it.
As for the screenshot you've shared, I'm sorry but I'm unable to view images. Could you please provide more details about what the screenshot shows? This could help in understanding the issue better.
If you could provide more information about how you're setting the chunk_overlap value and how it's being used in your code, it would be helpful in diagnosing the issue.
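One way to gather that information is to measure how much text consecutive segments actually share. The sketch below assumes the segment texts have been copied out of the preview into a list. The helper names shared_overlap and report_overlaps are hypothetical, and the comparison is an exact character match, so it may under-report if the splitter trims whitespace at segment boundaries.

```python
def shared_overlap(prev_segment: str, next_segment: str) -> str:
    """Return the longest suffix of prev_segment that is also a prefix of next_segment."""
    max_len = min(len(prev_segment), len(next_segment))
    for length in range(max_len, 0, -1):
        candidate = next_segment[:length]
        if prev_segment.endswith(candidate):
            return candidate
    return ""


def report_overlaps(segments):
    """Return the shared character count for each pair of neighbouring segments."""
    return [len(shared_overlap(a, b)) for a, b in zip(segments, segments[1:])]


# Made-up example segments, not taken from the issue.
segments_without = ["The cat sat on", " the mat and", " fell asleep."]
segments_with = ["The cat sat on the", "on the mat and fell", "and fell asleep."]

print(report_overlaps(segments_without))
print(report_overlaps(segments_with))
```

A list of zeros for real segments would support the report that the configured overlap is not being applied.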
This issue has been fixed in the latest code. Thanks for your feedback, @lhr1998.
Provide a description of the help you need
I set a segment overlap length when creating the knowledge base, but it doesn't seem to have much effect. Neither the segment preview nor the paragraphs retrieved during use contain any overlapping context from the neighbouring segments.