Open oasis-zhou opened 1 month ago
It isn't clear what the error is that you are trying to report. Can you show a code sample and a stack trace please?
The problem is the call to currentBatch.clear(). Clearing the list that 'currentBatch' refers to does not create a new batch. It empties the very list that was just added to 'batches', so every entry in 'batches' is the same list object and ends up holding only the contents of the last batch.
The cause is a bug in the batching logic of TokenCountBatchingStrategy in version 1.0.0-M2:
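To make the aliasing concrete, here is a minimal sketch that is independent of Spring AI. The class and method names (BatchAliasingDemo, withClear, withNewList) are made up for illustration and plain strings stand in for documents. It contrasts clearing the shared list with assigning a fresh one:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchAliasingDemo {

    // Mirrors the reported pattern: add the list to 'batches', then clear that same instance.
    static List<List<String>> withClear() {
        List<List<String>> batches = new ArrayList<>();
        List<String> currentBatch = new ArrayList<>();
        currentBatch.add("a");
        currentBatch.add("b");
        batches.add(currentBatch);
        currentBatch.clear();          // also empties the list already stored in 'batches'
        currentBatch.add("c");
        batches.add(currentBatch);     // the same object is stored a second time
        return batches;
    }

    // Assigns a fresh list instead, so the stored batch is left untouched.
    static List<List<String>> withNewList() {
        List<List<String>> batches = new ArrayList<>();
        List<String> currentBatch = new ArrayList<>();
        currentBatch.add("a");
        currentBatch.add("b");
        batches.add(currentBatch);
        currentBatch = new ArrayList<>();
        currentBatch.add("c");
        batches.add(currentBatch);
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(withClear());   // prints [[c], [c]]
        System.out.println(withNewList()); // prints [[a, b], [c]]
    }
}
```

In the first variant both entries of 'batches' point at one list, which matches the symptom described above.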
public List<List<Document>> batch(List<Document> documents) {
    List<List<Document>> batches = new ArrayList();
    int currentSize = 0;
    List<Document> currentBatch = new ArrayList();

    int tokenCount;
    for (Iterator var5 = documents.iterator(); var5.hasNext(); currentSize += tokenCount) {
        Document document = (Document) var5.next();
        tokenCount = this.tokenCountEstimator.estimate(document.getFormattedContent(this.contentFormater, this.metadataMode));
        if (tokenCount > this.maxInputTokenCount) {
            throw new IllegalArgumentException("Tokens in a single document exceeds the maximum number of allowed input tokens");
        }

        if (currentSize + tokenCount > this.maxInputTokenCount) {
            batches.add(currentBatch);
            currentBatch.clear();
            currentSize = 0;
        }

        currentBatch.add(document);
    }

    if (!currentBatch.isEmpty()) {
        batches.add(currentBatch);
    }

    return batches;
}
The following statement is incorrect. Please refer to how it is written in version 1.0.0-SNAPSHOT:

    if (currentSize + tokenCount > this.maxInputTokenCount) {
        batches.add(currentBatch);
        currentBatch.clear();
        currentSize = 0;
    }
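For reference, here is a hedged sketch of how the loop could look with the fix applied. This is not the actual 1.0.0-SNAPSHOT source. The class name BatchingSketch, the generic item type, and the ToIntFunction used as a token estimator are assumptions made so that the example runs without Spring AI. The only essential change is replacing currentBatch.clear() with a new list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class BatchingSketch {

    // Groups items into batches whose estimated token total stays within maxTokens.
    static <T> List<List<T>> batch(List<T> items, ToIntFunction<? super T> estimator, int maxTokens) {
        List<List<T>> batches = new ArrayList<>();
        List<T> currentBatch = new ArrayList<>();
        int currentSize = 0;

        for (T item : items) {
            int tokenCount = estimator.applyAsInt(item);
            if (tokenCount > maxTokens) {
                throw new IllegalArgumentException(
                        "Tokens in a single document exceeds the maximum number of allowed input tokens");
            }
            if (currentSize + tokenCount > maxTokens) {
                batches.add(currentBatch);
                currentBatch = new ArrayList<>(); // fresh list; the stored batch keeps its contents
                currentSize = 0;
            }
            currentBatch.add(item);
            currentSize += tokenCount;
        }

        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    public static void main(String[] args) {
        // String length stands in for a token count in this toy example.
        List<String> docs = List.of("aaa", "bb", "cccc", "d");
        System.out.println(batch(docs, String::length, 5)); // prints [[aaa, bb], [cccc, d]]
    }
}
```

An alternative would be to store a copy, batches.add(new ArrayList<>(currentBatch)), and keep the clear() call. Either approach avoids sharing one list instance between batches.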