Description
Based on discussion, it's strongly recommended to move the sub-batching logic from the `_bulk` API into each processor. So for the two batch-supporting processors, `text_embedding` and `sparse_encoding`, we make them inherit from a newly introduced `AbstractBatchingProcessor`, so that these two processors support a new optional parameter, `batch_size`, which controls how documents are cut into sub-batches. The default value of this parameter is 1, to be consistent with existing behavior. Also adds more integration tests.
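The sub-batch cutting described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the actual OpenSearch implementation; the method and class names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of sub-batch cutting controlled by a batch_size
// parameter. Names are hypothetical, not the real processor code.
public final class SubBatchExample {

    // Split the incoming documents into sub-batches of at most batchSize items.
    static <T> List<List<T>> cutBatches(List<T> docs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batch_size must be a positive integer");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d1", "d2", "d3", "d4", "d5");
        // With the default batch_size of 1, every document is its own
        // sub-batch, matching the existing one-at-a-time behavior.
        System.out.println(cutBatches(docs, 1).size()); // 5
        // With batch_size = 2, documents are grouped into sub-batches of up to 2.
        System.out.println(cutBatches(docs, 2)); // [[d1, d2], [d3, d4], [d5]]
    }
}
```

With `batch_size` left at its default of 1, the loop degenerates to one document per sub-batch, which is why the default preserves the pre-existing behavior.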
Issues Resolved
https://github.com/opensearch-project/OpenSearch/issues/14283
Check List
[x] Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check here.