opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0

[FEATURE] Reimplement BulkProcessor #181

Open markmccallion28 opened 2 years ago

markmccallion28 commented 2 years ago

Are there any plans to add an equivalent of the BulkProcessor API, which was available in the High Level Rest Client, so that Index and Delete requests can be batched?

dblock commented 2 years ago

Would love it if someone (you?) could contribute this! I suppose that code can be just ported here? I don't see any obvious cons.

ginkel commented 1 year ago

FYI: We are currently looking into porting the BulkProcessor from the OpenSearch code base and noticed one missing feature in the Java client API:

While the RHLC allows estimating the size of a bulk action (now called BulkOperation), this feature seems to be missing from this client, as the JSON is no longer rendered when the payload is added to the bulk request, but lazily when the request is eventually dispatched.

Can you think of an efficient way to perform such a size estimation or would you rather drop the bulkSize configuration option from the BulkProcessor for now?
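One possible approach, sketched here only under assumptions: serialize each operation once into a sink that counts bytes and discards them. `CountingOutputStream`, `JsonWriter`, and `BulkSizeEstimator` are hypothetical names for illustration, not part of opensearch-java; `JsonWriter` stands in for whatever the client uses to render a single operation as JSON.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Discards everything written to it and only remembers how many bytes it saw. */
final class CountingOutputStream extends OutputStream {
    private long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        count += len;
    }

    long count() {
        return count;
    }
}

/** Hypothetical stand-in for the code that renders one bulk operation as JSON. */
interface JsonWriter {
    void writeTo(OutputStream out) throws IOException;
}

final class BulkSizeEstimator {
    private BulkSizeEstimator() {}

    /** Returns the number of bytes the operation would occupy on the wire. */
    static long estimate(JsonWriter operation) {
        CountingOutputStream sink = new CountingOutputStream();
        try {
            operation.writeTo(sink);
        } catch (IOException e) {
            // The sink itself never fails, so this can only come from the writer.
            throw new UncheckedIOException(e);
        }
        return sink.count();
    }
}
```

The trade-off is that every operation gets rendered twice (once for the estimate, once when the request is dispatched) unless the rendered bytes are kept and reused.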

reta commented 1 year ago

> Can you think of an efficient way to perform such a size estimation or would you rather drop the bulkSize configuration option from the BulkProcessor for now?

I think that would significantly reduce the usefulness of the BulkProcessor: for example, Apache Flink uses bulkSize as one of its flush triggers. I am pretty sure that many other projects rely on it as well.
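To illustrate why the size matters as a trigger, here is a minimal sketch of a batcher that flushes on either an action count or a byte threshold. `SizeAwareBatcher` and its parameters are hypothetical and are not the BulkProcessor API; the sketch assumes each operation is already available as serialized bytes and leaves out time-based flushing and retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batcher: flushes when either the action count or the byte size limit is reached. */
final class SizeAwareBatcher {
    private final int maxActions;
    private final long maxBytes;
    private final Consumer<List<byte[]>> flushHandler;

    private List<byte[]> pending = new ArrayList<>();
    private long pendingBytes;

    SizeAwareBatcher(int maxActions, long maxBytes, Consumer<List<byte[]>> flushHandler) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flushHandler = flushHandler;
    }

    /** Queues one serialized operation and flushes if a threshold is reached. */
    synchronized void add(byte[] serializedOperation) {
        pending.add(serializedOperation);
        pendingBytes += serializedOperation.length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Hands the pending operations to the handler; does nothing if the batch is empty. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<byte[]> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        flushHandler.accept(batch);
    }
}
```

Without a size estimate, only the count-based branch of the condition in `add` remains, so a handful of very large documents could produce a request far bigger than intended.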

ginkel commented 1 year ago

Elastic have just merged BulkIngester into their elasticsearch-java client, which covers the old BulkProcessor's features except for the retry handling. It is licensed under the Apache License, Version 2.0.

We did a quick (preliminary) port to the opensearch-java client, which went pretty smoothly (tests and performance tests are still pending).

Would such a port be something that you'd consider worthwhile and acceptable as a contribution?

reta commented 1 year ago

@ginkel this is very tricky (taking into account numerous precedents with Elastic as a company). Yes, the license seems to be ASFv2, and we may ask the contributor whether they are open to submitting the BulkIngester pull request to OpenSearch as well, but I would be very cautious about cherry-picking anything from the Elastic organization. @dblock @nknize thoughts, guys?

dblock commented 1 year ago

We will gladly accept code under the APLv2 license. The contributor needs to make sure that they have not looked at or copied any non-APLv2 code while reimplementing a feature in OpenSearch. If the client is indeed APLv2, it's all good.

ginkel commented 1 year ago

All ported code carries the following license header:

/*
 * Licensed to Elasticsearch B.V. under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch B.V. licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

dblock commented 1 year ago

@ginkel Yes. Keep those headers please and add the OpenSearch ones.

tballison commented 1 year ago

Any updates on this? This is a blocker on https://issues.apache.org/jira/browse/NUTCH-2994. Let me know if I can help.

wbeckler commented 1 year ago

Feel free to continue with the above strategy or any other solution that works.

On Thu, Jun 8, 2023 at 2:37 PM Tim Allison wrote:

> Any updates on this? This is a blocker on https://issues.apache.org/jira/browse/NUTCH-2994. Let me know if I can help.
