opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.8k stars 1.82k forks source link

[BUG] Bulk Indexing POST API returns 200 and response.IsValid = true, but 1 document was not indexed. #16397

Open kknd4eva opened 3 weeks ago

kknd4eva commented 3 weeks ago

Describe the bug

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

AWS OpenSearch Serverless .NET8 OpenSearch.Client 1.8.0 OpenSearch.NET.Auth.AwsSigV4 1.8.0

Describe the issue: We have an OpenSearch serverless collection which has 2 indexes: salesorders backorders

when we index, we receive a sales order document with 1 or more lines. The whole document goes to 'salesorders' and the individual lines are turned into documents and indexed to backorders. This works fine for 99.9999% of the millions of documents we have indexed. However we have 1 instance, where the sales order line got indexed, but the sales order document did not.

We combine our document indexing into a single bulk API call (POST). We log any exception even with an individually indexed document as part of the bulk call, by using Response.IsValid:

        if (response.IsValid is false)
        {
            throw response?.OriginalException ??
                throw new OpenSearchClientException($"OpenSearch - {nameof(_client.BulkAsync)} failed to process request");
        }

This is specified as the right property to use to catch anything failed from the bulk API request:

2024-10-21 15_06_10-Window

In addition we also log errors here, and we have seen this work with genuine errors:

Untitled

For this particular request, everything returned 200, no errors at all:

2024-10-19T01:13:25.974+11:00 START RequestId: d25c49fa-444d-567e-9f39-16762b7487ce Version: $LATEST
2024-10-19T01:13:25.974+11:00 {"order_reference":"1061207807","timestamp":"2024-10-18T14:13:25.9743234Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Received 1 messages"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744389Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Processing event for message id: 3ba4b295-e319-44bc-b80a-013ae7618c74"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744577Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Received sales order reference 5003047545 with (1) line items"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744726Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Sending (1) sales order/s to be indexed with a total number of line items (1)"}
2024-10-19T01:13:26.011+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.0116322Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"HEAD https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/salesorders 200"}
2024-10-19T01:13:26.034+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.0343743Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"HEAD https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/backorders 200"}
2024-10-19T01:13:26.203+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.2037842Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"POST https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/_bulk 200"}
2024-10-19T01:13:26.203+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.2038542Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Bulk api response: [2] items processed successfully. api took 150 ms."}
2024-10-19T01:13:26.204+11:00 END RequestId: d25c49fa-444d-567e-9f39-16762b7487ce

Reports 200 responses across the board and both items indexed, but when we search for the specific sales order even after hours, it does not exist:

GET /salesorders/_doc/5003047545

{
  "_index": "salesorders",
  "_id": "5003047545",
  "found": false
}

But checking for the backorder index, its document from this bulk API call exists fine.

These are our connection settings:

return new ConnectionSettings(new Uri(openSearchDomain), connection)
                       .DisableDirectStreaming() // This will enable logging of request and response bodies
                       .OnRequestCompleted(apiCallDetails =>
                       {
                           if (apiCallDetails != null && apiCallDetails.Success)
                           {
                               logger.LogInformation(apiCallDetails.GetApiInfo());
                           }
                       });

Related component

Indexing

To Reproduce

Send a bulk API request to index multiple documents against up to 2 indexes:

private async Task SendBulkAsync(BulkRequest bulkRequest)
    {
        var response = await _retryPolicy.ExecuteAsync
            (
                async () => await _client.BulkAsync(bulkRequest, CancellationToken.None)
            );

        response.LogApiResponseMessage(_logger);

        if (response.IsValid is false)
        {
            throw response?.OriginalException ??
                throw new OpenSearchClientException($"OpenSearch - {nameof(_client.BulkAsync)} failed to process request");
        }
    }

Expected behavior

Response with 200 and IsValid = true from bulk API response. All documented indexed.

Additional Details

Plugins Please list all plugins currently enabled.

Screenshots If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

Additional context Add any other context about the problem here.

dblock commented 4 days ago

[Catch All Triage - 1, 2, 3, 4]