opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] repository-s3 get or cat fails with snapshot_missing_exception #6457

Status: Open · Jakob3xD opened 1 year ago

Jakob3xD commented 1 year ago

Describe the bug

I have a single S3 snapshot repository configured on two clusters. When I delete a snapshot on the first cluster, the second cluster is sometimes no longer able to get any snapshot from its repository. Note that I am not requesting a specific snapshot but the list of all existing snapshots:

{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_missing_exception",
        "reason": "[reindex:<snapshot-name/id>] is missing"
      }
    ],
    "type": "snapshot_missing_exception",
    "reason": "[reindex:<snapshot-name/id>] is missing",
    "caused_by": {
      "type": "no_such_file_exception",
      "reason": "Blob object [reindex/snap-<snapshot-id>.dat] not found: null (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: xyz; S3 Extended Request ID: xyz; Proxy: null)"
    }
  },
  "status": 404
}

As expected, the snapshot ID is not present in S3.
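The error body above carries three useful facts: the repository name, the snapshot that could not be loaded, and the S3 key that returned 404. A minimal sketch of pulling those out when monitoring for this failure; the helper `summarize_missing_snapshot` is hypothetical, and the embedded body is the one from this report, abridged (request IDs dropped) with its placeholders left as posted.

```python
import json
import re

# Abridged copy of the error body from this report (placeholders kept as posted).
ERROR_BODY = """
{
  "error": {
    "root_cause": [{"type": "snapshot_missing_exception",
                    "reason": "[reindex:<snapshot-name/id>] is missing"}],
    "type": "snapshot_missing_exception",
    "reason": "[reindex:<snapshot-name/id>] is missing",
    "caused_by": {
      "type": "no_such_file_exception",
      "reason": "Blob object [reindex/snap-<snapshot-id>.dat] not found: null (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey)"
    }
  },
  "status": 404
}
"""


def summarize_missing_snapshot(body: str) -> dict:
    """Hypothetical helper: extract repository, snapshot and missing blob key."""
    err = json.loads(body)["error"]
    if err.get("type") != "snapshot_missing_exception":
        raise ValueError("not a snapshot_missing_exception")
    # The top-level reason has the form "[<repository>:<snapshot>] is missing".
    m = re.match(r"\[([^:\]]+):(.+)\] is missing", err["reason"])
    cause = err.get("caused_by", {})
    # The S3 cause names the key that returned 404 as "Blob object [<key>]".
    blob = re.search(r"Blob object \[([^\]]+)\]", cause.get("reason", ""))
    return {
        "repository": m.group(1) if m else None,
        "snapshot": m.group(2) if m else None,
        "cause": cause.get("type"),
        "missing_blob": blob.group(1) if blob else None,
    }
```

For this report the missing key is a `snap-*.dat` blob, which fits the observation that the snapshot had already been deleted from S3.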

To Reproduce

Steps to reproduce the behavior. The issue occurs intermittently and is not consistently reproducible.

  1. Set up two clusters (Cluster-1, Cluster-2).
  2. Configure an S3 repository on Cluster-1 named cluster-2.
  3. Configure an S3 repository on Cluster-2 named reindex, pointing at the same S3 location.
  4. Create a snapshot on Cluster-2 in the repository reindex and wait for it to complete.
  5. Delete the created snapshot via Cluster-1 (repository cluster-2).
  6. Get or cat the snapshots in the Cluster-2 repository reindex:
     6.1. GET _cat/snapshots/reindex?v&s=id
     6.2. GET _snapshot/reindex/*
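The steps above can be scripted against the snapshot REST API. This is a minimal sketch using only the Python standard library. The cluster URLs, the bucket name `my-bucket`, the snapshot name, and the helpers `repro_plan` and `send` are all hypothetical. It assumes the repository-s3 plugin already has its S3 credentials configured on both clusters, that neither cluster requires authentication or TLS, and that both registrations use the same bucket with the default base path.

```python
import json
import urllib.request


def repro_plan(c1: str, c2: str, bucket: str, snapshot: str = "repro-1"):
    """Return the reproduction steps as (base_url, method, path, body) tuples."""
    # Assumption: both clusters register the same bucket (default base path).
    s3 = {"type": "s3", "settings": {"bucket": bucket}}
    return [
        (c1, "PUT", "/_snapshot/cluster-2", s3),                     # step 2
        (c2, "PUT", "/_snapshot/reindex", s3),                       # step 3
        (c2, "PUT", f"/_snapshot/reindex/{snapshot}?wait_for_completion=true", None),  # step 4
        (c1, "DELETE", f"/_snapshot/cluster-2/{snapshot}", None),    # step 5
        (c2, "GET", "/_cat/snapshots/reindex?v&s=id", None),         # step 6.1
        (c2, "GET", "/_snapshot/reindex/*", None),                   # step 6.2
    ]


def send(base_url: str, method: str, path: str, body):
    """Send one call; urlopen raises urllib.error.HTTPError on a 404 response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode()
```

Running `for call in repro_plan("http://cluster-1:9200", "http://cluster-2:9200", "my-bucket"): send(*call)` walks through the steps in order; when the bug triggers, step 6 surfaces as an `HTTPError` with status 404 carrying the `snapshot_missing_exception` body.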

Expected behavior

I would expect to see all existing snapshots without an exception being raised.

Plugins

opensearch-alerting                  2.5.0.0
opensearch-anomaly-detection         2.5.0.0
opensearch-asynchronous-search       2.5.0.0
opensearch-cross-cluster-replication 2.5.0.0
opensearch-geospatial                2.5.0.0
opensearch-index-management          2.5.0.0
opensearch-job-scheduler             2.5.0.0
opensearch-knn                       2.5.0.0
opensearch-ml                        2.5.0.0
opensearch-neural-search             2.5.0.0
opensearch-notifications             2.5.0.0
opensearch-notifications-core        2.5.0.0
opensearch-observability             2.5.0.0
opensearch-reports-scheduler         2.5.0.0
opensearch-security                  2.5.0.0
opensearch-security-analytics        2.5.0.0
opensearch-sql                       2.5.0.0
repository-s3                        2.5.0


Additional context

The index- file in S3 correctly lists all existing snapshots. Is this file somehow cached by the plugin inside OpenSearch?
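The caching question can be illustrated with a toy model. This is not OpenSearch's implementation: `ToyRepository` and `CachingReader` are hypothetical classes, and the generation-numbered index file is an assumption of the toy. The only details taken from the report are that the repository holds an index file listing the snapshots plus one `snap-<id>.dat` blob per snapshot. A reader that keeps using a snapshot list it read earlier will try to load a blob that another cluster has since deleted, which is the same shape as the 404 above; a reader that re-checks the index generation does not.

```python
class ToyRepository:
    """Hypothetical stand-in for the bucket: index-<gen> lists plus snap-<id>.dat blobs."""

    def __init__(self):
        self.generation = 0
        self.blobs = {"index-0": []}

    def add_snapshot(self, snap_id):
        snaps = self.blobs[f"index-{self.generation}"] + [snap_id]
        self.generation += 1
        self.blobs[f"index-{self.generation}"] = snaps
        self.blobs[f"snap-{snap_id}.dat"] = {"id": snap_id}

    def delete_snapshot(self, snap_id):
        # Writes a new index generation and removes the snapshot's blob.
        snaps = [s for s in self.blobs[f"index-{self.generation}"] if s != snap_id]
        self.generation += 1
        self.blobs[f"index-{self.generation}"] = snaps
        del self.blobs[f"snap-{snap_id}.dat"]


class CachingReader:
    """Hypothetical client that caches the snapshot list it last read."""

    def __init__(self, repo, revalidate):
        self.repo = repo
        self.revalidate = revalidate  # re-read the list when the generation changes
        self.cached_gen = None
        self.cached_list = None

    def list_snapshots(self):
        stale = self.revalidate and self.cached_gen != self.repo.generation
        if self.cached_list is None or stale:
            self.cached_gen = self.repo.generation
            self.cached_list = list(self.repo.blobs[f"index-{self.cached_gen}"])
        # Loading a snap-<id>.dat deleted by another writer raises KeyError,
        # the toy's analogue of S3 answering NoSuchKey.
        return [self.repo.blobs[f"snap-{s}.dat"] for s in self.cached_list]
```

In the toy, deleting a snapshot through the repository object stands in for deleting it via the other cluster.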

Jakob3xD commented 1 year ago

In addition: after deleting the repository and adding it back, everything works fine again, and deleting a snapshot via the other cluster no longer breaks it. This leads me to another theory: could this be related to upgrading the clusters from a previous version to 2.5.0?
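The workaround described above, removing the repository and adding it back, amounts to two REST calls. Unregistering a repository removes only the cluster's reference to it; the snapshots stored in S3 are left in place. This is a minimal sketch, assuming the repository is named reindex and that `my-bucket` is a placeholder for the real bucket. Any other settings used in the original registration would need to be passed unchanged. The helpers `reregister_calls` and `as_console` are hypothetical.

```python
import json


def reregister_calls(repo: str, settings: dict):
    """Build the unregister/re-register pair as (method, path, body) tuples."""
    body = {"type": "s3", "settings": settings}
    return [
        ("DELETE", f"/_snapshot/{repo}", None),  # drops the registration, not the S3 data
        ("PUT", f"/_snapshot/{repo}", body),     # re-registers with the same settings
    ]


def as_console(calls) -> str:
    """Render the calls in the same request style used in the reproduction steps."""
    lines = []
    for method, path, body in calls:
        lines.append(f"{method} {path.lstrip('/')}")
        if body is not None:
            lines.append(json.dumps(body))
    return "\n".join(lines)


print(as_console(reregister_calls("reindex", {"bucket": "my-bucket"})))
```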