opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Snapshot Interop] Shallow copy snapshots failing for closed indices #13805

Open harishbhakuni opened 1 month ago

harishbhakuni commented 1 month ago

Describe the bug

We recently found an issue where shallow copy snapshots fail for closed indices, whereas full copy snapshots succeed for the same indices.

Snapshot shard failed
java.nio.file.NoSuchFileException: Metadata file is not present for given primary term 2 and generation 6
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.getMetadataFileForCommit(RemoteSegmentStoreDirectory.java:527)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.acquireLock(RemoteSegmentStoreDirectory.java:480)
    at org.opensearch.index.shard.IndexShard.acquireLockOnCommitData(IndexShard.java:1655)
    at org.opensearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:631)
    at org.opensearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:393)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractPrioritizedRunnable.doRun(ThreadContext.java:979)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

For shallow copy snapshots, we refer to the latest remote store data and acquire a lock on that data. Since the indices are closed, the upload that the snapshot flush should trigger never happens, so no new data is written to the remote store. The lock lookup then finds no metadata file for the commit's primary term and generation, and the snapshot fails.
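To illustrate the failure mode described above, here is a toy model in Python. It is not OpenSearch code: `ToyShard`, `flush_for_snapshot`, `acquire_lock`, and `MetadataNotFound` are hypothetical names, and the rule that only an open shard uploads metadata during the snapshot flush is an assumption taken from the explanation above.

```python
class MetadataNotFound(Exception):
    """Stand-in for the NoSuchFileException seen in the stack trace."""


class ToyShard:
    """Hypothetical model of a shard backed by a remote store."""

    def __init__(self, primary_term, generation, closed=False):
        self.primary_term = primary_term
        self.generation = generation
        self.closed = closed
        # (primary_term, generation) pairs that have a metadata file remotely
        self.remote_metadata = set()

    def flush_for_snapshot(self):
        # Assumption: the snapshot flush uploads metadata only for open shards
        if not self.closed:
            self.remote_metadata.add((self.primary_term, self.generation))

    def acquire_lock(self):
        # The lock is taken on the metadata file matching the current commit
        key = (self.primary_term, self.generation)
        if key not in self.remote_metadata:
            raise MetadataNotFound(
                "Metadata file is not present for given primary term "
                f"{key[0]} and generation {key[1]}"
            )
        return key


def shallow_snapshot(shard):
    """Flush, then lock the remote metadata for the shard's commit."""
    shard.flush_for_snapshot()
    return shard.acquire_lock()
```

In this model, `shallow_snapshot(ToyShard(2, 6))` returns the locked commit, while `shallow_snapshot(ToyShard(2, 6, closed=True))` raises `MetadataNotFound`, mirroring the trace above.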

Related component

Storage:Snapshots

To Reproduce

  1. Create a remote store enabled cluster.
  2. Create indices and close them.
  3. Register a snapshot repository and enable shallow copy snapshots, or use the system repository created during cluster creation.
  4. Trigger a snapshot; it will fail.
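The steps above can be sketched as a sequence of REST calls. This is a minimal sketch, assuming a remote store enabled cluster is already running and that an `fs` repository location is permitted by the cluster's `path.repo` setting. The index, repository, and snapshot names, the location, and the base URL are placeholders, and the helper functions are hypothetical. The `remote_store_index_shallow_copy` repository setting is what turns on shallow copy.

```python
import json
import urllib.request


def build_repro_requests(index="closed-index", repo="shallow-repo",
                         snapshot="snap-1", location="/mnt/snapshots"):
    """Return (method, path, body) tuples for steps 2 to 4; all names are placeholders."""
    return [
        # Step 2: create an index, then close it
        ("PUT", f"/{index}", None),
        ("POST", f"/{index}/_close", None),
        # Step 3: register a repository with shallow copy enabled
        ("PUT", f"/_snapshot/{repo}", {
            "type": "fs",
            "settings": {
                "location": location,
                "remote_store_index_shallow_copy": True,
            },
        }),
        # Step 4: trigger the snapshot, which is expected to fail for the closed index
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index}),
    ]


def send(base_url, method, path, body=None):
    """Send one request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

To run it against a test cluster, loop over `build_repro_requests()` and call `send("http://localhost:9200", method, path, body)` for each tuple; the shard failure should then show up in the snapshot response or the node logs.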

Expected behavior

Shallow copy snapshots should succeed for closed indices, as full copy snapshots do.

peternied commented 1 month ago

[Triage - attendees 1 2 3 4 5 6] @harishbhakuni Thanks for creating this issue. We would welcome a pull request to address this bug.

sachinpkale commented 1 month ago

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10]

Added release target 2.16