opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0
52 stars 107 forks

[BUG] Snapshot policies fail to create/delete snapshots #1077

Open adamdepollo opened 5 months ago

adamdepollo commented 5 months ago

What is the bug? Snapshot policies are failing to create or delete snapshots.

I am able to manually create snapshots in the same snapshot repositories I've specified for the snapshot policies; however, the snapshot policies themselves consistently fail to create or delete a snapshot. No error message is printed to indicate the cause of the issue, and I am unable to locate related logs in any of the OpenSearch cluster logs.

I have 4 different policies with different schedules (hourly, daily, weekly, monthly) all of which are failing.

How can one reproduce the bug?

  1. Register S3 bucket as snapshot repo
  2. Create snapshot policy scheduling snapshots to write to the registered repo
  3. See error when the snapshot policy attempts to create a snapshot
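For anyone trying to reproduce this, steps 1 and 2 can be sketched as request bodies for the snapshot repository API (`PUT _snapshot/<repo_name>`) and the Snapshot Management policy API (`POST _plugins/_sm/policies/<policy_name>`). This is a minimal sketch, not the reporter's actual configuration: the bucket name, repository name, cron expression, and helper function names are all assumptions, and AWS-hosted domains typically need additional repository settings (such as a role ARN) and signed requests that are not shown here.

```python
import json


def s3_repository_body(bucket, base_path):
    """Build a body for PUT _snapshot/<repo_name> (hypothetical helper)."""
    return {
        "type": "s3",
        "settings": {"bucket": bucket, "base_path": base_path},
    }


def snapshot_policy_body(repository, indices, cron_expression, timezone="UTC"):
    """Build a body for POST _plugins/_sm/policies/<policy_name> (hypothetical helper)."""
    return {
        "description": "Illustrative snapshot policy",
        "creation": {
            "schedule": {
                "cron": {"expression": cron_expression, "timezone": timezone}
            }
        },
        "snapshot_config": {"repository": repository, "indices": indices},
    }


# All names below are placeholders, not values from this issue.
repo_body = s3_repository_body("my-snapshot-bucket", "snapshots/hourly")
policy_body = snapshot_policy_body("my-s3-repo", "*", "0 * * * *")

print(json.dumps(repo_body, indent=2))
print(json.dumps(policy_body, indent=2))
```

Note that an `indices` pattern of `*` asks the policy to snapshot every index it can match, which is worth keeping in mind when some indexes cannot be snapshotted.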

What is the expected behavior? Snapshots should be created successfully and, if not, some kind of relevant error message should be printed to the output.

What is your host/environment?

Do you have any screenshots?

[Screenshot 2024-01-23 at 6 04 24 PM] [Screenshot 2024-01-23 at 6 04 53 PM] [Screenshot 2024-01-23 at 6 05 10 PM]

Do you have any additional context? No

dblock commented 2 weeks ago

Were you able to resolve this problem @adamdepollo?


adamdepollo commented 1 week ago

@dblock No, I would say this is still a problem.

I was only able to figure out why these snapshots were failing after opening a ticket with AWS support. It turned out that my snapshot policies were trying to snapshot indexes in cold storage, which cannot be snapshotted. So technically the snapshot policies were working as expected.

However, I would still say there is an issue here, in that no useful logs are printed to the UI to indicate the cause. The logs are also not written to any of the audit/error logs that are available to customers running AWS-hosted OpenSearch. The only way to identify the problem was to open a support ticket and have support engineers identify the issue.

Perhaps this issue could be updated to more of a feature request for better/more obvious logging on issues with index snapshots.
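As an illustration of the kind of output that would help, here is a minimal sketch that condenses a response from the Snapshot Management explain API (`GET _plugins/_sm/policies/<policy_name>/_explain`) into one line per policy phase. The response shape used here (`policies[].creation.latest_execution.status` and `.info.message`) is an assumption based on the documented explain output and may differ between versions; the sample payload, policy name, and failure message are placeholders, not real output from this issue.

```python
def summarize_explain(explain_response):
    """Return one summary line per creation/deletion phase that has run.

    Assumes the (unverified) shape:
    {"policies": [{"name": ..., "creation": {"latest_execution":
        {"status": ..., "info": {"message": ..., "cause": ...}}}}]}
    """
    lines = []
    for policy in explain_response.get("policies", []):
        name = policy.get("name")
        for phase in ("creation", "deletion"):
            execution = (policy.get(phase) or {}).get("latest_execution")
            if not execution:
                continue  # this phase has not run yet, nothing to report
            info = execution.get("info") or {}
            detail = info.get("message") or info.get("cause") or "no detail reported"
            status = execution.get("status")
            lines.append(f"{name} {phase}: {status} - {detail}")
    return lines


# Placeholder payload for illustration only.
sample = {
    "policies": [
        {
            "name": "hourly-policy",
            "creation": {
                "latest_execution": {
                    "status": "FAILED",
                    "info": {"message": "placeholder failure message"},
                }
            },
            "deletion": {},
        }
    ]
}

for line in summarize_explain(sample):
    print(line)
```

Whether the explain output actually carries a useful cause in the cold-storage case is exactly what this issue questions, so this is only a starting point for checking what the plugin reports.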