opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
8.89k stars 1.63k forks

[BUG] Snapshot forward compatibility for patch version updates #13676

Open tan3-netapp opened 2 weeks ago

tan3-netapp commented 2 weeks ago

Describe the bug

I have an OpenSearch cluster running version 1.3.11 and am preparing to upgrade it to version 1.3.15. These are the steps I took:

  1. Snapshot of all indices in the cluster with OpenSearch 1.3.11;
  2. Upgraded the whole cluster from OpenSearch 1.3.11 to 1.3.15;
  3. Snapshot with OpenSearch 1.3.15 to the same snapshot repository as step 1;
  4. Restored snapshot taken in step 3 to an OpenSearch 1.3.11 cluster – Failed;
  5. Restored snapshot taken in step 1 to an OpenSearch 1.3.11 cluster – Succeeded;

Although semantic versioning should ensure some sort of compatibility, I cannot restore to a new cluster in step 4.
[Question 1] Is this behavior expected? That OpenSearch does not guarantee snapshot backward compatibility between minor versions?

I cannot directly restore a newer-version snapshot to an older-version cluster, as in step 4, but if an upgrade fails I can still restore the last old-version snapshot, as in step 5. However, I have a concern:
[Question 2] Are older versions of OpenSearch guaranteed to be able to access a repository that has been modified by a newer version? Do I have to keep testing this behavior in future releases, and do we need another backup plan if it does not work?
I suppose that, on a cluster with a single backup repository, step 5 is the only way to roll back to an older version after a failed upgrade (including an unexpected breaking change). If the behavior I ask about in Question 2 is not guaranteed, I plan to use a separate snapshot repository for each minor version, which would be a lot to manage.
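The rollback strategy from step 5 amounts to picking the most recent snapshot in the shared repository that was written by a version no newer than the target cluster. Below is a minimal sketch of that selection, not an official tool. It assumes each snapshot is a dict shaped like the entries returned by `GET /_snapshot/<repository>/_all` (with `snapshot`, `version`, `state`, and `start_time_in_millis` fields); the helper names and the sample data are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '1.3.11' into a comparable tuple (1, 3, 11)."""
    return tuple(int(part) for part in text.split("."))


def latest_restorable(snapshots, cluster_version):
    """Return the newest successful snapshot whose version is not newer
    than cluster_version, or None if no such snapshot exists."""
    target = parse_version(cluster_version)
    candidates = [
        snap for snap in snapshots
        if snap["state"] == "SUCCESS" and parse_version(snap["version"]) <= target
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda snap: snap["start_time_in_millis"])


# Hypothetical repository contents mirroring steps 1 and 3 of this issue.
snapshots = [
    {"snapshot": "before-upgrade", "version": "1.3.11",
     "state": "SUCCESS", "start_time_in_millis": 1000},
    {"snapshot": "after-upgrade", "version": "1.3.15",
     "state": "SUCCESS", "start_time_in_millis": 2000},
]

rollback = latest_restorable(snapshots, "1.3.11")
```

With this sample data, only the snapshot taken before the upgrade qualifies for a 1.3.11 cluster, which matches the outcome of steps 4 and 5.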

Although the strategy in step 5 works, all changes made after the upgrade (step 2), such as creating, updating, and deleting indices, will be lost.
[Question 3] Is there any way to roll back to the older version that includes writes performed in the new version? It does not have to be an in-place rollback; restoring to a new cluster is fine as well.

If the behavior in step 5 always works, [Question 4] is it worth documenting in the OpenSearch official documentation?

Related component

Storage:Snapshots

To Reproduce

  1. Snapshot of all indices in the cluster with OpenSearch 1.3.11;
  2. Upgraded the whole cluster from OpenSearch 1.3.11 to 1.3.15;
  3. Snapshot with OpenSearch 1.3.15 to the same snapshot repository as step 1;
  4. Restored snapshot taken in step 3 to an OpenSearch 1.3.11 cluster – Failed;
  5. Restored snapshot taken in step 1 to an OpenSearch 1.3.11 cluster – Succeeded;
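For readers who want to reproduce these steps, the sketch below builds the snapshot REST requests involved without sending them. The endpoints (`PUT /_snapshot/<repository>/<snapshot>` and `POST /_snapshot/<repository>/<snapshot>/_restore`) are the standard snapshot API; the repository name, snapshot names, host, and helper functions are assumptions chosen for illustration, and the issue does not state the exact calls that were used.

```python
import json


def create_snapshot_request(repo, snapshot):
    """Request that snapshots all indices and waits for completion."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, None)


def restore_snapshot_request(repo, snapshot, indices="*"):
    """Request that restores the given indices without cluster-wide state."""
    body = {"indices": indices, "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


def as_curl(host, request):
    """Render a (method, path, body) tuple as a curl command line."""
    method, path, body = request
    command = f"curl -X{method} '{host}{path}'"
    if body is not None:
        command += " -H 'Content-Type: application/json'"
        command += " -d '" + json.dumps(body) + "'"
    return command


# Hypothetical names: steps 1 and 3 write to the same repository,
# steps 4 and 5 restore from it on a separate 1.3.11 cluster.
REPO = "shared-repo"
steps = [
    create_snapshot_request(REPO, "snap-1-3-11"),   # step 1
    create_snapshot_request(REPO, "snap-1-3-15"),   # step 3
    restore_snapshot_request(REPO, "snap-1-3-15"),  # step 4 (reported to fail)
    restore_snapshot_request(REPO, "snap-1-3-11"),  # step 5 (reported to succeed)
]

for step in steps:
    print(as_curl("http://localhost:9200", step))
```

The requests are only printed, so the script can be inspected safely before running any of the commands against a real cluster.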

Expected behavior

As described in the Describe the bug section, I am not sure whether my concerns are bugs. If they are, the following points summarize my expectations:

  1. In Question 1, according to the semantic versioning definition, I expect that we can directly restore the new version snapshot to an older-version cluster.
  2. In Question 2, it's not a bug now, but I expect a guarantee that older versions of OpenSearch can always access a snapshot repository modified by a newer version. This would spare me from devising a new backup plan, which would take considerable effort to manage.
  3. In Question 3, I expect to have a way to restore the changes made after the upgrade to an older-version cluster. If not, I need to schedule downtime to avoid any write operations during the upgrade.
  4. In Question 4, although I'm not sure what the answers to the above questions are, I expect them to be covered in the official documentation.

peternied commented 2 weeks ago

[Triage] @tan3-netapp Thanks for creating this issue. It looks like an important and complex one.

Bukhtawar commented 2 weeks ago

[Question 1] Is this behavior expected? That OpenSearch does not guarantee snapshot backward compatibility between minor versions?

Lucene doesn't support reading segments written by a higher version from a lower version. The reverse does hold, however: higher versions support reading older segments across minor versions.

[Question 2] Are older versions of OpenSearch guaranteed to be able to access a repository that has been modified by a newer version? Do I have to keep testing this behavior in future releases, and do we need another backup plan if it does not work?

Yes, this is guaranteed to work. We can double-check with an integration test that verifies that behaviour.

[Question 3] Is there any way to roll back to the older version that includes writes performed in the new version? It does not have to be an in-place rollback; restoring to a new cluster is fine as well.

No, this is not supported. Please refer to the first answer.

[Question 4] is it worth documenting in the OpenSearch official documentation?

Snapshot compatibility is well documented

tan3-netapp commented 2 weeks ago

Thank you so much for your quick and detailed reply, @Bukhtawar. I still have some minor follow-up questions:

Lucene doesn't support reading segments written by a higher version from a lower version. The reverse does hold, however: higher versions support reading older segments across minor versions.

Could you please give me a reference or a doc from Lucene confirming this fact?

We can double-check with an integration test that verifies that behaviour.

Could you please show me this integration test? I'm curious to know how it tests this behavior.

No, this is not supported. [Question 3]

Given this answer, I think I need to schedule downtime to avoid any write operations during the upgrade.

tan3-netapp commented 4 days ago

Hi @peternied and @Bukhtawar, are there any other updates on this issue? The document @Bukhtawar provided says the following in its conflicts and compatibility section:

Snapshots are only forward-compatible by one major version. If you have an old snapshot, you can sometimes restore it into an intermediate cluster, reindex all indexes, take a new snapshot, and repeat until you arrive at your desired version, but you might find it easier to just manually index your data in the new cluster.

This does not clearly address the minor version upgrade I described. I plan to create a documentation update PR to make that compatibility a little clearer, because the current conflicts and compatibility section isn't explicit about whether older snapshots, and the repositories that contain them, continue to be compatible with older versions of OpenSearch.
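To make the two rules discussed in this thread concrete, here is a rough sketch of a compatibility check. It combines the statement above that a lower version cannot read data written by a higher version with the documentation's statement that snapshots are forward-compatible by one major version. The function name is hypothetical, this is not OpenSearch's actual validation logic, and it ignores special cases such as the version numbering change between Elasticsearch and OpenSearch.

```python
def parse_version(text):
    """Turn a version string such as '1.3.15' into a comparable tuple (1, 3, 15)."""
    return tuple(int(part) for part in text.split("."))


def can_restore(snapshot_version, cluster_version):
    """Approximate check: may a snapshot written by snapshot_version
    be restored into a cluster running cluster_version?"""
    snap = parse_version(snapshot_version)
    cluster = parse_version(cluster_version)
    # Rule 1 (from this thread): the snapshot must not be newer than the cluster.
    if snap > cluster:
        return False
    # Rule 2 (from the documentation quote): at most one major version forward.
    return cluster[0] - snap[0] <= 1


# The cases from this issue, plus two major-version examples.
step_4 = can_restore("1.3.15", "1.3.11")      # newer snapshot, older cluster
step_5 = can_restore("1.3.11", "1.3.11")      # same version
upgrade = can_restore("1.3.11", "1.3.15")     # older snapshot, newer cluster
two_majors = can_restore("1.3.11", "3.0.0")   # two major versions apart
```

Under these assumptions the check rejects step 4 and accepts step 5, which is consistent with what was observed. It says nothing about whether an older cluster can still read a repository after a newer version has written to it, which is the open point in Question 2.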