scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

missing upgrade procedure from 5.4 to 6.0 #3779

Open tchaikov opened 6 months ago

tchaikov commented 6 months ago

Hi folks,

This is more of a question than a bug report. Scylla 6.0 will use raft-based topology, since consistent-topology-changes is planned to be GA in 6.0, while v5.4 still uses the legacy gossiper-based topology. However, the process of upgrading from the legacy topology to the raft-based topology is not fully automated. The procedure is documented at https://github.com/scylladb/scylladb/blob/master/docs/dev/topology-over-raft.md#upgrade-from-legacy-topology-to-raft-based-topology . We even have an issue tracking a failure in one of our upgrade tests, which failed only because the test didn't follow the procedure; see https://github.com/scylladb/scylla-dtest/issues/4130

I searched the scylla-manager repo to see whether we use the /storage_service/raft_topology/upgrade API, but nothing showed up, so I guess we are not using it at the moment.

The question is:

Should we automate the procedure that enables this feature? If not, why not? If yes, have we implemented it already?

tchaikov commented 6 months ago

/cc @bhalevy @karol-kokoszka

tzach commented 6 months ago

Do you want to automate the Scylla core upgrade from Manager?

tchaikov commented 6 months ago

yes, do we have another option to automate this procedure?

karol-kokoszka commented 6 months ago

I understand /storage_service/raft_topology/upgrade must be called on every node, right? Then the easiest approach would be to use Manager and add a CLI command to sctool that works outside of the scylla-manager server and is handled by the same process that executes the "sctool upgrade-to-raft" command.

Manager keeps a connection to all the agents, which proxy these calls to the Scylla server. It looks like the API call is async and there is a GET method to monitor the status, so it would be nice to have something that orchestrates it on all nodes, IMHO.

Trigger the upgrade via the POST /storage_service/raft_topology/upgrade HTTP route.
Monitor progress of the upgrade via GET /storage_service/raft_topology/upgrade or by observing the logs.
After all nodes report done via the GET endpoint, the upgrade has fully finished.

bhalevy commented 6 months ago

> I understand /storage_service/raft_topology/upgrade must be called on every node, right?

No, it should be called on only one node, AFAIU. @kbr-scylla please correct me if I'm wrong.


kbr-scylla commented 6 months ago

https://opensource.docs.scylladb.com/master/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology.html#running-the-procedure

> Starting the upgrade procedure is done by issuing a POST HTTP request to the /storage_service/raft_topology/upgrade endpoint, to any of the nodes in the cluster.

You issue it to one node.

Then you wait for the upgrade to complete on all nodes, e.g. with curl -X GET "http://127.0.0.1:10000/storage_service/raft_topology/upgrade"
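For anyone who wants to script this without Manager, here is a rough Python sketch of the two steps above: one POST to a single node, then polling GET on every node until each reports done. It is only an illustration, not something from the docs or from scylla-manager. The function names, the polling interval, and the retry limit are made up, and it assumes the GET body is either the plain text done or the JSON string "done".

```python
import json
import time
import urllib.request

API_PORT = 10000  # REST API port, taken from the curl example above
UPGRADE_PATH = "/storage_service/raft_topology/upgrade"


def http_call(host, method):
    """Send one request to a node's REST API and return the decoded body."""
    url = f"http://{host}:{API_PORT}{UPGRADE_PATH}"
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()


def parse_state(body):
    """Assumption: the body is a JSON string such as "done"; else use raw text."""
    try:
        value = json.loads(body)
    except ValueError:
        return body.strip()
    return value if isinstance(value, str) else body.strip()


def upgrade_topology(hosts, call=http_call, poll_seconds=5.0, max_polls=120):
    """POST to one node, then poll GET on every node until all report "done".

    Returns True once every node reports done, False if max_polls is exhausted.
    """
    call(hosts[0], "POST")  # the request goes to one node only; any node will do
    pending = set(hosts)
    for _ in range(max_polls):
        # Keep only the nodes that have not reported "done" yet.
        pending = {h for h in pending if parse_state(call(h, "GET")) != "done"}
        if not pending:
            return True
        time.sleep(poll_seconds)
    return False
```

Calling upgrade_topology(["10.0.0.1", "10.0.0.2", "10.0.0.3"]) with your own node addresses (these are placeholders) would run the whole procedure, provided the REST API is reachable from wherever the script runs. The call parameter exists only so the HTTP layer can be swapped out, e.g. for a proxy or a test double.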

tzach commented 6 months ago

> yes, do we have another option to automate this procedure?

So far, we have not used Manager to automate upgrades, rolling restarts, or other cluster-wide procedures. We can start doing that, but IMHO a single specific upgrade procedure does not justify it. Since Manager is optional, we must provide a reasonable procedure for upgrading open source without it.

tzach commented 5 months ago

TL;DR: Scylla Manager is an Enterprise product, so we should not use it for open source upgrades.

tchaikov commented 5 months ago

@tzach But will we run into the same issue when upgrading Scylla Enterprise? Scylla Enterprise versions basically mirror those of OSS.