Closed: sbernauer closed this issue 2 months ago
So, looking at https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html and testing locally, it looks like the sequence is roughly:
1. `hdfs dfsadmin -rollingUpgrade prepare`, this uses Hadoop RPC and can be executed from anywhere
2. Wait for the rollback image to be created. `hdfs dfsadmin -rollingUpgrade query` reports this, but is not suitable for machine consumption (unparseable output, status code is constant); it can be queried over JMX and REST as `curl $NAMENODE_HTTP_URL/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo | jq '.beans[0].RollingUpgradeStatus.createdRollbackImages'`, but we could also (maybe) use the internal Java API
3. Roll out the updated journalnode StatefulSet and wait for it to become ready
4. Roll out the updated namenode StatefulSet with the `-rollingUpgrade started` flag and wait for it to become ready
5. Roll out the updated datanode StatefulSet and wait for it to become ready
6. `hdfs dfsadmin -rollingUpgrade finalize`, uses Hadoop RPC and can be executed from anywhere
7. Remove the `-rollingUpgrade started` flag again

We need to detect whether to enter "upgrade mode". We could do that by storing a `.status.deployedProductVersion` in the CRD, which we set if unset (for new deployments) or once step 6 is complete. Then we're in upgrade mode if `.status.deployedProductVersion != .spec.image.productVersion`.
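For illustration only, a minimal Rust sketch of that upgrade-mode check; the type and field names (`HdfsClusterStatus`, `deployed_product_version`, ...) are stand-ins for the idea above, not the operator's actual CRD structs:

```rust
/// Sketch only: hypothetical stand-ins for the CRD status/spec fields discussed above.
#[derive(Default)]
struct HdfsClusterStatus {
    /// Set on first deployment, and updated again once step 6 (`-rollingUpgrade finalize`) is done.
    deployed_product_version: Option<String>,
}

struct HdfsImageSpec {
    /// Mirrors `.spec.image.productVersion`.
    product_version: String,
}

/// "Upgrade mode" is active iff a version was recorded and it differs from the requested one.
fn in_rolling_upgrade(status: &HdfsClusterStatus, image: &HdfsImageSpec) -> bool {
    match &status.deployed_product_version {
        None => false, // new deployment: just record the version, no upgrade needed
        Some(deployed) => deployed != &image.product_version,
    }
}
```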
Steps 3-5 could be done by adding a check to the end of the STS apply: `if in_rolling_upgrade && !ready { exit_reconcile() }`, where `ready = .metadata.generation == .status.observedGeneration && .status.availableReplicas == .status.updatedReplicas == .spec.replicas`.
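A rough sketch of that readiness condition against the `k8s-openapi` StatefulSet types (the requeue/`exit_reconcile` plumbing is left out, and this is not the operator's actual code):

```rust
use k8s_openapi::api::apps::v1::StatefulSet;

/// Readiness in the spirit of the condition above: the controller has observed the
/// latest generation, and all replicas are both updated and available.
fn sts_is_ready(sts: &StatefulSet) -> bool {
    let desired = sts.spec.as_ref().and_then(|spec| spec.replicas).unwrap_or(1);
    let Some(status) = &sts.status else {
        return false;
    };
    sts.metadata.generation.is_some()
        && sts.metadata.generation == status.observed_generation
        && status.updated_replicas.unwrap_or(0) == desired
        && status.available_replicas.unwrap_or(0) == desired
}
```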
Step 7 would be simple enough; it happens by leaving "upgrade mode".
Steps 1/2/6 are the big question marks. We could run them from the operator container, exec into an existing namenode Pod, or spawn a dedicated Job. Generally running from the operator seems like a poor idea, both because of needing to bundle a HDFS client+JVM and because the operators don't have Kerberos identities (still need to look into how JMX is affected by this too?). Running as a Job means that we don't rely on picking a single "admin namenode", but creates another asynchronous lifecycle for us to manage.
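If we went the Job route, something like the following could cover step 1 (the Job name, image reference, and all of the Kerberos/config wiring are hand-waved placeholders here, not what the operator would actually generate):

```rust
use k8s_openapi::api::batch::v1::{Job, JobSpec};
use k8s_openapi::api::core::v1::{Container, PodSpec, PodTemplateSpec};
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;

/// Sketch of a one-shot Job running `hdfs dfsadmin -rollingUpgrade prepare` (step 1).
/// Config/Kerberos setup is omitted; the operator would also need to watch the Job's completion.
fn rolling_upgrade_prepare_job(namespace: &str, hdfs_image: &str) -> Job {
    Job {
        metadata: ObjectMeta {
            name: Some("hdfs-rolling-upgrade-prepare".to_string()),
            namespace: Some(namespace.to_string()),
            ..Default::default()
        },
        spec: Some(JobSpec {
            template: PodTemplateSpec {
                spec: Some(PodSpec {
                    restart_policy: Some("Never".to_string()),
                    containers: vec![Container {
                        name: "dfsadmin".to_string(),
                        image: Some(hdfs_image.to_string()),
                        command: Some(vec![
                            "hdfs".to_string(),
                            "dfsadmin".to_string(),
                            "-rollingUpgrade".to_string(),
                            "prepare".to_string(),
                        ]),
                        ..Default::default()
                    }],
                    ..Default::default()
                }),
                ..Default::default()
            },
            ..Default::default()
        }),
        ..Default::default()
    }
}
```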
Another MVPier option would be to only add an override to do steps 3-5, leaving the dfsadmin steps (1/2/6) to be taken manually.
Was hoping it was as simple as an init container, but it looks like there is some choreography involved (with the "wait for" steps).
I think there should be a "do it for me automatically option", but if there is some clear risk to that, then it should be opt in (eg: demos can opt-in, customers might be more cautious).
Can something in stackablectl here help with said choreography to make the manual steps less of a burden?
> I think there should be a "do it for me automatically option", but if there is some clear risk to that, then it should be opt in (eg: demos can opt-in, customers might be more cautious).
Ultimately, all database upgrades (which is what this is) are risky. I agree that it might make sense to have a safeguard, but we should probably think about that as a platform-wide decision then.
> Can something in stackablectl here help with said choreography to make the manual steps less of a burden?
I don't think it'd make much sense. 3-5 comes down to updating the StatefulSets in order, which is managed entirely by the operator. 1/2/6 wouldn't be easier for stackablectl to do than for the operator.
Stackablectl also generally isn't really responsible for modifying stacklets at the moment, and I'd be sad to see that change.
Ah yeah, that makes it more clear.
> Ultimately, all database upgrades (which is what this is) are risky.
Sure, but operational tasks can be codified (assuming there are checks at each step to prove it is safe to proceed with the next) and IMO this is what Operators are for. The problem could probably be modeled sufficiently with a Finite State Machine.
Maybe it is a tall order to codify operations like this, but this should be the ultimate platform-wide goal.
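Purely as an illustration of the FSM framing (states and names invented for this sketch, nothing that exists in the operator):

```rust
/// Hypothetical upgrade states; each transition would only fire once the corresponding
/// safety check (rollback image created, StatefulSet ready, finalize succeeded, ...) passes.
enum RollingUpgradeState {
    /// deployedProductVersion matches spec.image.productVersion, nothing to do.
    Idle,
    /// `-rollingUpgrade prepare` has been issued; waiting for the rollback image.
    Preparing,
    /// StatefulSets are being rolled onto the new version, one role at a time.
    RollingOutRole { role: String },
    /// All roles are updated; waiting for `-rollingUpgrade finalize` to succeed.
    AwaitingFinalize,
}
```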
> Sure, but operational tasks can be codified (assuming there are checks at each step to prove it is safe to proceed with the next) and IMO this is what Operators are for. The problem could probably be modeled sufficiently with a Finite State Machine.
I mean, yeah. I agree that I'd like to have as much as possible managed by the operator. I'm just not sure HDFS is special enough to warrant its own rules for when upgrades should be allowed.
Do we have documentation for this? If so please link it here, if not, why not?
And can you please include a snippet that we can use for the release notes for this?
> Do we have documentation for this? If so please link it here, if not, why not?
The docs are at https://docs.stackable.tech/home/nightly/hdfs/usage-guide/upgrading
> And can you please include a snippet that we can use for the release notes for this?
I suppose, "- The Stackable Operator for HDFS now supports upgrading existing HDFS installations" or something like that.
Is the functionality specific to 3.3 -> 3.4? And we should include a sentence about this requiring manual work.
> Is the functionality specific to 3.3 -> 3.4?
No, the mechanism is generic. One caveat is that it currently takes the pessimistic approach of applying it to any upgrade, so 3.3.4 -> 3.3.6 would also trigger it.
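Concretely, that pessimistic trigger amounts to a plain inequality on the product version strings rather than anything semver-aware (illustrative check only, not the operator's actual code):

```rust
/// Pessimistic trigger: any difference between deployed and requested product version
/// enters upgrade mode, regardless of whether it is a patch or a minor/major bump.
fn needs_upgrade(deployed: &str, requested: &str) -> bool {
    deployed != requested
}

fn main() {
    assert!(needs_upgrade("3.3.4", "3.3.6")); // a patch-level bump still triggers it
    assert!(!needs_upgrade("3.4.0", "3.4.0")); // same version: no upgrade machinery involved
}
```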
> And we should include a sentence about this requiring manual work.
Yeah that's a good point. Hm.
- The Stackable Operator for HDFS now supports upgrading existing HDFS installations. This process requires some manual intervention, however.
As of 23.4, when you upgrade your HDFS, e.g. 3.2.2 -> 3.3.4, you run into the error

Ideally we should start a rolling upgrade of all components. Currently you simply cannot upgrade your HDFS without hacking stuff (e.g. no cliOverrides to add `-upgrade` or similar).

Edit from the past: At least the upgrade 3.3.4 -> 3.3.6 worked.