rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Create a boolean configuration setting to disable CRUSH rule changes for existing pools #13854

Open mohitrajain opened 2 months ago

mohitrajain commented 2 months ago

Is this a bug report or feature request?

What should the feature do: In v1.12.8, a feature was released that updates a pool's deviceClass on existing pools: "pool: allow updating deviceClass on existing pool".

This feature request proposes an additional flag that allows administrators to opt out of the automatic CRUSH rule changes introduced by that feature. Optionally, the operator could log the recommended state changes for each pool on every evaluation, providing insight into the adjustments it would make. Our clusters are dynamic, with deviceClasses added at later stages. Since the feature above was not available previously, pools were adjusted manually. We want to avoid disrupting the production cluster with object re-mapping caused by new crush rules.
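As a purely illustrative sketch, such a toggle could live on the pool spec itself. The field name below is hypothetical and does not exist in the Rook API; the other fields (failureDomain, deviceClass, replicated) are existing CephBlockPool settings:

```bash
# Hypothetical sketch only: "preserveExistingCrushRules" is NOT a real Rook field,
# it merely illustrates where an opt-out of automatic CRUSH rule updates could sit.
kubectl apply -f - <<EOF
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: rbd-hdd
  namespace: rook-ceph
spec:
  failureDomain: host
  deviceClass: hdd
  replicated:
    size: 3
  preserveExistingCrushRules: true  # hypothetical toggle, not part of the CRD today
EOF
```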

What is the use case behind this feature: After updating our staging Rook cluster to v1.12.8, which had undergone manual pool modifications in the past, we encountered some unexpected changes:

2024-03-01 11:20:10.879837 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "rbd-hdd-new"
2024-03-01 11:20:10.879851 I | cephclient: updating pool ".mgr" failure domain from "host" to "host" with new crush rule ".mgr_host_hdd"
2024-03-01 11:20:10.879855 I | cephclient: crush rule "rbd-hdd-new" will no longer be used by pool ".mgr"
- A new rule was created even though the hdd deviceClass was specified, due to the bug fix detailed [here](https://github.com/rook/rook/pull/13772)
---
2024-03-01 11:20:56.454723 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "rbd-ssd-new"
2024-03-01 11:20:56.454740 I | cephclient: updating pool "rbd-ssd" failure domain from "host" to "host" with new crush rule "rbd-ssd_host_ssd"
2024-03-01 11:20:56.454744 I | cephclient: crush rule "rbd-ssd-new" will no longer be used by pool "rbd-ssd"
- A new rule was created even though the ssd deviceClass was specified, due to the bug fix detailed [here](https://github.com/rook/rook/pull/13772)
---
2024-03-01 11:20:28.973957 I | cephclient: creating a new crush rule for changed deviceClass on crush rule "rbd-hdd-new"
2024-03-01 11:20:28.973972 I | cephclient: updating pool "rbd-hdd" failure domain from "host" to "host" with new crush rule "rbd-hdd_host"
2024-03-01 11:20:28.973976 I | cephclient: crush rule "rbd-hdd-new" will no longer be used by pool "rbd-hdd"
- A new rule was created because no deviceClass was specified for this pool; it had been set to the hdd class manually, so the operator switched it back to the default deviceClass, i.e. a mix of both classes
--- other logs have been omitted 

Messages like those above were observed for all the pools in our cluster. We had initially omitted the device class for some pools, and those pools were affected as indicated by the logs.

The above crush rule changes led to data being re-mapped across the cluster. It is worth noting that this behavior has since been addressed in v1.13.5 by "pool: Skip crush rule update when not needed."
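For reference, a minimal sketch of auditing which CRUSH rule each pool uses before and after the operator reconciles (assuming the default rook-ceph-tools toolbox deployment in the rook-ceph namespace; pool and rule names are taken from the logs above):

```bash
# Run Ceph commands through the Rook toolbox pod.
TOOLS="kubectl -n rook-ceph exec deploy/rook-ceph-tools --"

# List all CRUSH rules known to the cluster.
$TOOLS ceph osd crush rule ls

# Show which rule a given pool currently uses.
$TOOLS ceph osd pool get rbd-hdd crush_rule
$TOOLS ceph osd pool get rbd-ssd crush_rule

# Dump a rule to see the root, device class, and failure domain it selects.
$TOOLS ceph osd crush rule dump rbd-hdd_host
```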

We want to avoid unintended effects on existing pools while we do not yet fully understand the implications for their crush rules. Logging the intended changes, together with a toggle flag, would let us make more informed decisions.

Environment: mixed device class environment.


subhamkrai commented 2 months ago

@mohitrajain Since the issue is fixed in the latest version as you mentioned, why not upgrade to the fixed version?

mohitrajain commented 2 months ago

@subhamkrai We need to evaluate the new version in our staging cluster. However, before applying changes to the production cluster, having this feature would be beneficial. It would enable us to preserve manual adjustments on specific pools.

anthonyeleven commented 2 months ago

@subhamkrai It is indeed nice that this has been fixed. One open question is how many Rook releases can be skipped at a time.

But as to why we ask for this setting -- this particular situation may be addressed, but what about the ones we haven't hit yet? Especially if future Rook releases grow additional actions on existing pools or other resources.

Why had we made manual changes in the first place?

OBTW, Mohit is my protege; we work together.

travisn commented 2 months ago

@anthonyeleven So you're hitting this issue in v1.12.8, and you would prefer to get the fix in a v1.12.x release before you upgrade to v1.13? We could likely backport #13772 to v1.12 and make that happen.

anthonyeleven commented 2 months ago

No, not asking for a backport. The 1.13.x fix should suffice if we can skip 1.12.8. The cluster where this matters for us currently runs Rook v1.10.5. Can we go straight to 1.13.x from this release?

The RFE to completely disable modifying existing pools is intended to forestall any future rude awakenings when new functionality is added.

travisn commented 2 months ago

In the upgrade guide it is recommended not to skip minor Rook releases. It might work, but I'd recommend testing the upgrade yourself, since our testing does not cover skipping minor releases. Instead, you might consider: v1.10.5 --> v1.11.x --> v1.12.7 --> v1.13.x
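A small sketch of verifying the currently deployed operator version and cluster health before each hop (assuming the default rook-ceph namespace and resource names):

```bash
# Current operator image, i.e. the Rook version you are upgrading from.
kubectl -n rook-ceph get deployment rook-ceph-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# Confirm the CephCluster is healthy before moving on to the next minor release.
kubectl -n rook-ceph get cephcluster rook-ceph \
  -o jsonpath='{.status.ceph.health}{"\n"}'
```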

anthonyeleven commented 2 months ago

Thanks!
We're testing upgrades in a small lab cluster, but we can only align it so closely with prod. v1.10.5 --> v1.11.x --> v1.12.7 --> v1.13.x should be feasible for us.

I didn't want to make assumptions about the .. releases

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.