Closed rzetelskik closed 1 month ago
Although this can be worked around by tweaking the deep equality test, the ideal approach would be to annotate the tasks with a checksum of the most recently sent spec. There's an existing feature request which would allow us to implement this: https://github.com/scylladb/scylla-manager/issues/3645.
@zimnx @tnozicka what do you suggest?
Another issue is that this goes past the unit tests since they use a crafted "manager state", which doesn't correspond to what would normally come from the manager client. Should we maybe use a mock client instead?
I can't come up with a way to verify this trivially in our e2e suite.
Not sure if that's enough, as at some point we'd collide with the user's note. An annotations-like map would be best. I even think the issue is broader, applying to the cluster definition itself and to backups.
Should we maybe use a mock client instead?
Mocks are usually not good when it comes to the level of API admission / defaulting / conversion.
I can't come up with a way to verify this trivially in our e2e suite.
I suppose a progressing condition might show this
I suppose a progressing condition might show this
We can't use them from the manager controller, can we?
The manager controller already sets status on ScyllaClusters, so it can add its own progressing condition: if it detects a change, it sets progressing to true; if the next sync detects no change, it sets it to false.
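A minimal sketch of that idea, assuming a simplified local `Condition` type rather than the real Kubernetes one. The condition type name, the reasons, and the `neverSettles` helper are hypothetical and only illustrate how a test could notice a controller that keeps reporting changes:

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// progressingType is a hypothetical condition name used only in this sketch.
const progressingType = "ManagerControllerProgressing"

// progressingCondition maps the outcome of one sync to a condition:
// True when the sync had to change something, False otherwise.
func progressingCondition(changesDetected bool) Condition {
	if changesDetected {
		return Condition{Type: progressingType, Status: "True", Reason: "TasksOutOfSync"}
	}
	return Condition{Type: progressingType, Status: "False", Reason: "AsExpected"}
}

// neverSettles reports whether every observed sync was still progressing,
// which for an unchanged spec would hint at an update loop.
func neverSettles(observed []Condition) bool {
	if len(observed) == 0 {
		return false
	}
	for _, c := range observed {
		if c.Status != "True" {
			return false
		}
	}
	return true
}

func main() {
	looping := []Condition{progressingCondition(true), progressingCondition(true)}
	settled := []Condition{progressingCondition(true), progressingCondition(false)}
	fmt.Println("looping:", neverSettles(looping))
	fmt.Println("settled:", neverSettles(settled))
}
```

An e2e test could then wait for the condition to become False after applying a spec and fail if it never does.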
Not sure if that's enough, as at some point we'd collide with the user's note. An annotations-like map would be best. I even think the issue is broader, applying to the cluster definition itself and to backups.
Sure, I wasn't proposing we do exactly this, only pointing out there's already a need for it - labels/annotations map definitely seems more fitting. There's even another issue for clusters already: https://github.com/scylladb/scylla-manager/issues/3219. It's closer to what we need so I'll update this one instead.
Just for the record, this is waiting for https://github.com/scylladb/scylla-manager/issues/3219. The manager team agreed to add a metadata/labels map to the clusters/tasks API, and we'll use that both to decide whether the operator controls a given object and to compare hashes of the objects to decide whether we need to update them. It won't come in 3.2.8 though, so we'll have to wait a bit longer. Xref: https://github.com/scylladb/scylla-manager/pull/3828#issuecomment-2082522924
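The hash comparison could look roughly like the sketch below. It assumes the task carries a plain string-to-string labels map; the label keys, the `TaskSpec` and `ManagerTask` types, and the helper names are hypothetical and are not the Manager's actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Hypothetical label keys, made up for this sketch.
const (
	ownerLabel = "example.invalid/managed-by-operator"
	hashLabel  = "example.invalid/spec-hash"
)

// TaskSpec is a hypothetical stand-in for a task in ScyllaCluster's spec.
type TaskSpec struct {
	Name                string
	Cron                string
	SmallTableThreshold string
}

// ManagerTask is a hypothetical stand-in for a task returned by the Manager.
// Only the labels matter here, so the other fields are omitted.
type ManagerTask struct {
	Labels map[string]string
}

// specHash hashes the spec exactly as the user wrote it, before any conversion.
func specHash(s TaskSpec) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// isManaged reports whether the operator owns the task.
func isManaged(t ManagerTask) bool {
	return t.Labels[ownerLabel] == "true"
}

// needsUpdate compares the stored hash with the hash of the current spec,
// so conversions or defaulting on the Manager side cannot cause a mismatch.
func needsUpdate(spec TaskSpec, t ManagerTask) (bool, error) {
	h, err := specHash(spec)
	if err != nil {
		return false, err
	}
	return t.Labels[hashLabel] != h, nil
}

func main() {
	spec := TaskSpec{Name: "repair", Cron: "0 0 * * *", SmallTableThreshold: "1GiB"}
	h, _ := specHash(spec)
	task := ManagerTask{Labels: map[string]string{ownerLabel: "true", hashLabel: h}}
	update, _ := needsUpdate(spec, task)
	fmt.Println("managed:", isManaged(task), "needs update:", update)
}
```

Because the hash is taken over the spec and stored alongside the task, the comparison no longer depends on what the Manager returns for the task's own fields.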
/remove-lifecycle stale /triage accepted
https://github.com/scylladb/scylla-manager/issues/3219 was closed as completed with https://github.com/scylladb/scylla-manager/pull/3934, so this is no longer blocked on SM.
What happened?
Scylla Manager controller decides to update the tasks defined in ScyllaCluster's spec by checking deep equality between the definition and the task obtained from the Manager's state.
https://github.com/scylladb/scylla-operator/blob/f2336ee228b4132081a179c5b8b9976a6d725c7e/pkg/controller/manager/sync_action.go#L157
Since some fields are converted when translating them to requests to Scylla Manager, but not when converting them back, the deep equality check always fails in some cases. This in turn means that tasks can be updated indefinitely in a loop, despite their specification not changing. This puts unnecessary load on both Scylla Manager and the controller.
The same situation can also be caused by the Manager defaulting some fields or not returning their values in API call responses.
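A minimal, self-contained sketch of the mechanism, not the operator's actual code. The `TaskSpec` and `ManagerTask` types, the conversion helpers, and the "GiB"-only parsing are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// TaskSpec is a hypothetical stand-in for a task as written in ScyllaCluster's spec.
type TaskSpec struct {
	Name                string
	SmallTableThreshold string // human-readable size, e.g. "1GiB"
}

// ManagerTask is a hypothetical stand-in for the task as stored by Scylla Manager.
type ManagerTask struct {
	Name                string
	SmallTableThreshold int64 // size in bytes
}

// toManager converts the spec to the Manager's representation.
// Only the "GiB" suffix is handled, to keep the sketch short.
func toManager(s TaskSpec) ManagerTask {
	raw := s.SmallTableThreshold
	mult := int64(1)
	if strings.HasSuffix(raw, "GiB") {
		raw = strings.TrimSuffix(raw, "GiB")
		mult = 1 << 30
	}
	n, _ := strconv.ParseInt(raw, 10, 64) // parse errors are ignored in this sketch
	return ManagerTask{Name: s.Name, SmallTableThreshold: n * mult}
}

// fromManager converts back without restoring the original unit.
func fromManager(t ManagerTask) TaskSpec {
	return TaskSpec{Name: t.Name, SmallTableThreshold: strconv.FormatInt(t.SmallTableThreshold, 10)}
}

// needsUpdateNaive compares the spec with the back-converted state,
// so "1GiB" is compared against "1073741824" and never matches.
func needsUpdateNaive(spec TaskSpec, stored ManagerTask) bool {
	return !reflect.DeepEqual(spec, fromManager(stored))
}

// needsUpdateNormalized compares both sides in the Manager's representation.
func needsUpdateNormalized(spec TaskSpec, stored ManagerTask) bool {
	return !reflect.DeepEqual(toManager(spec), stored)
}

func main() {
	spec := TaskSpec{Name: "repair", SmallTableThreshold: "1GiB"}
	stored := toManager(spec) // what the Manager holds after the first sync
	fmt.Println("naive:", needsUpdateNaive(spec, stored))
	fmt.Println("normalized:", needsUpdateNormalized(spec, stored))
}
```

Comparing both sides in the same representation avoids the mismatch caused by conversion, but it would not help when the Manager defaults a field or omits it from its responses.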
Example logs:
In the above scenario the infinite updates come from the discrepancy of the small_table_threshold value between ScyllaCluster's spec and the Manager's state, due to the value being converted before sending the request.
What did you expect to happen?
The tasks should not be updated when there are no changes in their spec.
How can we reproduce it (as minimally and precisely as possible)?
Schedule any task using ScyllaCluster's API.
Scylla Operator version
master
Kubernetes platform name and version
n/a
Please attach the must-gather archive.
n/a
Anything else we need to know?
No response