scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Add the possibility to label clusters added to SM #3219

Closed. Choraden closed this issue 1 month ago.

Choraden commented 2 years ago

Scylla-manager omits fields such as Username, Password, and AuthToken in the response to a /clusters request. For security reasons this is desirable, but when it comes to Scylla-operator, clusters cannot be compared and updated when the only field that changes is e.g. the password.

This is what the cluster model looks like right now: https://github.com/scylladb/scylla-manager/blob/master/swagger/gen/scylla-manager/models/cluster.go

I would suggest adding a hash field for the client to set, so that it would be possible to distinguish between different clusters without any security risk.

karol-kokoszka commented 1 year ago

@zimnx what exactly is needed here? Must the operator know all the details about the cluster, including username, password, and auth-token? Is this issue still valid?

zimnx commented 1 year ago

The issue is that there's no way to tell whether the cluster you PUT into the API is in the desired state, because GET doesn't return all the info - which is fine. This leads to a situation where, from automation, we can't tell whether the credentials are updated or not.

The proposed solution is to add a labels field to the clusters: user-defined key-values which might be useful for filtering, tagging, or storing arbitrary values like a hash of the entire cluster we put into the API. This would allow us to compute the hash of the required object on our side - with up-to-date credentials - and compare it with the object in the API. If the hashes mismatch, then we need to update the cluster.
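To make this concrete, here is a minimal sketch in Go of what a client could do with such a field; the DesiredCluster struct, its field set, and the "managed-hash" label key are all illustrative assumptions, not the actual SM API:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// DesiredCluster is an illustrative stand-in for the full desired spec,
// including credentials that GET would not return.
type DesiredCluster struct {
	Host     string `json:"host"`
	Username string `json:"username"`
	Password string `json:"password"`
}

// specHash returns a stable SHA-256 of the desired spec; fixed struct
// field order makes the JSON encoding deterministic.
func specHash(c DesiredCluster) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	desired := DesiredCluster{Host: "192.168.200.11", Username: "cassandra", Password: "secret"}
	h, err := specHash(desired)
	if err != nil {
		panic(err)
	}
	// The client would store h under e.g. labels["managed-hash"] on PUT,
	// then compare it with the label returned by GET to decide whether
	// an update is needed.
	fmt.Println(h)
}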

rzetelskik commented 6 months ago

This would be useful for us in tasks as well, as it would allow for a more robust assessment of the tasks being up to date, i.e. we wouldn't have to compare the properties of the desired task specification and the actual state in the manager. Ref: https://github.com/scylladb/scylla-operator/issues/1827.

The approach proposed by @zimnx, i.e. a map of labels/annotations, would work in both cases.

@karol-kokoszka can we have it triaged?

karol-kokoszka commented 5 months ago

@rzetelskik @zimnx The manager API is returning what you want, have a look:

➜  scylla-manager git:(master) curl 127.0.0.1:5080/api/v1/cluster/294e86f8-6133-4930-9d88-94ac83bcf9cf
{"id":"294e86f8-6133-4930-9d88-94ac83bcf9cf","name":"","host":"192.168.200.11","auth_token":"token","force_tls_disabled":false,"force_non_ssl_session_port":false}
➜  scylla-manager git:(master) curl 127.0.0.1:5080/api/v1/clusters                                    
[{"id":"294e86f8-6133-4930-9d88-94ac83bcf9cf","name":"","host":"192.168.200.11","auth_token":"token","force_tls_disabled":false,"force_non_ssl_session_port":false,"username":"set","password":"set"}]

What exactly is missing?

rzetelskik commented 5 months ago

@karol-kokoszka let's forget about the initial problem described here and focus on the feature request. Let me give you a different use case:

Say we create a cluster and try to save its ID on our side. The ID can be lost, or we can simply get an older version of the object to process, which doesn't yet have the ID. Without the ID, we can't be sure whether the cluster we've found through listing is the one we created earlier or whether we just hit a name collision - so we have to delete it and create it again. How do the annotations help here? We could save a UUID there, so that when we get the cluster from the manager state, we know for sure it's the one we created earlier, regardless of whether we lost the cluster ID in the process or not. Hitting this is actually quite common.
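A sketch of that identity check, under the same assumption of a labels map; the Cluster struct and the "owner-uid" label key are made up for illustration:

package main

import "fmt"

// Sketch only: Cluster with a Labels map is the proposed extension,
// not the current SM model; "owner-uid" is an illustrative label key.
type Cluster struct {
	ID     string
	Name   string
	Labels map[string]string
}

// findOwned returns the cluster carrying our identity label, regardless
// of whether the SM-assigned cluster ID was lost on our side.
func findOwned(clusters []Cluster, identity string) (Cluster, bool) {
	for _, c := range clusters {
		if c.Labels["owner-uid"] == identity {
			return c, true
		}
	}
	return Cluster{}, false
}

func main() {
	listed := []Cluster{{ID: "294e86f8-6133-4930-9d88-94ac83bcf9cf", Labels: map[string]string{"owner-uid": "a1b2"}}}
	if c, ok := findOwned(listed, "a1b2"); ok {
		fmt.Println("found our cluster:", c.ID)
	}
}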

Another one is the updates I mentioned earlier - even if the calls you posted return all the fields we need, on every iteration we have to translate between the manager state and our internal one and compare the specifications to figure out whether we need to update. Even if we get all the fields right at some point, it's prone to break with any change in defaulting or with new attributes. If we have the possibility to save the hash of our representation of the cluster/task in the manager state, it's enough for us to compare the hashes to know whether we need to update, instead of comparing the entire specification every time.

I understand you might be hesitant to extend the API with something serving a specific use case, but I think the solution to this is broad enough to prove helpful for other use cases as well.

rzetelskik commented 5 months ago

The manager API is returning what you want, have a look:

➜  scylla-manager git:(master) curl 127.0.0.1:5080/api/v1/cluster/294e86f8-6133-4930-9d88-94ac83bcf9cf
{"id":"294e86f8-6133-4930-9d88-94ac83bcf9cf","name":"","host":"192.168.200.11","auth_token":"token","force_tls_disabled":false,"force_non_ssl_session_port":false}
➜  scylla-manager git:(master) curl 127.0.0.1:5080/api/v1/clusters
[{"id":"294e86f8-6133-4930-9d88-94ac83bcf9cf","name":"","host":"192.168.200.11","auth_token":"token","force_tls_disabled":false,"force_non_ssl_session_port":false,"username":"set","password":"set"}]

What exactly is missing?

@karol-kokoszka coming back to your comment - the issue described by @zimnx still stands. The responses you posted only say whether the username/password are set, not what their values are - and rightly so. What follows, though, is that we can't verify whether they are synchronised, i.e. if we were to change the username/password, we can't verify whether we should send an update request or whether they were already updated. The annotations would allow us to verify whether the hashes changed, and so whether we should send an update request. At the moment there's no way for us to check this for some of the values, namely the password in this case.

$ sctool cluster add --host=scylla-client.scylla.svc --name=scylla/scylla --auth-token=v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf --force-non-ssl-session-port --force-tls-disabled --username=cassandra --password=cassandra
c9e1cc68-09b6-4048-8bc6-bf730b66570c
WARNING! TLS is explicitly disabled on cluster sessions (--force-tls-disabled).
WARNING! Cluster session is going to always use non SSL port (--force-non-ssl-session-port).
 __
/  \     Cluster added! You can set it as default, by exporting its name or ID as env variable:
@  @     $ export SCYLLA_MANAGER_CLUSTER=c9e1cc68-09b6-4048-8bc6-bf730b66570c
|  |     $ export SCYLLA_MANAGER_CLUSTER=scylla/scylla
|| |/
|| ||    Now run:
|\_/|    $ sctool status -c scylla/scylla
\___/    $ sctool tasks -c scylla/scylla

$  curl -i 127.0.0.1:5080/api/v1/clusters
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:11:22 GMT
Content-Length: 343

[{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true,"username":"set","password":"set"}]

$ curl -i 127.0.0.1:5080/api/v1/cluster/c9e1cc68-09b6-4048-8bc6-bf730b66570c
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:12:02 GMT
Content-Length: 307

{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true}

$ sctool cluster update --cluster=c9e1cc68-09b6-4048-8bc6-bf730b66570c --username=cassandra --password=changed

$ curl -i 127.0.0.1:5080/api/v1/clusters
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 29 Mar 2024 17:12:43 GMT
Content-Length: 343

[{"id":"c9e1cc68-09b6-4048-8bc6-bf730b66570c","name":"scylla/scylla","host":"scylla-client.scylla.svc","auth_token":"v4qw9nxq75f8ssp7bx8f76t5r64jg7t7wzkffp849wvjt67hrk9xdv6wncq5rjr6qq8jckdmvfr75mr6lnbrqgbcr8n84ljwggxmtrb8pn4jkzwqx9x8plfz5ndl8fbf","force_tls_disabled":true,"force_non_ssl_session_port":true,"username":"set","password":"set"}]
karol-kokoszka commented 5 months ago

@rzetelskik but why do you need to know more? If you send a request to update the cluster with a new password/username, then you get HTTP 200 if everything is fine. Do you want to check whether the manager actually did what is reported as done? ...or do you need a mechanism to validate whether the password matches (by comparing hashes) what you expect it to be?

Letting the server respond with whether the stored password matches a given one is not the best approach. This is the CQL credential - why can't you just update it again to make sure that the manager uses the expected one?

rzetelskik commented 5 months ago

...or do you need a mechanism to validate whether the password matches (by comparing hashes) what you expect it to be?

This, but not just for the password - the proposed solution would let us hash the entire specification on our side and only compare the hashes, instead of comparing the manager state and our desired specification field by field.

This is the CQL credential - why can't you just update it again to make sure that the manager uses the expected one?

That's essentially what we end up doing now, and we flood the manager with unnecessary updates. See e.g. https://github.com/scylladb/scylla-operator/issues/1827 - that issue mentions tasks, not clusters, and is caused by some inconsistencies in conversions between our desired specification and the manager state, and back, but you can see in the example logs how frequent these updates are.

karol-kokoszka commented 5 months ago

The scylla-manager API is not protected with any key; we support both HTTP and HTTPS though.

What if scylla-manager exposed the endpoint to compare the object hashes only on the HTTPS server, and only if the server requires and verifies a client certificate?

So, that part would need to be enabled:

# Version of TLS protocol and cipher suites to use with HTTPS.
# The supported versions are: TLSv1.3, TLSv1.2, TLSv1.0.
# The TLSv1.2 is restricted to the following cipher suites:
# ECDHE-ECDSA-WITH-AES-128-GCM-SHA256, ECDHE-RSA-WITH-AES-128-GCM-SHA256,
# ECDHE-ECDSA-WITH-AES-256-GCM-SHA384, ECDHE-RSA-WITH-AES-256-GCM-SHA384,
# ECDHE-ECDSA-WITH-CHACHA20-POLY1305, ECDHE-RSA-WITH-CHACHA20-POLY1305.
# The TLSv1.0 should never be used as it's deprecated.
#tls_mode: TLSv1.2

# TLS certificate and key files to use with HTTPS. The files must contain PEM
# encoded data. If not set a self-signed certificate will be generated,
# the certificate is valid for 1 year after start and uses EC P256 private key.
#tls_cert_file:
#tls_key_file:
#
# CA to validate client certificates, if set client must present a certificate
# signed by the CA.
#tls_ca_file:

We must have a fully secured communication channel if you need to make this kind of comparison. The object hash will still contain the sensitive information.
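For illustration, a hedged sketch of a Go client doing mutual TLS against such an endpoint; the address, the file paths, and the idea of hitting the existing /api/v1/clusters route over HTTPS are assumptions, and the files must match the tls_cert_file/tls_key_file/tls_ca_file settings above:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Assumed paths: a client certificate/key signed by the CA that the
	// server's tls_ca_file points at.
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		panic(err)
	}
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	client := &http.Client{Transport: &http.Transport{TLSClientConfig: &tls.Config{
		Certificates: []tls.Certificate{cert}, // presented to the server for verification
		RootCAs:      pool,                    // used to verify the server certificate
		MinVersion:   tls.VersionTLS12,
	}}}

	// Hypothetical HTTPS address; the SM HTTPS listener is configured separately.
	resp, err := client.Get("https://127.0.0.1:5443/api/v1/clusters")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}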

rzetelskik commented 5 months ago

The only thing we're requesting here is enriching the cluster/task structs with user-defined key-value pairs, allowing the users to annotate the objects - akin to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/. Whatever the API clients do with it would be up to the clients, so I don't really see why that would be required?

karol-kokoszka commented 5 months ago

The only thing we're requesting here is enriching the cluster/task structs with user-defined key-value pairs, allowing the users to annotate the objects - akin to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/. Whatever the API clients do with it would be up to the clients, so I don't really see why that would be required?

Because the password (even hashed) is sensitive information that we don't want to share in non-encrypted mode. cc: @tzach

rzetelskik commented 5 months ago

Because the password (even hashed) is sensitive information that we don't want to share in non-encrypted mode.

No one is asking you to - the implementation of checksums is just one use case for the metadata annotations, so it's on the client not to leak anything there. As a user, I might as well leak it with the current implementation - I could just put the password in the cluster name. Again, this feature request is not asking the manager to provide a mechanism for hashing the manager objects - it's just to provide a tool for attaching and retrieving additional metadata on the objects (clusters, tasks). I suppose the original issue here could be misleading.

karol-kokoszka commented 5 months ago

From the issue description:

I would suggest adding a hash field for client to set, so that it will be possible to distinguish different clusters without any security risks.

Got it... It sounds like versioning an object. But I don't understand this part from @zimnx:

The proposed solution is to add a labels field to the clusters: user-defined key-values which might be useful for filtering, tagging, or storing arbitrary values like a hash of the entire cluster we put into the API. This would allow us to compute the hash of the required object on our side - with up-to-date credentials - and compare it with the object in the API. If the hashes mismatch, then we need to update the cluster.

Does it mean that it will actually be the operator that computes the hash (or checksum) of an object and saves it along with the cluster? If you include the password in the hashed object (or the checksum), then you create a security concern. Do you really want to do that? The risk level is low, but it's still something that is not recommended. Speak with someone responsible for security in the company to check whether this is the way to go.

What if Scylla Manager versioned the cluster object and returned the current version for every PUT/GET operation on a cluster? You would need to save the cluster version somewhere to know whether you are up to date or not.

rzetelskik commented 5 months ago

Does it mean that it will actually be the operator that computes the hash (or checksum) of an object and saves it along with the cluster?

Yes.

Do you really want to do that? The risk level is low, but it's still something that is not recommended. Speak with someone responsible for security in the company to check whether this is the way to go.

I suppose you're worried about dictionary attacks, which is a valid concern. The metadata annotations could always be used to store a salt together with the hash, though.
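A minimal sketch of that salted variant; the label keys and salt length are illustrative:

package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// saltedHash hashes salt||spec, making precomputed dictionary attacks
// against a leaked label value impractical.
func saltedHash(salt, specJSON []byte) string {
	h := sha256.New()
	h.Write(salt)
	h.Write(specJSON)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	salt := make([]byte, 16)
	rand.Read(salt) // crypto-grade random salt

	spec := []byte(`{"host":"192.168.200.11","password":"secret"}`)
	// Both values would live in the proposed labels map, e.g.:
	labels := map[string]string{
		"managed-hash-salt": hex.EncodeToString(salt),
		"managed-hash":      saltedHash(salt, spec),
	}
	fmt.Println(labels)
}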

What if Scylla Manager versioned the cluster object and returned the current version for every PUT/GET operation on a cluster? You would need to save the cluster version somewhere to know whether you are up to date or not.

That could also work, although it's a bit worse for us - hitting conflicts or reconciling older object versions on our side would again cause unnecessary updates. Hard to say which solution would be more versatile. @zimnx wdyt?

zimnx commented 5 months ago

We just want to store object metadata. We gave a legitimate use case, but overall what we do with it shouldn't be SM's concern.

An API to get the version of an object is not what we want, as we build the expected state from the current state. We construct every task and cluster field from various APIs and then combine them into the expected state. The version of an object is just a side effect of an update; it doesn't carry any meaningful semantics, hence it cannot be compared with the expected state: no one defines "I want a backup task in version 2137 which is cron xxx, location yyy", but rather "I want a backup task with cron xxx, location yyy". To keep track of whether the desired object matches the version returned by the manager, we would need to keep a hash->version mapping in a database, which wouldn't be necessary if metadata were stored alongside the object.

karol-kokoszka commented 5 months ago

OK, I'll set the milestone for next patch/minor release.

The goal for the manager is to just extend the cluster object in the DB with a label text column.

rzetelskik commented 5 months ago

just extend the cluster object

And tasks imo.

karol-kokoszka commented 4 months ago

grooming notes

How to add a label to a cluster / task.

If the same label is listed in both --delete-labels and --set-labels, then --set-labels takes priority (sketched after these notes).

Labels should be listed in sctool tasks / sctool info / sctool cluster list.
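A minimal sketch of that precedence rule in Go (illustrative only, not the actual sctool implementation):

package main

import "fmt"

// applyLabelFlags merges --set-labels and --delete-labels into the
// current labels; per the grooming notes, a key present in both is set,
// not deleted.
func applyLabelFlags(current, set map[string]string, del []string) map[string]string {
	out := make(map[string]string, len(current))
	for k, v := range current {
		out[k] = v
	}
	for _, k := range del {
		delete(out, k)
	}
	for k, v := range set { // applied last, so set wins over delete
		out[k] = v
	}
	return out
}

func main() {
	cur := map[string]string{"env": "dev", "team": "core"}
	fmt.Println(applyLabelFlags(cur, map[string]string{"env": "prod"}, []string{"env", "team"}))
	// map[env:prod] - "env" was both deleted and set, so the set wins.
}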

Michal-Leszczynski commented 2 months ago

The requirements listed above focus on the user<->sctool interaction:

How to add a label to a cluster / task:

  • sctool cluster/task add/update should support a map-type flag
  • update is expected to save the diff to the DB and the flag name should be --set-labels
  • to remove a label, there is a need for an additional flag --delete-labels

So it's also worth defining what happens in the SM API. In the SM API we have the following objects: Cluster, Task, and TaskListItem.

SM does not really support partial cluster/task updates - it relies on receiving the full definition of a Cluster/Task. There are exceptions to this - secrets like username/password: when updating a cluster, the user can send a Cluster with the username/password fields set to an empty string and SM won't update them.

But this makes it impossible to delete the username/password via a regular Cluster PUT, so SM had to extend the DELETE endpoint to accept additional, optional params specifying whether the username/password should be deleted.

So it looks like the SM API is kind of confusing :/ The possible approaches that I see are:

  1. Add labels as map[string]string to all of the mentioned object schemas. On get/list, return all the labels. On create/update, just set the labels to the provided ones. Note that this is not a problem in terms of making sctool accept just the diff of the labels, as sctool first fetches the Cluster/Task, applies the diff onto it, and sends it back to SM. (A sketch of this option follows at the end of this comment.)

  2. Add labels as map[string]string to all of the mentioned objects. On get/list, return all the labels. On create, just set the labels to the provided ones. On update, treat the provided map as a diff (treat empty label values as a delete request).

  3. Add labels as map[string]string in Task/TaskListItem and as map[string]*string in Cluster (either add a separate ClusterUpdate or live with the fact that getting/listing clusters returns map[string]*string). The *string type would not require the empty-label-value limitation.

  4. All of the solutions above ignore concurrent updates. If there is a use case for the operator to update the same task in parallel, we could use something like compare-and-swap, so that the operator would update and label the cluster only when it is in the expected state. The compare-and-swap syntax could be used in a way that allows only a partial update.

From my POV, the easiest and most consistent (with the current state of the SM API) approach would be the first one, but does it fit the operator use case? If not, do the others work?

@karol-kokoszka @rzetelskik @zimnx
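As referenced in option 1 above, here is a hedged sketch of the extended model and its full-replace update semantics; the field and JSON names are assumptions, since the real schema is generated from swagger:

package main

import "fmt"

// Sketch of option 1: Labels is returned on get/list and fully replaced
// on create/update, mirroring how the rest of the Cluster/Task schema
// already behaves.
type Cluster struct {
	ID     string            `json:"id"`
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}

// updateLabels models the PUT semantics of option 1: whatever map the
// client sends becomes the stored set, with no server-side diffing.
func updateLabels(stored *Cluster, incoming map[string]string) {
	stored.Labels = incoming
}

func main() {
	c := Cluster{ID: "c9e1cc68-09b6-4048-8bc6-bf730b66570c", Labels: map[string]string{"a": "1"}}
	updateLabels(&c, map[string]string{"b": "2"}) // "a" is gone: full replace
	fmt.Println(c.Labels)
}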

zimnx commented 2 months ago

From my POV, the easiest and most consistent (with the current state of the SM API) approach would be the first one, but does it fit the operator use case?

Option number 1 is fine; we can always provide the entire object we want to store.

All of the solutions above ignore concurrent updates. If there is a use case for the operator to update the same task in parallel, we could use something like compare-and-swap, so that the operator would update and label the cluster only when it is in the expected state. The compare-and-swap syntax could be used in a way that allows only a partial update.

I don't think we should implement concurrency support as part of this task. I think a better approach than compare-and-swap would be optimistic concurrency, but that's a bigger feature.

update is expected to save the diff to the DB and the flag name should be --set-labels; to remove a label, there is a need for an additional flag --delete-labels

It could be simplified by assuming that deletion is an update, but with a special format. I like what kubectl did. For example, to update a Pod foo with a key label you run kubectl label pod foo key=value; to delete it you add a - suffix to the key name, e.g. kubectl label pod foo key-.
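A sketch of parsing that kubectl-style syntax in Go; hypothetical, as sctool does not parse labels this way today:

package main

import (
	"fmt"
	"strings"
)

// parseLabelArgs splits kubectl-style arguments into sets ("key=value")
// and deletions ("key-"), as suggested above.
func parseLabelArgs(args []string) (set map[string]string, del []string, err error) {
	set = map[string]string{}
	for _, a := range args {
		switch {
		case strings.HasSuffix(a, "-") && !strings.Contains(a, "="):
			del = append(del, strings.TrimSuffix(a, "-"))
		case strings.Contains(a, "="):
			kv := strings.SplitN(a, "=", 2)
			set[kv[0]] = kv[1]
		default:
			return nil, nil, fmt.Errorf("invalid label argument %q", a)
		}
	}
	return set, del, nil
}

func main() {
	set, del, _ := parseLabelArgs([]string{"env=prod", "team-"})
	fmt.Println(set, del) // map[env:prod] [team]
}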

If the same label is listed in both --delete-labels and --set-labels, then --set-labels takes priority.

IMO this should be rejected, as it's an obvious mistake.