scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/
Other
51 stars 33 forks source link

SM can't make alternator query ping when authorization is required #4036

Open Michal-Leszczynski opened 2 days ago

Michal-Leszczynski commented 2 days ago

Up to this point, we were testing alternator without enforced authorization:

alternator_enforce_authorization: false

We should test the enabled version instead, but it looks like SM does not handle this case well. Simply enforcing authorization results in:

=== RUN   TestPingIntegration/query
    dynamoping_integration_test.go:34: MissingAuthenticationTokenException: Authorization header is mandatory 

Here is the dynamodb config used for pinging alternator:

// Config specifies the ping configuration, note that timeout is mandatory and
// zero timeout will result in errors.
type Config struct {
    Addr                   string
    RequiresAuthentication bool
    Username               string
    Password               string
    Timeout                time.Duration
    TLSConfig              *tls.Config
}

Even though it contains the Password and Username fields, they are never set. The same goes for the RequiresAuthentication field. They are only used here:

    sess := session.Must(session.NewSessionWithOptions(session.Options{
        Config: aws.Config{
            Endpoint:   aws.String(config.Addr),
            Region:     aws.String("scylla"),
            HTTPClient: httpClient(config),
        },
    }))

    if config.RequiresAuthentication && config.Username != "" && config.Password != "" {
        sess.Config.Credentials = credentials.NewStaticCredentials(config.Username, config.Password, "")
    } else {
        sess.Config.Credentials = credentials.AnonymousCredentials
    }

Which means that we always use the anonymous credentials.

The problem here is not about simply setting Password, Username, RequiresAuthentication fields to known values. According to alternator docs, the password used for authenticating alternator queries is not the CQL password, but rather its salted hash kept in system.roles table. This means that even when user specifies CQL credentials for the cluster, we are still missing the alternator password.

The short workaround would be to only use QueryPing when RequiresAuthentication = false. The proper fix would be to add a new cluster fields --alternator-user, --alternator-password and use them for alternator healthcheck.

I was also thinking about SM retrieving the salted hash from Scylla itself when the CQL credentials are set (by querying the system.roles table), but this seems like something easy to break in the future. Not to say that SM might not have the right permissions to read from system.roles table in the first place.

Michal-Leszczynski commented 2 days ago

FYI @tzach @karol-kokoszka

Michal-Leszczynski commented 2 days ago

Is it even important for use to use the query ping on alternator healthcheck instead of the simple ping? Maybe we are fine with a short workaround that does not require any additional cluster flags? Query ping:

// QueryPing checks if host is available, it returns RTT and error. Special errors
// are ErrTimeout and ErrUnauthorised. Ping is based on executing
// a real query.

Simple ping:

// SimplePing sends GET request to alternator port and expects 200 response code.
karol-kokoszka commented 1 day ago

@Michal-Leszczynski The short-term workaround with calling simple ping on alternator is fine. But It's not only the healthcheck service that is affected. Credentials are needed for backup and restore as well. I'm pretty sure we are not testing these services with the alternator at all.

So, the short term is not only to change the alternator ping to simple ping when authorization is required, but document this fact in the documentation + test backup and restore against alternator.

The long term is to change the API and include the flags you mentioned.

karol-kokoszka commented 1 day ago

OK, thanks for the explanation @Michal-Leszczynski on slack. This is just alternator API password. Backup and restore use CQL API, so the CQL password is enough.

So, let's use the simple ping at the moment.

Michal-Leszczynski commented 1 day ago

Here is some proof that enforcing alternator authentication results in failed alternator healthcheck (even when CQL creds are provided):

miles@fedora:~/scylla-manager$ ./sctool.dev cluster add --host 192.168.200.11 --auth-token token -u cassandra -p cassandra --name myc
351bc3d3-1f05-4deb-9eba-8a5cddd50624
 __  
/  \     Cluster added! You can set it as default, by exporting its name or ID as env variable:
@  @     $ export SCYLLA_MANAGER_CLUSTER=351bc3d3-1f05-4deb-9eba-8a5cddd50624
|  |     $ export SCYLLA_MANAGER_CLUSTER=myc
|| |/    
|| ||    Now run:
|\_/|    $ sctool status -c myc
\___/    $ sctool tasks -c myc

miles@fedora:~/scylla-manager$ ./sctool.dev status -c myc
Datacenter: dc1
╭────┬────────────┬───────────┬───────────┬────────────────┬──────────┬──────┬─────────┬────────┬──────────┬──────────────────────────────────────╮
│    │ Alternator │ CQL       │ REST      │ Address        │ Uptime   │ CPUs │ Memory  │ Scylla │ Agent    │ Host ID                              │
├────┼────────────┼───────────┼───────────┼────────────────┼──────────┼──────┼─────────┼────────┼──────────┼──────────────────────────────────────┤
│ UN │ DOWN (9ms) │ UP (7ms)  │ UP (2ms)  │ 192.168.200.11 │ 2h36m33s │ 16   │ 31.069G │ 6.0.1  │ Snapshot │ e7240eff-d6ef-4972-a137-d5bad1d6a02c │
│ UN │ DOWN (9ms) │ UP (16ms) │ UP (3ms)  │ 192.168.200.12 │ 2h36m33s │ 16   │ 31.069G │ 6.0.1  │ Snapshot │ a78f1f32-c3f5-4497-bfcb-95101599970b │
│ UN │ DOWN (8ms) │ UP (15ms) │ UP (20ms) │ 192.168.200.13 │ 2h36m33s │ 16   │ 31.069G │ 6.0.1  │ Snapshot │ 103304a2-6df0-4da3-9bc6-4cf8a83f43ec │
╰────┴────────────┴───────────┴───────────┴────────────────┴──────────┴──────┴─────────┴────────┴──────────┴──────────────────────────────────────╯
Errors:
- 192.168.200.11 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
        status code: 400, request id: 
- 192.168.200.12 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
        status code: 400, request id: 
- 192.168.200.13 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
        status code: 400, request id: