Open Michal-Leszczynski opened 2 days ago
FYI @tzach @karol-kokoszka
Is it even important for use to use the query ping on alternator healthcheck instead of the simple ping? Maybe we are fine with a short workaround that does not require any additional cluster flags? Query ping:
// QueryPing checks if host is available, it returns RTT and error. Special errors
// are ErrTimeout and ErrUnauthorised. Ping is based on executing
// a real query.
Simple ping:
// SimplePing sends GET request to alternator port and expects 200 response code.
@Michal-Leszczynski The short-term workaround with calling simple ping on alternator is fine. But It's not only the healthcheck service that is affected. Credentials are needed for backup and restore as well. I'm pretty sure we are not testing these services with the alternator at all.
So, the short term is not only to change the alternator ping to simple ping when authorization is required, but document this fact in the documentation + test backup and restore against alternator.
The long term is to change the API and include the flags you mentioned.
OK, thanks for the explanation @Michal-Leszczynski on slack. This is just alternator API password. Backup and restore use CQL API, so the CQL password is enough.
So, let's use the simple ping at the moment.
Here is some proof that enforcing alternator authentication results in failed alternator healthcheck (even when CQL creds are provided):
miles@fedora:~/scylla-manager$ ./sctool.dev cluster add --host 192.168.200.11 --auth-token token -u cassandra -p cassandra --name myc
351bc3d3-1f05-4deb-9eba-8a5cddd50624
__
/ \ Cluster added! You can set it as default, by exporting its name or ID as env variable:
@ @ $ export SCYLLA_MANAGER_CLUSTER=351bc3d3-1f05-4deb-9eba-8a5cddd50624
| | $ export SCYLLA_MANAGER_CLUSTER=myc
|| |/
|| || Now run:
|\_/| $ sctool status -c myc
\___/ $ sctool tasks -c myc
miles@fedora:~/scylla-manager$ ./sctool.dev status -c myc
Datacenter: dc1
╭────┬────────────┬───────────┬───────────┬────────────────┬──────────┬──────┬─────────┬────────┬──────────┬──────────────────────────────────────╮
│ │ Alternator │ CQL │ REST │ Address │ Uptime │ CPUs │ Memory │ Scylla │ Agent │ Host ID │
├────┼────────────┼───────────┼───────────┼────────────────┼──────────┼──────┼─────────┼────────┼──────────┼──────────────────────────────────────┤
│ UN │ DOWN (9ms) │ UP (7ms) │ UP (2ms) │ 192.168.200.11 │ 2h36m33s │ 16 │ 31.069G │ 6.0.1 │ Snapshot │ e7240eff-d6ef-4972-a137-d5bad1d6a02c │
│ UN │ DOWN (9ms) │ UP (16ms) │ UP (3ms) │ 192.168.200.12 │ 2h36m33s │ 16 │ 31.069G │ 6.0.1 │ Snapshot │ a78f1f32-c3f5-4497-bfcb-95101599970b │
│ UN │ DOWN (8ms) │ UP (15ms) │ UP (20ms) │ 192.168.200.13 │ 2h36m33s │ 16 │ 31.069G │ 6.0.1 │ Snapshot │ 103304a2-6df0-4da3-9bc6-4cf8a83f43ec │
╰────┴────────────┴───────────┴───────────┴────────────────┴──────────┴──────┴─────────┴────────┴──────────┴──────────────────────────────────────╯
Errors:
- 192.168.200.11 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
status code: 400, request id:
- 192.168.200.12 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
status code: 400, request id:
- 192.168.200.13 alternator: MissingAuthenticationTokenException: Authorization header is mandatory for signature verification
status code: 400, request id:
Up to this point, we were testing alternator without enforced authorization:
We should test the enabled version instead, but it looks like SM does not handle this case well. Simply enforcing authorization results in:
Here is the dynamodb config used for pinging alternator:
Even though it contains the
Password
andUsername
fields, they are never set. The same goes for theRequiresAuthentication
field. They are only used here:Which means that we always use the anonymous credentials.
The problem here is not about simply setting
Password
,Username
,RequiresAuthentication
fields to known values. According to alternator docs, the password used for authenticating alternator queries is not the CQL password, but rather its salted hash kept insystem.roles
table. This means that even when user specifies CQL credentials for the cluster, we are still missing the alternator password.The short workaround would be to only use QueryPing when
RequiresAuthentication = false
. The proper fix would be to add a new cluster fields--alternator-user
,--alternator-password
and use them for alternator healthcheck.I was also thinking about SM retrieving the salted hash from Scylla itself when the CQL credentials are set (by querying the
system.roles
table), but this seems like something easy to break in the future. Not to say that SM might not have the right permissions to read fromsystem.roles
table in the first place.