k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

Non-ASCII characters in DC name are not handled correctly #1258

Open olim7t opened 7 months ago

olim7t commented 7 months ago

(this assumes that #1252 is fixed first)

DC names can contain non-ASCII Unicode characters, for example Cql_DåtåCenter1. However, a K8ssandraCluster with such a DC name gets stuck waiting for the DC to update:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  namespace: test
  name: k8s-cluster
spec:
  cassandra:
    serverVersion: 4.1.0
    clusterName: Cql_Cluster
    datacenters:
      - metadata:
          name: k8s-dc1
        datacenterName: Cql_DåtåCenter1
        size: 1
        racks:
          - name: Cql_Rack1
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Mi

The following error shows up in the cass-operator logs:

ERROR   incorrect status code when calling Node Management Endpoint { ... "name": "k8s-dc1", "statusCode": 500,
    "error": "incorrect status code of 500 when calling endpoint"}
github.com/k8ssandra/cass-operator/pkg/httphelper.callNodeMgmtEndpoint
    /workspace/pkg/httphelper/client.go:1183
github.com/k8ssandra/cass-operator/pkg/httphelper.(*NodeMgmtClient).CallCreateRoleEndpoint
    /workspace/pkg/httphelper/client.go:318
github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).upsertUser
    /workspace/pkg/reconciliation/reconcile_racks.go:838
github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).CreateUsers
    /workspace/pkg/reconciliation/reconcile_racks.go:902
github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).ReconcileAllRacks
    /workspace/pkg/reconciliation/reconcile_racks.go:2424
github.com/k8ssandra/cass-operator/pkg/reconciliation.(*ReconciliationContext).CalculateReconciliationActions
    /workspace/pkg/reconciliation/handler.go:68
github.com/k8ssandra/cass-operator/internal/controllers/cassandra.(*CassandraDatacenterReconciler).Reconcile

And in the Cassandra pod's server-system-logger container:

Cannot achieve consistency level ONE" while executing SELECT * FROM system_auth.roles WHERE role = 'cassandra' ALLOW FILTERING

Trying to connect to the pod directly with CQLSH yields a similar error:

Unable to perform authentication: Cannot achieve consistency level LOCAL_QUORUM

This suggests a problem with the replication settings of the system keyspaces, although the annotation on the K8ssandraCluster looks correct:

k8ssandra.io/initial-system-replication: '{"Cql_DåtåCenter1":1}'

Note that this works correctly for a standalone CassandraDatacenter.

Issue is synchronized with this Jira Story by Unito. Issue Number: K8OP-34

olim7t commented 7 months ago

I re-created the K8ssandraCluster with spec.auth: false to work around the authentication issue, then connected to the node with cqlsh:

cqlsh> select data_center from system.local;

 data_center
-------------------
 Cql_DåtåCenter1

(1 rows)
cqlsh> select * from system_schema.keyspaces;

 keyspace_name      | durable_writes | replication
--------------------+----------------+-------------------------------------------------------------------------------------------
        system_auth |           True | {'Cql_DåtåCenter1': '1', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
      system_schema |           True |                                   {'class': 'org.apache.cassandra.locator.LocalStrategy'}
 system_distributed |           True | {'Cql_DåtåCenter1': '1', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
             system |           True |                                   {'class': 'org.apache.cassandra.locator.LocalStrategy'}
      system_traces |           True | {'Cql_DåtåCenter1': '1', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}

(5 rows)
cqlsh> select * from system_auth.roles;
NoHostAvailable:

The replication settings are correctly injected; it's the initial configuration of the DC name that failed. I think what's happening is that Cassandra reads cassandra-rackdc.properties using ISO-8859-1 encoding, but we don't properly escape the value when we write that file:

$ cat /config/cassandra-rackdc.properties
dc=Cql_DåtåCenter1       # should be Cql_D\u00e5t\u00e5Center1
rack=Cql_Rack1
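
For illustration, the escaping mentioned in the comment above could look roughly like the following Go sketch. It is not taken from cass-config-builder or k8ssandra-client: the function name escapeNonASCII is hypothetical, and the sketch only covers non-ASCII characters (a complete properties writer would also need to escape backslashes and other special characters). It relies on the fact that java.util.Properties, when loading from a byte stream, assumes ISO-8859-1 and accepts \uXXXX escapes for anything outside that range.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// escapeNonASCII is a hypothetical helper (not part of any k8ssandra project).
// It replaces every non-ASCII character in s with \uXXXX escapes, using
// UTF-16 code units, so the result is plain ASCII.
func escapeNonASCII(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r < 0x80:
			// ASCII characters are written unchanged.
			b.WriteRune(r)
		case r > 0xFFFF:
			// Characters outside the Basic Multilingual Plane need a
			// surrogate pair, i.e. two \uXXXX escapes.
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, "\\u%04x\\u%04x", hi, lo)
		default:
			fmt.Fprintf(&b, "\\u%04x", r)
		}
	}
	return b.String()
}

func main() {
	// Prints the line as it would need to appear in the properties file.
	fmt.Println("dc=" + escapeNonASCII("Cql_DåtåCenter1"))
}
```

With the DC name from this issue, the sketch produces dc=Cql_D\u00e5t\u00e5Center1, which matches the expected value noted in the file listing above.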

olim7t commented 7 months ago

This is a cass-config-builder issue: datastax/cass-config-builder#53

EDIT -- and k8ssandra-client probably has the same bug, since this is what we now use for 4.1+ clusters.