redpanda-data / helm-charts

Redpanda Helm Chart
http://redpanda.com
Apache License 2.0

Sasl superusers setup is broken #707

Closed. rauanmayemir closed this issue 1 year ago.

rauanmayemir commented 1 year ago

What happened?

I followed the guide and the default values example.

It states (or at least that's how I understand it) that I don't have to specify the users list if I already have a secretRef Secret in the required format.

I expected that Redpanda would provision my users and make them superusers (superadmins?). It did provision those users, but they were not superusers.

What did you expect to happen?

Redpanda should have added the superusers list to the Redpanda config according to the list in secretRef.

I understand that the Helm chart has no knowledge of what is inside that Secret, but then it shouldn't say the list is optional. (It is optional, but those superusers won't be superusers, and the cluster will be unusable because of broken authorization.)
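For context, the Secret referenced by secretRef contains a superusers file with one user:password:mechanism entry per line, created roughly like this (the credentials here are only illustrative):

    $ cat superusers.txt
    admin:change-me:SCRAM-SHA-256

    $ kubectl -n redpanda create secret generic redpanda-superusers --from-file=superusers.txt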

How can we reproduce it (as minimally and precisely as possible)? Please include the values file.

Wrong values:

    auth:
      sasl:
        enabled: true
        mechanism: SCRAM-SHA-256
        secretRef: "redpanda-superusers"
        users: []

How it should be:

    auth:
      sasl:
        enabled: true
        mechanism: SCRAM-SHA-256
        secretRef: "redpanda-superusers"
        users:
          - name: admin

Anything else we need to know?

The example in the docs is misleading: https://docs.redpanda.com/current/manage/kubernetes/security/sasl-kubernetes/#use-secrets

auth.sasl.users cannot be set to null; it's an array.
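For comparison, the YAML-list alternative described on that page (putting credentials directly in values) would look roughly like this; the password and mechanism keys are my reading of the docs, not something verified in this thread:

    auth:
      sasl:
        enabled: true
        users:
          - name: admin
            password: change-me
            mechanism: SCRAM-SHA-256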

Which are the affected charts?

Redpanda

Chart Version(s)

redpanda-5.3.1

Cloud provider

bare-metal kubernetes

rauanmayemir commented 1 year ago

OMG, it turns out it's not just confusion: Helm will break the release if I set both the secret and users, because it will try to create the Secret using the name from secretRef, which by definition already exists!

JakeSCahill commented 1 year ago

Hi @rauanmayemir - did you deploy the Secret in the same namespace as your Redpanda resource?

JakeSCahill commented 1 year ago

The procedure is here: https://docs.redpanda.com/current/manage/kubernetes/security/sasl-kubernetes/#use-secrets

I realise the example hardcodes the redpanda namespace (will fix that).

JakeSCahill commented 1 year ago

OMG, it turns out it's not just confusion: Helm will break the release if I set both the secret and users, because it will try to create the Secret using the name from secretRef, which by definition already exists!

Please see the important note here: https://docs.redpanda.com/current/manage/kubernetes/security/sasl-kubernetes/?tab=tabs-3-set#use-a-yaml-list

rauanmayemir commented 1 year ago

It's not about the namespace; I have the Secret, and it is correctly being used to provision users. The issue is about the Helm chart not adding those users to the superusers section.

I can either expose all the secrets in plaintext and set up auth.sasl.users manually as a YAML list, or I can use secretRef and have no way to change Redpanda's superusers config.

Right now I'm bypassing auth.sasl.users to keep using secretRef and manually calling rpk cluster config set superusers ['admin'] at runtime.
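Concretely, the workaround is something like the following, assuming the chart's default admin port and TLS certificate path that appear elsewhere in this thread (exact quoting of the list value may vary):

    $ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- \
        rpk cluster config set superusers "['admin']" \
        -X admin.tls.enabled=true \
        -X admin.tls.ca=/etc/tls/certs/default/ca.crt \
        -X admin.hosts=redpanda-0.redpanda.redpanda.svc.cluster.local.:9644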

rauanmayemir commented 1 year ago

Helm will break the release if I set both the secret and users, because it will try to create the Secret using the name from secretRef, which by definition already exists

I really thought this couldn't be misunderstood.

If I set the auth.sasl.users list manually (i.e., add my users in that section), Helm tries to create a Secret using the name from secretRef. That is not how it's supposed to work; Helm has to come up with its own name (if secretRef happens to match that name, then it would be my fault).

JakeSCahill commented 1 year ago

Oh, I see, sorry about that. I thought the issue was due to the docs. But I understand that you are requesting changes to how the chart handles creating superusers as well as reporting a bug where superusers aren't created when using Secrets.

rauanmayemir commented 1 year ago

Indeed, there are several things piling on to each other.

joejulian commented 1 year ago

Agreed. We should probably error out if both the secretRef and users are set.
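Something along these lines in the chart templates would do it; this is only a sketch of the idea, not the chart's actual code:

    {{- if and .Values.auth.sasl.enabled .Values.auth.sasl.secretRef .Values.auth.sasl.users }}
    {{- fail "auth.sasl.secretRef and auth.sasl.users are mutually exclusive; set only one of them" }}
    {{- end }}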

RafalKorepta commented 1 year ago

In our docs we made a mistake in the helm command:

helm upgrade --install redpanda redpanda/redpanda --namespace test-namespace --create-namespace \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers
  --set auth.sasl.users=null

You can see that we are missing the \ backslash after secretRef=redpanda-superusers. That's why users wasn't set to null. The array in Go (Helm uses Go templates) can be set to nil (null in the case of a Go template) or to an empty array. With either value, the Secret will not be rendered, because both count as empty values in Go templates:

The empty values are false, 0, any nil pointer or interface value, and any array, slice, map, or string of length zero.

REF: https://pkg.go.dev/text/template
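One way to confirm what the chart would actually create, without touching a cluster, is to render it locally and look for the Secret in the output (a sketch; the grep is only a rough filter):

    $ helm template redpanda redpanda/redpanda \
        --set auth.sasl.enabled=true \
        --set auth.sasl.secretRef=redpanda-superusers \
        --set auth.sasl.users=null \
        | grep -B5 'name: redpanda-superusers'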

I was able to set up the environment correctly:

$ cat superusers.txt
test-user:pass-test:SCRAM-SHA-512

$ kubectl create namespace redpanda
namespace/redpanda created

$ kubectl create secret generic redpanda-superusers --namespace redpanda --from-file=superusers.txt
secret/redpanda-superusers created

$ helm upgrade --install redpanda redpanda/redpanda --namespace redpanda --create-namespace \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers \
  --set auth.sasl.users=null
Release "redpanda" does not exist. Installing it now.
NAME: redpanda
LAST DEPLOYED: Mon Sep 18 18:21:33 2023
NAMESPACE: redpanda
STATUS: deployed
REVISION: 1
NOTES:
Congratulations on installing redpanda!

The pods will rollout in a few seconds. To check the status:

  kubectl -n redpanda rollout status statefulset redpanda --watch

Try some sample commands:

Create a user:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk acl user create myuser --new-password changeme --mechanism SCRAM-SHA-512 --api-urls redpanda-0.redpanda.redpanda.svc.cluster.local.:9644,redpanda-1.redpanda.redpanda.svc.cluster.local.:9644,redpanda-2.redpanda.redpanda.svc.cluster.local.:9644 --admin-api-tls-enabled --admin-api-tls-truststore /etc/tls/certs/default/ca.crt

Give the user permissions:

    kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk acl create --allow-principal 'myuser' --allow-host '*' --operation all --topic 'test-topic' --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Get the api status:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk cluster info --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Create a topic

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic create test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Describe the topic:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic describe test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Delete the topic:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic delete test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

$ kubectl logs -c config-watcher redpanda-1
REDACTED WAITING FOR CLUSTER TO BE READY!
...
RUNNING: Monitoring and Updating SASL users
Creating user test-user...
Created user test-user...
Setting superusers configurations with users [test-user]
Completed setting superusers configurations

$ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk cluster info --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user test-user --password pass-test --sasl-mechanism scram-sha-512
CLUSTER
=======
redpanda.c25f139f-2d39-4478-94df-3e25907d2af7

BROKERS
=======
ID    HOST                                             PORT
0*    redpanda-0.redpanda.redpanda.svc.cluster.local.  9093
1     redpanda-1.redpanda.redpanda.svc.cluster.local.  9093
2     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093

TOPICS
======
NAME      PARTITIONS  REPLICAS
_schemas  1           3

$ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic list --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user test-user --password pass-test --sasl-mechanism scram-sha-512
NAME      PARTITIONS  REPLICAS
_schemas  1           3

$ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic create test-topic -r 3 -p 3 --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user test-user --password pass-test --sasl-mechanism scram-sha-512
TOPIC       STATUS
test-topic  OK

$ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic list --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user test-user --password pass-test --sasl-mechanism scram-sha-512
NAME        PARTITIONS  REPLICAS
_schemas    1           3
test-topic  3           3

$ kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic delete test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user test-user --password pass-test --sasl-mechanism scram-sha-512
TOPIC       STATUS
test-topic  OK

I think I know what the misconception is here, @rauanmayemir.

Redpanda should have added the superusers list to the Redpanda config according to the list in secretRef.

The configuration will not have superusers listed in the ConfigMap, nor in the file on disk (redpanda.yaml). We have a special place where we are constantly reconciling this superusers Secret. Please take a look at the config-watcher sidecar. The script is located in [the secret](https://github.com/redpanda-data/helm-charts/blob/4f599dcc17d941b3e83f84c292faf51f0740e122/charts/redpanda/templates/secrets.yaml#L152-L265). Whenever you change the Secret, the superusers list should be updated.
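For example, to watch that reconcile pick up a change, you can update the Secret in place and follow the sidecar logs (a sketch using the names from this thread; the dry-run/apply combination is just one way to update an existing Secret):

    $ kubectl -n redpanda create secret generic redpanda-superusers \
        --from-file=superusers.txt --dry-run=client -o yaml \
        | kubectl -n redpanda apply -f -

    $ kubectl -n redpanda logs redpanda-0 -c config-watcher -f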

@rauanmayemir If you still have a problem with setting up the environment, please don't hesitate to reopen this issue.

rauanmayemir commented 1 year ago

@RafalKorepta Why do you assume this is a bash script issue? I'm not using bash; I'm creating a YAML file.

This is not fixed. This whole issue is one big misunderstanding and I am confident I am not wrong.

rauanmayemir commented 1 year ago

I can see there is a config-watcher sidecar running alongside and it's probably updating the secrets.

But those users are not being added to the cluster config as superusers, nor did it happen dynamically at runtime. My cluster was restarted, and right now calling rpk cluster config get superusers returns [].

RafalKorepta commented 1 year ago

Why do you assume this is a bash script issue? I'm not using bash

I was referring to our official documentation, where the helm CLI command with --set was missing the \ character.

I'm creating a YAML file.

I know that. You mentioned that in the issue description.

You are using

auth:
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-256
    secretRef: "redpanda-superusers"
    users: []

This is not fixed. This whole issue is one big misunderstanding and I am confident I am not wrong.

I understand you may be upset, but as I mentioned earlier, please reopen this issue and provide clear steps to reproduce the problem.

As far as I can tell, our documentation describes the process correctly. The users list can be set to null, as it is in our documentation, or it can be set to an empty array, as you are proposing.

In both cases the end result is correct: the Redpanda cluster configuration property superusers is set to the correct list. Please see the following steps:

$ cat superusers.txt
test-user:pass-test:SCRAM-SHA-512

$ cat gh-issue-707-values.yaml
auth:
  sasl:
    enabled: true
    secretRef: "redpanda-superusers"
    users: null

$ helm upgrade --install redpanda redpanda/redpanda --namespace redpanda \
  --values gh-issue-707-values.yaml --reuse-values
Release "redpanda" does not exist. Installing it now.
NAME: redpanda
LAST DEPLOYED: Tue Sep 19 13:18:43 2023
NAMESPACE: redpanda
STATUS: deployed
REVISION: 1
NOTES:
Congratulations on installing redpanda!

The pods will rollout in a few seconds. To check the status:

  kubectl -n redpanda rollout status statefulset redpanda --watch

Try some sample commands:

Create a user:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk acl user create myuser --new-password changeme --mechanism SCRAM-SHA-512 --api-urls redpanda-0.redpanda.redpanda.svc.cluster.local.:9644,redpanda-1.redpanda.redpanda.svc.cluster.local.:9644,redpanda-2.redpanda.redpanda.svc.cluster.local.:9644 --admin-api-tls-enabled --admin-api-tls-truststore /etc/tls/certs/default/ca.crt

Give the user permissions:

    kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk acl create --allow-principal 'myuser' --allow-host '*' --operation all --topic 'test-topic' --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Get the api status:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk cluster info --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Create a topic

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic create test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Describe the topic:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic describe test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

Delete the topic:

  kubectl -n redpanda exec -ti redpanda-0 -c redpanda -- rpk topic delete test-topic --brokers redpanda-0.redpanda.redpanda.svc.cluster.local.:9093,redpanda-1.redpanda.redpanda.svc.cluster.local.:9093,redpanda-2.redpanda.redpanda.svc.cluster.local.:9093 --tls-enabled --tls-truststore /etc/tls/certs/default/ca.crt --user <admin-user-in-secret> --password <admin-password-in-secret> --sasl-mechanism <mechanism-in-secret>

$ helm list
NAME        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
redpanda    redpanda    1           2023-09-19 13:18:43.375494 +0200 CEST   deployed    redpanda-5.4.2  v23.2.8

$ kubectl logs -c config-watcher redpanda-1
Waiting for cluster to be ready
CLUSTER HEALTH OVERVIEW
=======================
Healthy:                     true
Unhealthy reasons:           []
Controller ID:               0
All nodes:                   [0 1 2]
Nodes down:                  []
Leaderless partitions:       []
Under-replicated partitions: []
RUNNING: Monitoring and Updating SASL users
Creating user test-user...
Created user test-user...
Setting superusers configurations with users [test-user]
Completed setting superusers configurations

$ kubectl exec -ti redpanda-0 -- curl -k https://localhost:9644/v1/cluster_config | jq
Defaulted container "redpanda" out of: redpanda, config-watcher, tuning (init), redpanda-configurator (init)
{
  "cpu_profiler_enabled": false,
  "kafka_memory_batch_size_estimate_for_fetch": 1048576,
  "kafka_memory_share_for_fetch": 0.5,
  "legacy_unsafe_log_warning_interval_sec": 300,
  "kafka_throughput_controlled_api_keys": [
    "produce",
    "fetch"
  ],
  "kafka_quota_balancer_min_shard_throughput_ratio": 0.01,
  "kafka_quota_balancer_node_period_ms": 750,
  "kafka_quota_balancer_window_ms": 5000,
  "controller_log_accummulation_rps_capacity_move_operations": null,
  "controller_log_accummulation_rps_capacity_acls_and_users_operations": null,
  "controller_log_accummulation_rps_capacity_topic_operations": null,
  "rps_limit_topic_operations": 1000,
  "enable_controller_log_rate_limiting": false,
  "metrics_reporter_report_interval": 86400000,
  "enable_metrics_reporter": true,
  "storage_strict_data_init": false,
  "storage_space_alert_free_threshold_percent": 5,
  "health_monitor_max_metadata_age": 10000,
  "health_manager_tick_interval": 180000,
  "internal_topic_replication_factor": 3,
  "leader_balancer_mute_timeout": 300000,
  "leader_balancer_mode": "random_hill_climbing",
  "partition_autobalancing_movement_batch_size_bytes": 5368709120,
  "partition_autobalancing_node_availability_timeout_sec": 900,
  "full_raft_configuration_recovery_pattern": [],
  "zstd_decompress_workspace_bytes": 8388608,
  "kafka_qdc_max_depth": 100,
  "kafka_qdc_idle_depth": 10,
  "kafka_qdc_enable": false,
  "kafka_qdc_latency_alpha": 0.002,
  "partition_autobalancing_concurrent_moves": 50,
  "cloud_storage_chunk_prefetch": 0,
  "partition_autobalancing_min_size_threshold": null,
  "cloud_storage_chunk_eviction_strategy": "eager",
  "cloud_storage_disable_chunk_reads": false,
  "cloud_storage_min_chunks_per_segment_threshold": 5,
  "cloud_storage_cache_chunk_size": 16777216,
  "cloud_storage_max_materialized_segments_per_shard": null,
  "cloud_storage_max_partition_readers_per_shard": null,
  "metrics_reporter_tick_interval": 60000,
  "cloud_storage_max_segment_readers_per_shard": null,
  "cloud_storage_cache_max_objects": 100000,
  "cloud_storage_cache_size_percent": 20,
  "retention_local_trim_overage_coeff": 2,
  "retention_local_target_capacity_percent": 80,
  "retention_local_target_capacity_bytes": null,
  "retention_local_target_ms_default": 86400000,
  "retention_local_target_bytes_default": null,
  "cloud_storage_upload_ctrl_min_shares": 100,
  "cloud_storage_upload_ctrl_d_coeff": 0,
  "cloud_storage_upload_ctrl_p_coeff": -2,
  "cloud_storage_azure_adls_port": null,
  "cloud_storage_azure_shared_key": null,
  "cloud_storage_cache_check_interval": 5000,
  "retention_local_trim_interval": 30000,
  "cloud_storage_azure_container": null,
  "cloud_storage_disable_upload_consistency_checks": false,
  "cloud_storage_topic_purge_grace_period_ms": 30000,
  "cloud_storage_materialized_manifest_ttl_ms": 10000,
  "cloud_storage_manifest_cache_size": 1048576,
  "cloud_storage_spillover_manifest_size": null,
  "cloud_storage_credentials_host": null,
  "cloud_storage_backend": "unknown",
  "cloud_storage_segment_size_target": null,
  "cloud_storage_recovery_temporary_retention_bytes_default": 1073741824,
  "enable_rack_awareness": false,
  "cloud_storage_enable_compacted_topic_reupload": true,
  "node_isolation_heartbeat_timeout": 3000,
  "cloud_storage_disable_upload_loop_for_tests": false,
  "cloud_storage_enable_segment_merging": true,
  "storage_min_free_bytes": 1073741824,
  "cloud_storage_idle_threshold_rps": 1,
  "kafka_throughput_control": [],
  "cloud_storage_idle_timeout_ms": 10000,
  "cloud_storage_manifest_max_upload_interval_sec": 60,
  "readers_cache_eviction_timeout_ms": 30000,
  "cloud_storage_max_connection_idle_time_ms": 5000,
  "storage_target_replay_bytes": 10737418240,
  "cloud_storage_trust_file": null,
  "recovery_append_timeout_ms": 5000,
  "cloud_storage_api_endpoint_port": 443,
  "cloud_storage_upload_loop_initial_backoff_ms": 100,
  "cloud_storage_credentials_source": "config_file",
  "cloud_storage_api_endpoint": null,
  "cloud_storage_secret_key": null,
  "kafka_quota_balancer_min_shard_throughput_bps": 256,
  "cloud_storage_enable_remote_write": false,
  "kafka_enable_describe_log_dirs_remote_storage": true,
  "default_window_sec": 1000,
  "kafka_rpc_server_stream_recv_buf": null,
  "kafka_qdc_min_depth": 1,
  "compacted_log_segment_size": 67108864,
  "kafka_client_group_fetch_byte_rate_quota": [],
  "kafka_client_group_byte_rate_quota": [],
  "kafka_connections_max_per_ip": null,
  "transaction_coordinator_partitions": 50,
  "members_backend_retry_ms": 5000,
  "compaction_ctrl_max_shares": 1000,
  "election_timeout_ms": 1500,
  "auto_create_topics_enabled": false,
  "compaction_ctrl_min_shares": 10,
  "rps_limit_acls_and_users_operations": 1000,
  "kafka_noproduce_topics": [
    "__audit"
  ],
  "cloud_storage_disable_tls": false,
  "topic_partitions_reserve_shard0": 2,
  "kafka_batch_max_bytes": 1048576,
  "cloud_storage_hydrated_chunks_per_segment_ratio": 0.7,
  "compaction_ctrl_update_interval_ms": 30000,
  "kafka_request_max_bytes": 104857600,
  "node_management_operation_timeout_ms": 5000,
  "cloud_storage_segment_size_min": null,
  "rpc_server_tcp_send_buf": null,
  "enable_transactions": true,
  "kafka_enable_partition_reassignment": true,
  "raft_heartbeat_disconnect_failures": 3,
  "tx_timeout_delay_ms": 1000,
  "compaction_ctrl_p_coeff": -12.5,
  "kafka_mtls_principal_mapping_rules": null,
  "enable_schema_id_validation": "none",
  "leader_balancer_idle_timeout": 120000,
  "kafka_enable_authorization": true,
  "sasl_kerberos_principal": "redpanda",
  "cloud_storage_azure_storage_account": null,
  "sasl_kerberos_keytab": "/var/lib/redpanda/redpanda.keytab",
  "find_coordinator_timeout_ms": 2000,
  "cloud_storage_enable_remote_read": false,
  "rps_limit_move_operations": 1000,
  "kafka_qdc_window_size_ms": 1500,
  "cloud_storage_upload_loop_max_backoff_ms": 10000,
  "kafka_nodelete_topics": [
    "__audit",
    "__consumer_offsets",
    "_schemas"
  ],
  "node_status_reconnect_max_backoff_ms": 15000,
  "memory_enable_memory_sampling": true,
  "sasl_kerberos_config": "/etc/krb5.conf",
  "leader_balancer_transfer_limit_per_shard": 512,
  "cloud_storage_housekeeping_interval_ms": 300000,
  "id_allocator_log_capacity": 100,
  "controller_log_accummulation_rps_capacity_configuration_operations": null,
  "retention_local_strict": false,
  "tm_sync_timeout_ms": 10000,
  "compaction_ctrl_d_coeff": 0.2,
  "aggregate_metrics": false,
  "storage_ignore_cstore_hints": false,
  "kafka_schema_id_validation_cache_capacity": 128,
  "compaction_ctrl_i_coeff": 0,
  "quota_manager_gc_sec": 30000,
  "storage_ignore_timestamps_in_future_sec": null,
  "cpu_profiler_sample_period_ms": 100,
  "group_initial_rebalance_delay": 3000,
  "storage_compaction_index_memory": 134217728,
  "disk_reservation_percent": 25,
  "storage_max_concurrent_replay": 1024,
  "kafka_rpc_server_tcp_send_buf": null,
  "cloud_storage_upload_ctrl_update_interval_ms": 60000,
  "alter_topic_cfg_timeout_ms": 5000,
  "segment_fallocation_step": 33554432,
  "cloud_storage_graceful_transfer_timeout_ms": 5000,
  "sasl_mechanisms": [
    "SCRAM"
  ],
  "cloud_storage_manifest_upload_timeout_ms": 10000,
  "partition_autobalancing_mode": "node_add",
  "storage_reserve_min_segments": 2,
  "storage_read_readahead_count": 10,
  "cloud_storage_disable_read_replica_loop_for_tests": false,
  "kafka_max_bytes_per_fetch": 67108864,
  "enable_sasl": true,
  "kvstore_max_segment_size": 16777216,
  "cloud_storage_roles_operation_timeout_ms": 30000,
  "retention_bytes": null,
  "release_cache_on_segment_roll": false,
  "memory_abort_on_alloc_failure": true,
  "kvstore_flush_interval": 10,
  "enable_pid_file": true,
  "compaction_ctrl_backlog_size": null,
  "append_chunk_size": 16384,
  "reclaim_batch_cache_min_free": 67108864,
  "reclaim_stable_window": 10000,
  "fetch_session_eviction_timeout_ms": 60000,
  "enable_leader_balancer": true,
  "raft_transfer_leader_recovery_timeout_ms": 10000,
  "reclaim_max_size": 4194304,
  "metrics_reporter_url": "https://m.rp.vectorized.io/v2",
  "reclaim_min_size": 131072,
  "max_kafka_throttle_delay_ms": 30000,
  "raft_smp_max_non_local_requests": null,
  "rps_limit_configuration_operations": 1000,
  "raft_recovery_throttle_disable_dynamic_mode": false,
  "log_segment_ms_max": 31536000000,
  "cloud_storage_max_connections": 20,
  "raft_replicate_batch_window_size": 1048576,
  "replicate_append_timeout_ms": 3000,
  "reclaim_growth_window": 3000,
  "cloud_storage_max_segments_pending_deletion_per_partition": 5000,
  "kafka_group_recovery_timeout_ms": 30000,
  "partition_autobalancing_tick_moves_drop_threshold": 0.2,
  "default_topic_partitions": 1,
  "features_auto_enable": true,
  "wait_for_leader_timeout_ms": 5000,
  "cloud_storage_upload_ctrl_max_shares": 1000,
  "log_segment_size_max": 268435456,
  "cluster_id": "6be9ef63-fbc5-4192-8a5f-b2f6edc24b95",
  "transaction_coordinator_delete_retention_ms": 604800000,
  "partition_autobalancing_tick_interval_ms": 30000,
  "space_management_max_segment_concurrency": 10,
  "storage_read_buffer_size": 131072,
  "usage_disk_persistance_interval_sec": 300,
  "metadata_status_wait_timeout_ms": 2000,
  "kafka_qdc_max_latency_ms": 80,
  "kafka_qdc_window_count": 12,
  "transaction_coordinator_cleanup_policy": "delete",
  "group_max_session_timeout_ms": 300000,
  "kafka_connections_max": null,
  "cloud_storage_enabled": false,
  "abort_timed_out_transactions_interval_ms": 10000,
  "default_topic_replications": 1,
  "join_retry_timeout_ms": 5000,
  "kafka_connection_rate_limit": 1000,
  "log_compaction_interval_ms": 10000,
  "cloud_storage_bucket": null,
  "cloud_storage_spillover_manifest_max_segments": null,
  "max_transactions_per_coordinator": 18446744073709552000,
  "cloud_storage_metadata_sync_timeout_ms": 10000,
  "max_concurrent_producer_ids": 18446744073709552000,
  "cloud_storage_access_key": null,
  "kafka_tcp_keepalive_probes": 3,
  "partition_autobalancing_max_disk_usage_percent": 80,
  "cloud_storage_cluster_metadata_upload_interval_ms": 60000,
  "transactional_id_expiration_ms": 604800000,
  "legacy_permit_unsafe_log_operation": true,
  "log_segment_size": 134217728,
  "rpc_server_listen_backlog": null,
  "space_management_enable": true,
  "group_topic_partitions": 16,
  "disable_public_metrics": false,
  "kafka_tcp_keepalive_timeout": 120,
  "fetch_max_bytes": 57671680,
  "usage_window_width_interval_sec": 3600,
  "fetch_reads_debounce_timeout": 1,
  "storage_space_alert_free_threshold_bytes": 0,
  "log_segment_ms": 1209600000,
  "log_compression_type": "producer",
  "log_message_timestamp_type": "CreateTime",
  "log_cleanup_policy": "delete",
  "abort_index_segment_size": 50000,
  "kafka_throughput_limit_node_in_bps": null,
  "tx_log_stats_interval_s": 10,
  "create_topic_timeout_ms": 2000,
  "max_compacted_log_segment_size": 536870912,
  "rm_sync_timeout_ms": 10000,
  "controller_backend_housekeeping_interval_ms": 1000,
  "kafka_connection_rate_limit_overrides": [],
  "raft_max_concurrent_append_requests_per_follower": 16,
  "metadata_dissemination_retries": 30,
  "node_status_interval": 100,
  "default_num_windows": 10,
  "target_fetch_quota_byte_rate": null,
  "metadata_dissemination_retry_delay_ms": 320,
  "legacy_group_offset_retention_enabled": false,
  "group_offset_retention_check_ms": 600000,
  "group_offset_retention_sec": 604800,
  "log_segment_size_min": 16777216,
  "group_new_member_join_timeout": 30000,
  "group_min_session_timeout_ms": 6000,
  "transaction_coordinator_log_segment_size": 1073741824,
  "cloud_storage_cache_size": 0,
  "log_segment_size_jitter_percent": 5,
  "cloud_storage_initial_backoff_ms": 100,
  "rpc_server_tcp_recv_buf": null,
  "superusers": [
    "test-user"
  ],
  "raft_recovery_default_read_size": 524288,
  "kafka_qdc_depth_alpha": 0.8,
  "kafka_admin_topic_api_rate": null,
  "raft_io_timeout_ms": 10000,
  "cloud_storage_segment_max_upload_interval_sec": null,
  "metadata_dissemination_interval_ms": 3000,
  "kafka_connections_max_overrides": [],
  "usage_num_windows": 24,
  "raft_timeout_now_timeout_ms": 1000,
  "id_allocator_batch_size": 1000,
  "enable_usage": false,
  "sasl_kerberos_principal_mapping": [
    "DEFAULT"
  ],
  "raft_heartbeat_timeout_ms": 3000,
  "controller_snapshot_max_age_sec": 60,
  "rps_limit_node_management_operations": 1000,
  "raft_heartbeat_interval_ms": 150,
  "cloud_storage_segment_upload_timeout_ms": 30000,
  "use_fetch_scheduler_group": true,
  "log_segment_ms_min": 60000,
  "admin_api_require_auth": false,
  "topic_fds_per_partition": 5,
  "enable_idempotence": true,
  "disable_batch_cache": false,
  "cloud_storage_readreplica_manifest_sync_timeout_ms": 30000,
  "kafka_throughput_limit_node_out_bps": null,
  "segment_appender_flush_timeout_ms": 1000,
  "topic_memory_per_partition": 1048576,
  "target_quota_byte_rate": 2147483648,
  "raft_max_recovery_memory": null,
  "kafka_qdc_depth_update_ms": 7000,
  "delete_retention_ms": 604800000,
  "space_management_max_log_concurrency": 20,
  "topic_partitions_per_shard": 1000,
  "controller_log_accummulation_rps_capacity_node_management_operations": null,
  "cloud_storage_azure_adls_endpoint": null,
  "disable_metrics": false,
  "kafka_rpc_server_tcp_recv_buf": null,
  "raft_learner_recovery_rate": 104857600,
  "kafka_tcp_keepalive_probe_interval_seconds": 60,
  "cloud_storage_region": null
}

But those users are not being added to the cluster config as superusers, nor did it happen dynamically at runtime. My cluster was restarted, and right now calling rpk cluster config get superusers returns [].

In the above scenario, rpk returns the correct single entry in the superusers list:

$ kubectl exec -ti redpanda-0 -- rpk cluster config get superusers -X admin.tls.ca=/etc/tls/certs/default/ca.crt -X admin.hosts=redpanda-0.redpanda.redpanda.svc.cluster.local.:9644,redpanda-1.redpanda.redpanda.svc.cluster.local.:9644,redpanda-2.redpanda.redpanda.svc.cluster.local.:9644 -X admin.tls.enabled=true
Defaulted container "redpanda" out of: redpanda, config-watcher, tuning (init), redpanda-configurator (init)
- test-user

I'm not sure what you mean by restarted. Did you delete the Pods? Scale the Helm release to 0 replicas?

rauanmayemir commented 1 year ago

@RafalKorepta I had downtime with my cluster, and the pods got deleted and recreated at some point. I've checked the config-watcher logs and they end at RUNNING: Monitoring and Updating SASL users. That's it; stderr is finished after that.

rauanmayemir commented 1 year ago

I'll try to bump the chart version and see if it gets fixed.

RafalKorepta commented 1 year ago

I had downtime with my cluster, and the pods got deleted and recreated at some point.

I'm not sure what kind of StorageClass you are using. If you are using an ephemeral storage class, then your cluster started from scratch. The config-watcher should configure the superusers cluster configuration property.

Can you check what rpk cluster config get superusers with all optional flags returns?

rauanmayemir commented 1 year ago

@RafalKorepta No, what I meant is that the cluster with its topics is there and persistent. It's just that the config is reset, and without setting it explicitly it won't work.

rauanmayemir commented 1 year ago

Bumping to the latest chart version didn't help. I don't have permission to reopen the issue.

rauanmayemir commented 1 year ago

Can you check what rpk cluster config get superusers with all optional flags returns?

Like I said, it's [].

RafalKorepta commented 1 year ago

@rauanmayemir it's really hard for me to reproduce your problem. I provided my steps, and it works as described in our docs. You sent a snippet of logs in the Slack thread that cuts off at the moment where superusers should be set by config-watcher. Please provide a reproducible scenario that I can reliably run.

rauanmayemir commented 1 year ago

@RafalKorepta Yeah, I get it. I'm looking for a proper repro. I thought it was about Helm, but it looks like something in my config is breaking config-watcher.

joejulian commented 1 year ago

If you can figure out what that is, it would still be nice to be able to break config-watcher ourselves to see if we can make it at least output something useful for users.

RafalKorepta commented 1 year ago

@rauanmayemir I found a few problems with updating the Redpanda Helm chart that affected Console and the Redpanda cluster configuration. Maybe the fix proposed in https://github.com/redpanda-data/helm-charts/pull/759 will solve your problems too.

RafalKorepta commented 1 year ago

@rauanmayemir I will close the issue on Monday if you don't find any issues related to superusers.

Thanks for the initial report.

rauanmayemir commented 1 year ago

@RafalKorepta Sorry for the delay; I just got some cycles to debug this and finally found the issue: redpanda-data/redpanda#13880