k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

K8SSAND-1724 ⁃ cassandra.config.jvmOptions.additionalOptions are not applied to server-config-init's environment variables #647

Closed: Miles-Garnsey closed this issue 2 years ago.

Miles-Garnsey commented 2 years ago

What happened?

Attempting to add JVM startup options via cassandra.config.jvmOptions.additionalOptions doesn't appear to work. The environment variables do not appear in the server-config-init container, nor is Cassandra started with the options.

Did you expect to see something different?

Yes, Cassandra should start with the JVM options.

server-config-init starts with the following environment (it does not contain the expected options):

CONFIG_FILE_DATA:{"cassandra-env-sh":{"additional-jvm-opts":["-Dcassandra.system_distributed_replication=dc1:1","-Dcom.sun.management.jmxremote.authenticate=false","-javaagent:/opt/metrics-collector/lib/datastax-mcac-agent.jar","-javaagent:/opt/management-api/datastax-mgmtapi-agent-0.1.0-SNAPSHOT.jar","-javaagent:/opt/cdc_agent/cdc-agent.jar=pulsarServiceUrl=pulsar://pulsar-proxy.pulsar.svc.cluster.local:6650"]},"cassandra-yaml":{"authenticator":"AllowAllAuthenticator","authorizer":"AllowAllAuthorizer","cdc_enabled":true,"num_tokens":16,"role_manager":"CassandraRoleManager"},"cluster-info":{"name":"test","seeds":"test-seed-service,test-dc1-additional-seed-service"},"datacenter-info":{"graph-enabled":0,"name":"dc1","solr-enabled":0,"spark-enabled":0}}
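One way to confirm which options actually reached server-config-init is to decode CONFIG_FILE_DATA and look for the expected flags under cassandra-env-sh / additional-jvm-opts. The Go sketch below relies only on the JSON shape visible in the dump above; the `missingJvmOpts` helper is hypothetical, and inside the container you would pass it `os.Getenv("CONFIG_FILE_DATA")` instead of the abbreviated literal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configFileData mirrors only the part of the CONFIG_FILE_DATA payload that
// matters here; encoding/json ignores every other key.
type configFileData struct {
	CassandraEnvSh struct {
		AdditionalJvmOpts []string `json:"additional-jvm-opts"`
	} `json:"cassandra-env-sh"`
}

// missingJvmOpts (hypothetical helper) returns the expected options that do
// not appear in the additional-jvm-opts list of a CONFIG_FILE_DATA document.
func missingJvmOpts(raw string, expected []string) ([]string, error) {
	var cfg configFileData
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	present := make(map[string]bool, len(cfg.CassandraEnvSh.AdditionalJvmOpts))
	for _, opt := range cfg.CassandraEnvSh.AdditionalJvmOpts {
		present[opt] = true
	}
	var missing []string
	for _, opt := range expected {
		if !present[opt] {
			missing = append(missing, opt)
		}
	}
	return missing, nil
}

func main() {
	// Abbreviated copy of the payload shown in the issue.
	raw := `{"cassandra-env-sh":{"additional-jvm-opts":["-Dcassandra.system_distributed_replication=dc1:1","-Dcom.sun.management.jmxremote.authenticate=false"]}}`
	// Options from the reproducer manifest's additionalOptions list.
	expected := []string{
		"-Dcassandra.jmx.remote.port=7199",
		"-Djava.rmi.server.hostname=127.0.0.1",
		"-Dcom.sun.management.jmxremote.rmi.port=7199",
	}
	missing, err := missingJvmOpts(raw, expected)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing:", missing)
}
```

Run against the full payload above, all three options from the reproducer manifest are reported as missing.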

Cassandra starts as follows (according to ps aux | grep cassandra):

cassand+     205 33.0 27.0 5213204 3323384 ?     Sl   06:29   0:51 /opt/java/openjdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+AlwaysPreTouch -Dcassandra.disable_auth_caches_remote_configuration=false -Dcassandra.force_default_indexing_page_size=false -Dcassandra.join_ring=true -Dcassandra.load_ring_state=true -Dcassandra.write_survey=false -XX:+DebugNonSafepoints -ea -XX:GuaranteedSafepointInterval=300000 -XX:+HeapDumpOnOutOfMemoryError -Dio.netty.eventLoop.maxPendingTasks=65536 -Djava.net.preferIPv4Stack=true -Djdk.nio.maxCachedBufferSize=1048576 -Dsun.nio.PageAlignDirectMemory=true -Xss256k -XX:+PerfDisableSharedMem -XX:+PreserveFramePointer -Dcassandra.printHeapHistogramOnOutOfMemoryError=false -XX:+ResizeTLAB -XX:-RestrictContended -XX:StringTableSize=1000003 -XX:-UseBiasedLocking -XX:+UseNUMA -XX:+UseThreadPriorities -XX:+UseTLAB -Dcom.sun.management.jmxremote.authenticate=false -Dcassandra.jmx.local.port=7199 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:+ParallelRefProcEnabled -Djdk.attach.allowAttachSelf=true --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-opens java.base/jdk.internal.module=ALL-UNNAMED --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-exports java.base/jdk.internal.perf=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED --add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED --add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports java.rmi/sun.rmi.server=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -Xlog:gc=info,heap*=trace,age*=debug,safepoint=info,promotion*=trace:file=/opt/cassandra/logs/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760 -Xms2994M -Xmx2994M -XX:CompileCommandFile=/opt/cassandra/conf/hotspot_compiler -javaagent:/opt/cassandra/lib/jamm-0.3.2.jar -Dcassandra.jmx.local.port=7199 -Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Djava.library.path=/opt/cassandra/lib/sigar-bin -Dcassandra.system_distributed_replication=dc1:1 -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/opt/metrics-collector/lib/datastax-mcac-agent.jar -javaagent:/opt/management-api/datastax-mgmtapi-agent-0.1.0-SNAPSHOT.jar -javaagent:/opt/cdc_agent/cdc-agent.jar=pulsarServiceUrl=pulsar://pulsar-proxy.pulsar.svc.cluster.local:6650 -Dcassandra.libjemalloc=/usr/local/lib/libjemalloc.so -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cassandra/logs -Dcassandra.storagedir=/opt/cassandra/data
-cp /opt/cassandra/conf:/opt/cassandra/lib/HdrHistogram-2.1.9.jar:/opt/cassandra/lib/ST4-4.0.8.jar:/opt/cassandra/lib/airline-0.8.jar:/opt/cassandra/lib/antlr-runtime-3.5.2.jar:/opt/cassandra/lib/apache-cassandra-4.0.4.jar:/opt/cassandra/lib/asm-7.1.jar:/opt/cassandra/lib/caffeine-2.5.6.jar:/opt/cassandra/lib/cassandra-driver-core-3.11.0-shaded.jar:/opt/cassandra/lib/chronicle-bytes-2.20.111.jar:/opt/cassandra/lib/chronicle-core-2.20.126.jar:/opt/cassandra/lib/chronicle-queue-5.20.123.jar:/opt/cassandra/lib/chronicle-threads-2.20.111.jar:/opt/cassandra/lib/chronicle-wire-2.20.117.jar:/opt/cassandra/lib/commons-cli-1.1.jar:/opt/cassandra/lib/commons-codec-1.9.jar:/opt/cassandra/lib/commons-lang3-3.11.jar:/opt/cassandra/lib/commons-math3-3.2.jar:/opt/cassandra/lib/concurrent-trees-2.4.0.jar:/opt/cassandra/lib/ecj-4.6.1.jar:/opt/cassandra/lib/guava-27.0-jre.jar:/opt/cassandra/lib/high-scale-lib-1.0.6.jar:/opt/cassandra/lib/hppc-0.8.1.jar:/opt/cassandra/lib/j2objc-annotations-1.3.jar:/opt/cassandra/lib/jackson-annotations-2.13.2.jar:/opt/cassandra/lib/jackson-core-2.13.2.jar:/opt/cassandra/lib/jackson-databind-2.13.2.2.jar:/opt/cassandra/lib/jamm-0.3.2.jar:/opt/cassandra/lib/java-cup-runtime-11b-20160615.jar:/opt/cassandra/lib/javax.inject-1.jar:/opt/cassandra/lib/jbcrypt-0.4.jar:/opt/cassandra/lib/jcl-over-slf4j-1.7.25.jar:/opt/cassandra/lib/jcommander-1.30.jar:/opt/cassandra/lib/jctools-core-3.1.0.jar:/opt/cassandra/lib/jflex-1.8.2.jar:/opt/cassandra/lib/jna-5.6.0.jar:/opt/cassandra/lib/json-simple-1.1.jar:/opt/cassandra/lib/jvm-attach-api-1.5.jar:/opt/cassandra/lib/log4j-over-slf4j-1.7.25.jar:/opt/cassandra/lib/logback-classic-1.2.9.jar:/opt/cassandra/lib/logback-core-1.2.9.jar:/opt/cassandra/lib/lz4-java-1.8.0.jar:/opt/cassandra/lib/metrics-core-3.1.5.jar:/opt/cassandra/lib/metrics-jvm-3.1.5.jar:/opt/cassandra/lib/metrics-logback-3.1.5.jar:/opt/cassandra/lib/mxdump-0.14.jar:/opt/cassandra/lib/netty-all-4.1.58.Final.jar:/opt/cassandra/lib/netty-tcnative-boringssl-static-2.0.36.Final.jar:/opt/cassandra/lib/ohc-core-0.5.1.jar:/opt/cassandra/lib/ohc-core-j8-0.5.1.jar:/opt/cassandra/lib/psjava-0.1.19.jar:/opt/cassandra/lib/reporter-config-base-3.0.3.jar:/opt/cassandra/lib/reporter-config3-3.0.3.jar:/opt/cassandra/lib/sigar-1.6.4.jar:/opt/cassandra/lib/sjk-cli-0.14.jar:/opt/cassandra/lib/sjk-core-0.14.jar:/opt/cassandra/lib/sjk-json-0.14.jar:/opt/cassandra/lib/sjk-stacktrace-0.14.jar:/opt/cassandra/lib/slf4j-api-1.7.25.jar:/opt/cassandra/lib/snakeyaml-1.26.jar:/opt/cassandra/lib/snappy-java-1.1.2.6.jar:/opt/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/opt/cassandra/lib/stream-2.5.2.jar:/opt/cassandra/lib/zstd-jni-1.5.0-4.jar:/opt/cassandra/lib/jsr223/*/*.jar: -Dcassandra.server_process -Dcassandra.skip_default_role_setup=true -Ddb.unix_socket_file=/tmp/cassandra.sock org.apache.cassandra.service.CassandraDaemon

How to reproduce it (as minimally and precisely as possible):

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  auth: false
  cassandra:
    telemetry:
      prometheus:
        enabled: true
    serverVersion: "4.0.4"
    datacenters:
      - metadata:
          name: dc1
        size: 1
        cdc:
          pulsarServiceUrl: pulsar://pulsar-proxy.pulsar.svc.cluster.local:6650
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        config:
          jmx_connection_type: remote-no-auth
          additionalOptions:
          - -Dcassandra.jmx.remote.port=7199
          - -Djava.rmi.server.hostname=127.0.0.1
          - -Dcom.sun.management.jmxremote.rmi.port=7199

┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1724 ┆priority: Medium

adutra commented 2 years ago

Your YAML is not correct; you missed the jvmOptions level below config:

        config:
          jvmOptions:            
            jmx_connection_type: remote-no-auth
            additionalOptions:
              - -Dcassandra.jmx.remote.port=7199
              - -Djava.rmi.server.hostname=127.0.0.1
              - -Dcom.sun.management.jmxremote.rmi.port=7199

I guess this didn't trigger an error because CassandraDatacenterTemplate is annotated as follows:

// +kubebuilder:pruning:PreserveUnknownFields
type CassandraDatacenterTemplate struct {
	// ... (fields omitted)
}

I think we need to assess whether we still need PreserveUnknownFields or not.
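As an analogy for why the misplaced additionalOptions was accepted without complaint: Go's `encoding/json` silently ignores fields that have no matching struct field, whereas `json.Decoder.DisallowUnknownFields` turns them into an error. The `jvmOptions`/`cassandraConfig` types and the `decode` helper below are hypothetical simplifications, not the operator's real API types, and the API server's own pruning and validation of CRD fields are not modeled here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Hypothetical, heavily simplified stand-ins for the operator's config types.
type jvmOptions struct {
	HeapSize          string   `json:"heapSize,omitempty"`
	AdditionalOptions []string `json:"additionalOptions,omitempty"`
}

type cassandraConfig struct {
	JvmOptions *jvmOptions `json:"jvmOptions,omitempty"`
}

// decode unmarshals data into a cassandraConfig. With strict=false, unknown
// fields are silently dropped (encoding/json's default); with strict=true,
// any unknown field makes Decode return an error.
func decode(data []byte, strict bool) (cassandraConfig, error) {
	var cfg cassandraConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// Shape of the original manifest's config block: additionalOptions sits
	// directly under config instead of under config.jvmOptions.
	misnested := []byte(`{"additionalOptions":["-Dcassandra.jmx.remote.port=7199"]}`)

	cfg, err := decode(misnested, false)
	fmt.Println("lenient: jvmOptions is nil =", cfg.JvmOptions == nil, "err =", err) // options silently lost

	_, err = decode(misnested, true)
	fmt.Println("strict: err =", err) // unknown field reported
}
```

The lenient path is what makes the mistake hard to spot: decoding succeeds, but the options never reach jvmOptions.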

Miles-Garnsey commented 2 years ago

I think you're right about the original manifest I posted! When I add the jvmOptions level to the YAML, it all starts working.

I don't think PreserveUnknownFields is still required, could we try removing it in the interests of usability?

Here's another issue (and I think this is the one I hit at first). If I define config at both the cluster and DC levels, it seems like the DC level config overwrites the cluster level config without any merge behaviour?

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  auth: false
  cassandra:
    config:
      jvmOptions:
        jmx_connection_type: remote-no-auth
        additionalOptions:
        - -Dcassandra.jmx.remote.port=7199
        - -Djava.rmi.server.hostname=127.0.0.1
        - -Dcom.sun.management.jmxremote.rmi.port=7199
    telemetry:
      prometheus:
        enabled: true
    serverVersion: "4.0.4"
    datacenters:
      - metadata:
          name: dc1
        size: 1
        cdc:
          pulsarServiceUrl: pulsar://pulsar-proxy.pulsar.svc.cluster.local:6650
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        config:
          jvmOptions:
            heapSize: 512Mi

This manifest also results in the env variables not appearing (even though jvmOptions is added here).

adutra commented 2 years ago

I don't think PreserveUnknownFields is still required, could we try removing it in the interests of usability?

Agreed, I think we can remove it.

it seems like the DC level config overwrites the cluster level config without any merge behaviour?

Yes, that comes from this code: https://github.com/k8ssandra/k8ssandra-operator/blob/bbcfe3450272920014a242750bc31ec9863c333d/pkg/cassandra/datacenter.go#L334-L338.
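The difference between the override behaviour described here and a merge can be sketched with free-form nested maps. `overrideConfig` models what the thread describes (a DC-level config replaces the cluster-level one wholesale), and `mergeConfig` shows one possible recursive merge in which nested maps are combined key by key and DC-level values win on conflicts. Both helpers and the `config` alias are hypothetical; they are not the code at the linked lines of datacenter.go.

```go
package main

import "fmt"

// config is a hypothetical free-form representation of a config block, as
// produced by unmarshalling YAML/JSON into interface{} values.
type config = map[string]interface{}

// overrideConfig models the behaviour described in the thread: if the DC
// defines any config, it replaces the cluster-level config entirely.
func overrideConfig(cluster, dc config) config {
	if dc != nil {
		return dc
	}
	return cluster
}

// mergeConfig is one possible alternative: nested maps are merged
// recursively, and for any other value (scalar or list) the DC value wins.
// Neither input map is modified.
func mergeConfig(cluster, dc config) config {
	out := make(config, len(cluster)+len(dc))
	for k, v := range cluster {
		out[k] = v
	}
	for k, dv := range dc {
		cm, cIsMap := out[k].(config)
		dm, dIsMap := dv.(config)
		if cIsMap && dIsMap {
			out[k] = mergeConfig(cm, dm)
		} else {
			out[k] = dv
		}
	}
	return out
}

func main() {
	// Values taken from the second manifest in this issue.
	cluster := config{"jvmOptions": config{
		"additionalOptions": []string{"-Dcassandra.jmx.remote.port=7199"},
	}}
	dc := config{"jvmOptions": config{"heapSize": "512Mi"}}

	fmt.Println("override:", overrideConfig(cluster, dc))
	fmt.Println("merge:   ", mergeConfig(cluster, dc))
}
```

With the second manifest's values, the override keeps only heapSize, while the merge keeps both heapSize and the cluster-level additionalOptions. Lists are replaced rather than concatenated in this sketch, which is a design choice an actual fix would need to settle.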

This manifest also results in the env variables not appearing

Sorry what env variables? I don't see any specified in the manifest.

Miles-Garnsey commented 2 years ago

Sorry what env variables? I don't see any specified in the manifest.

Sorry, I meant JVM startup options (I was raising too many related tickets last week).

Miles-Garnsey commented 2 years ago

I am closing this in favour of the following two tickets, which address PreserveUnknownFields and the DC/cluster-level override behaviour.