instaclustr / cassandra-exporter

Java agent for exporting Cassandra metrics to Prometheus
Apache License 2.0

cassandra exporter (0.9.12) still do not export metrics of cassandra 4.0.5 #109

Open mindaugaszilionis opened 1 year ago

mindaugaszilionis commented 1 year ago

Hi, I tried replacing the 0.9.10 cassandra exporter agent with the newly released 0.9.12 version. I don't see any difference: the metrics are still "loading" for ages and never open, just like with the previous version of the exporter.

johndelcastillo commented 1 year ago

Hi, thanks for the report.

Are you running the agent or standalone version?

When you say "loading", where exactly are you seeing that, prometheus?

Are you able to query the metrics api directly on the node and get any results? E.g: http://localhost:9500/metrics
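A direct query can hang for minutes, so it helps to probe with a timeout and record how the request fails. The following is only a minimal sketch using the Python standard library; the function name probe_metrics, the result labels, and the 10 second default timeout are illustrative assumptions, not part of cassandra-exporter.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe_metrics(url: str = "http://127.0.0.1:9500/metrics",
                  timeout: float = 10.0) -> str:
    """Classify the endpoint as 'ok', 'empty', 'timeout', 'refused' or 'error'.

    The labels are hypothetical; they only separate the failure modes
    that matter when debugging a scrape that never completes.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return "error"
    except urllib.error.URLError as exc:
        # Failures while connecting or sending are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"
        return "error"
    except (socket.timeout, TimeoutError):
        # Connected, request sent, but no response within the timeout.
        return "timeout"
    except (http.client.HTTPException, OSError):
        return "error"
    # A 200 response with a zero-length body is still a failed scrape.
    return "ok" if body else "empty"


if __name__ == "__main__":
    print(probe_metrics())
```

A result of "timeout" means the port accepted the connection but never answered, "refused" means nothing is listening on that address, and "empty" means the server sent a success status but no metrics.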

Cheers

mindaugaszilionis commented 1 year ago

Hi, I use the agent version and add it via JVM_OPTS:
JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-exporter-agent-0.9.12.jar"

When I try to load the metrics, the request takes ages and never gets a response (after a few minutes I just cancel it):

[root@***]# wget localhost:9500/metrics
--2023-03-08 14:18:56-- http://localhost:9500/metrics
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:9500... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response...

The process looks like this:
cassand+ 1795408 1 99 Feb28 ? 12-04:37:15 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-2.el8_6.x86_64/jre/bin/java -ea -da:net.openhft... -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -Xms64G -Xmx64G -XX:ThreadPriorityPolicy=42 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xloggc:/var/log/cassandra/gc.log -Xmn2048M -XX:+UseCondCardMark -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler -javaagent:/usr/share/cassandra/lib/jamm-0.3.2.jar -Dcassandra.jmx.remote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/jmxremote.access -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -javaagent:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.12.jar -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
/etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.8.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-7.1.jar:/usr/share/cassandra/lib/caffeine-2.5.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.11.0-shaded.jar:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.12.jar:/usr/share/cassandra/lib/chronicle-bytes-2.20.111.jar:/usr/share/cassandra/lib/chronicle-core-2.20.126.jar:/usr/share/cassandra/lib/chronicle-queue-5.20.123.jar:/usr/share/cassandra/lib/chronicle-threads-2.20.111.jar:/usr/share/cassandra/lib/chronicle-wire-2.20.117.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.11.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/ecj-4.6.1.jar:/usr/share/cassandra/lib/guava-27.0-jre.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.8.1.jar:/usr/share/cassandra/lib/j2objc-annotations-1.3.jar:/usr/share/cassandra/lib/jackson-annotations-2.13.2.jar:/usr/share/cassandra/lib/jackson-core-2.13.2.jar:/usr/share/cassandra/lib/jackson-databind-2.13.2.2.jar:/usr/share/cassandra/lib/jamm-0.3.2.jar:/usr/share/cassandra/lib/java-cup-runtime-11b-20160615.jar:/usr/share/cassandra/lib/javax.inject-1.jar:/usr/share/cassandra/lib/jbcrypt-0.4.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.25.jar:/usr/share/cassandra/lib/jcommander-1.30.jar:/usr/share/cassandra/lib/jctools-core-3.1.0.jar:/usr/share/cassandra/lib/jflex-1.8.2.jar:/usr/share/cassandra/lib/jna-5.6.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jvm-attach-api-1.5.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.25.jar:/usr/share/cassandra/lib/logback-classic-1.2.9.jar:/usr/share/cassandra/lib/logback-core-1.2.9.jar:/usr/share/cassandra/lib/lz4-java-1.8.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.
jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/mxdump-0.14.jar:/usr/share/cassandra/lib/netty-all-4.1.58.Final.jar:/usr/share/cassandra/lib/netty-tcnative-boringssl-static-2.0.36.Final.jar:/usr/share/cassandra/lib/ohc-core-0.5.1.jar:/usr/share/cassandra/lib/ohc-core-j8-0.5.1.jar:/usr/share/cassandra/lib/psjava-0.1.19.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/sjk-cli-0.14.jar:/usr/share/cassandra/lib/sjk-core-0.14.jar:/usr/share/cassandra/lib/sjk-json-0.14.jar:/usr/share/cassandra/lib/sjk-stacktrace-0.14.jar:/usr/share/cassandra/lib/slf4j-api-1.7.25.jar:/usr/share/cassandra/lib/snakeyaml-1.26.jar:/usr/share/cassandra/lib/snappy-java-1.1.2.6.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/zstd-jni-1.5.0-4.jar:/usr/share/cassandra/apache-cassandra-4.0.5.jar:/usr/share/cassandra/fqltool.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon

st-gra commented 1 year ago

@mindaugaszilionis @johndelcastillo Were you able to find a solution for this? I am currently facing the same issue in my environment.

itskarlsson commented 1 year ago

The problem here is that the current exporter versions do not work with 4.0.x. You get a stack trace in system.log and the metrics request loads forever. I created a quick patch to make it work in #114.

mindaugaszilionis commented 1 year ago

Actually, I see the same problem with Cassandra 4.1.3. The Cassandra system log contains this error:
WARN [prometheus-netty-pool-0] 2023-10-06 14:51:27,373 DefaultChannelPipeline.java:1152 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.nio.BufferOverflowException: null
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:222)
    at com.zegelin.prometheus.exposition.NioExpositionSink.writeBytes(NioExpositionSink.java:27)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeLabels(TextFormatMetricFamilyWriter.java:111)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeLabelSets(TextFormatMetricFamilyWriter.java:129)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeMetric(TextFormatMetricFamilyWriter.java:141)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.lambda$visit$4(TextFormatMetricFamilyWriter.java:181)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.lambda$metricW
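To check whether a node is hitting this same error, the log can be scanned for a BufferOverflowException that follows a log line from a prometheus-netty-pool thread, as in the excerpt above. This is a rough sketch only: the function name find_exporter_overflows and the 5-line window are illustrative assumptions, and the log path in the usage example is a guess based on the -Dcassandra.logdir value shown earlier in this thread.

```python
from typing import Iterable, List, Optional

OVERFLOW_MARKER = "java.nio.BufferOverflowException"
EXPORTER_THREAD = "prometheus-netty-pool"


def find_exporter_overflows(lines: Iterable[str], window: int = 5) -> List[str]:
    """Return log lines from exporter threads that are followed, within
    `window` lines, by a BufferOverflowException.

    The window size is an assumption; the exception normally appears on
    the line right after the WARN header.
    """
    hits: List[str] = []
    pending: Optional[str] = None  # most recent exporter-thread log line
    remaining = 0                  # lines left in which the exception may appear
    for raw in lines:
        line = raw.rstrip("\n")
        if EXPORTER_THREAD in line:
            pending, remaining = line, window
        if pending is not None and OVERFLOW_MARKER in line:
            hits.append(pending)
            pending = None
        elif pending is not None:
            remaining -= 1
            if remaining < 0:
                pending = None  # too far from the header, stop waiting
    return hits


if __name__ == "__main__":
    # Assumed path; adjust to wherever system.log lives on the node.
    with open("/var/log/cassandra/system.log", encoding="utf-8", errors="replace") as log:
        for hit in find_exporter_overflows(log):
            print(hit)
```

Each printed line is the header of one failed metrics request, so its timestamp can be compared with the time of a hanging scrape.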

mindaugaszilionis commented 1 year ago

Something changed when I set rpc_address: 0.0.0.0 in cassandra.yaml. Now I get:
[root@l160c-cass-c6n1 ~]# wget localhost:9500/metrics
--2023-10-06 17:07:10-- http://localhost:9500/metrics
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:9500... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘metrics.1’

metrics.1 [ <=> ] 0 --.-KB/s in 0s

2023-10-06 17:07:10 (0.00 B/s) - Read error at byte 0 (Success).Retrying.

--2023-10-06 17:07:11-- (try: 2) http://localhost:9500/metrics
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘metrics.1’

metrics.1

So there is some progress, but the metrics are still not available.

sonman commented 1 year ago

The release from edgelaborities has fixed the issue for me (AFAIK because they just merged https://github.com/instaclustr/cassandra-exporter/pull/84). See also https://github.com/instaclustr/cassandra-exporter/issues/83.

mindaugaszilionis commented 1 day ago

As a workaround, the cassandra exporter started to work once table metrics were disabled:
JVM_OPTS="$JVM_OPTS -javaagent:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.14.jar=--table-metrics=NONE"
The root cause might be that we have lots of tables, hundreds of them.