Open mindaugaszilionis opened 1 year ago
Hi, thanks for the report.
Are you running the agent or standalone version?
When you say "loading", where exactly are you seeing that, prometheus?
Are you able to query the metrics api directly on the node and get any results? E.g: http://localhost:9500/metrics
Cheers
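A direct check of the endpoint can also be scripted so that a hanging request fails fast instead of blocking for minutes. The following is only a minimal sketch: the function name `probe_metrics` and the 10-second default are illustrative assumptions, not part of the exporter, and the URL is the one quoted above.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe_metrics(url="http://localhost:9500/metrics", timeout=10.0):
    """Fetch the endpoint once and describe how it responded (hypothetical helper)."""
    # Ignore proxy settings from the environment; the target is the local node.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # The timeout applies to each blocking socket operation (connect, read).
        with opener.open(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read()
    except socket.timeout:
        return "timeout: socket operation exceeded %.1fs" % timeout
    except urllib.error.HTTPError as exc:
        return "http error: status %d" % exc.code
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.timeout):
            return "timeout: socket operation exceeded %.1fs" % timeout
        return "connection failed: %s" % exc.reason
    except (http.client.HTTPException, OSError) as exc:
        # For example, the server closed the connection before sending a full body.
        return "broken response: %r" % exc
    # Count non-comment lines, i.e. actual samples in the text exposition format.
    samples = [ln for ln in body.splitlines() if ln and not ln.startswith(b"#")]
    return "ok: status %d, %d bytes, %d sample lines" % (status, len(body), len(samples))
```

Calling `print(probe_metrics())` on the node then reports either a sample count, a timeout, or a connection error, which helps separate "exporter not listening" from "exporter listening but never answering".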
Hi, I use the agent version and load it via JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-exporter-agent-0.9.12.jar"
When I try to load the metrics, the request takes ages and never responds (after a few minutes I just cancel it).
[root@***]# wget localhost:9500/metrics
--2023-03-08 14:18:56-- http://localhost:9500/metrics
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:9500... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response...
process looks like cassand+ 1795408 1 99 Feb28 ? 12-04:37:15 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.332.b09-2.el8_6.x86_64/jre/bin/java -ea -da:net.openhft... -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -Xms64G -Xmx64G -XX:ThreadPriorityPolicy=42 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xloggc:/var/log/cassandra/gc.log -Xmn2048M -XX:+UseCondCardMark -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler -javaagent:/usr/share/cassandra/lib/jamm-0.3.2.jar -Dcassandra.jmx.remote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/jmxremote.access -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -javaagent:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.12.jar -XX:OnOutOfMemoryError=kill -9 %p -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp 
/etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.8.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-7.1.jar:/usr/share/cassandra/lib/caffeine-2.5.6.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.11.0-shaded.jar:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.12.jar:/usr/share/cassandra/lib/chronicle-bytes-2.20.111.jar:/usr/share/cassandra/lib/chronicle-core-2.20.126.jar:/usr/share/cassandra/lib/chronicle-queue-5.20.123.jar:/usr/share/cassandra/lib/chronicle-threads-2.20.111.jar:/usr/share/cassandra/lib/chronicle-wire-2.20.117.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.9.jar:/usr/share/cassandra/lib/commons-lang3-3.11.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/concurrent-trees-2.4.0.jar:/usr/share/cassandra/lib/ecj-4.6.1.jar:/usr/share/cassandra/lib/guava-27.0-jre.jar:/usr/share/cassandra/lib/HdrHistogram-2.1.9.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/hppc-0.8.1.jar:/usr/share/cassandra/lib/j2objc-annotations-1.3.jar:/usr/share/cassandra/lib/jackson-annotations-2.13.2.jar:/usr/share/cassandra/lib/jackson-core-2.13.2.jar:/usr/share/cassandra/lib/jackson-databind-2.13.2.2.jar:/usr/share/cassandra/lib/jamm-0.3.2.jar:/usr/share/cassandra/lib/java-cup-runtime-11b-20160615.jar:/usr/share/cassandra/lib/javax.inject-1.jar:/usr/share/cassandra/lib/jbcrypt-0.4.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.25.jar:/usr/share/cassandra/lib/jcommander-1.30.jar:/usr/share/cassandra/lib/jctools-core-3.1.0.jar:/usr/share/cassandra/lib/jflex-1.8.2.jar:/usr/share/cassandra/lib/jna-5.6.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jvm-attach-api-1.5.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.25.jar:/usr/share/cassandra/lib/logback-classic-1.2.9.jar:/usr/share/cassandra/lib/logback-core-1.2.9.jar:/usr/share/cassandra/lib/lz4-java-1.8.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.5.
jar:/usr/share/cassandra/lib/metrics-jvm-3.1.5.jar:/usr/share/cassandra/lib/metrics-logback-3.1.5.jar:/usr/share/cassandra/lib/mxdump-0.14.jar:/usr/share/cassandra/lib/netty-all-4.1.58.Final.jar:/usr/share/cassandra/lib/netty-tcnative-boringssl-static-2.0.36.Final.jar:/usr/share/cassandra/lib/ohc-core-0.5.1.jar:/usr/share/cassandra/lib/ohc-core-j8-0.5.1.jar:/usr/share/cassandra/lib/psjava-0.1.19.jar:/usr/share/cassandra/lib/reporter-config3-3.0.3.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.3.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/sjk-cli-0.14.jar:/usr/share/cassandra/lib/sjk-core-0.14.jar:/usr/share/cassandra/lib/sjk-json-0.14.jar:/usr/share/cassandra/lib/sjk-stacktrace-0.14.jar:/usr/share/cassandra/lib/slf4j-api-1.7.25.jar:/usr/share/cassandra/lib/snakeyaml-1.26.jar:/usr/share/cassandra/lib/snappy-java-1.1.2.6.jar:/usr/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/zstd-jni-1.5.0-4.jar:/usr/share/cassandra/apache-cassandra-4.0.5.jar:/usr/share/cassandra/fqltool.jar:/usr/share/cassandra/stress.jar: org.apache.cassandra.service.CassandraDaemon
@mindaugaszilionis @johndelcastillo Were you able to find a solution for this? I am facing the same issue currently in my environment.
The problem here is that the current exporter versions do not work with Cassandra 4.0.x. You get a stack trace in system.log and the metrics request hangs forever. I created a quick patch to make it work in #114.
Actually, the same problem occurs with Cassandra 4.1.3. The Cassandra system log contains this error:
WARN [prometheus-netty-pool-0] 2023-10-06 14:51:27,373 DefaultChannelPipeline.java:1152 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.nio.BufferOverflowException: null
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:222)
    at com.zegelin.prometheus.exposition.NioExpositionSink.writeBytes(NioExpositionSink.java:27)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeLabels(TextFormatMetricFamilyWriter.java:111)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeLabelSets(TextFormatMetricFamilyWriter.java:129)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.writeMetric(TextFormatMetricFamilyWriter.java:141)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.lambda$visit$4(TextFormatMetricFamilyWriter.java:181)
    at com.zegelin.prometheus.exposition.text.TextFormatMetricFamilyWriter$MetricVisitor.lambda$metricW
Something changed when I set rpc_address: 0.0.0.0 in cassandra.yaml. Now I get:
[root@l160c-cass-c6n1 ~]# wget localhost:9500/metrics
--2023-10-06 17:07:10-- http://localhost:9500/metrics
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:9500... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘metrics.1’
metrics.1 [ <=> ] 0 --.-KB/s in 0s
2023-10-06 17:07:10 (0.00 B/s) - Read error at byte 0 (Success).Retrying.
--2023-10-06 17:07:11-- (try: 2) http://localhost:9500/metrics
Connecting to localhost (localhost)|127.0.0.1|:9500... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘metrics.1’
metrics.1
So there is some progress, but the metrics are still not available.
The release from edgelaborities has fixed the issue for me. (AFAIK because they just merged https://github.com/instaclustr/cassandra-exporter/pull/84) See also https://github.com/instaclustr/cassandra-exporter/issues/83.
As a workaround, cassandra-exporter started to work once table metrics were disabled: JVM_OPTS="$JVM_OPTS -javaagent:/usr/share/cassandra/lib/cassandra-exporter-agent-0.9.14.jar=--table-metrics=NONE" The root cause may be related to the fact that we have a lot of tables, hundreds of them.
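To check whether per-table metrics really dominate the output, one option is to save a successful scrape to a file and count samples per metric name. This is a rough sketch that assumes the Prometheus text exposition format; the helper name `count_samples_by_metric` and the file name in the usage note are hypothetical.

```python
from collections import Counter


def count_samples_by_metric(exposition_text):
    """Count sample lines per metric name in Prometheus text-format output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts
```

For example, `count_samples_by_metric(open("metrics.txt").read()).most_common(10)` lists the ten metric names with the most samples. If table-level metrics top that list by a wide margin, that would be consistent with the many-tables explanation.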
Hi, I tried replacing the 0.9.10 cassandra-exporter agent with the newly released 0.9.12 version. I don't see any difference: the metrics are still "loading" for ages and never open, just like with the previous version of the exporter.