jmxtrans / jmxtrans-agent

Java Agent based JMX metrics exporter.
MIT License
178 stars 110 forks source link

StatsDOutputWriter - Given data too big for the buffer size #98

Closed mindkir closed 7 years ago

mindkir commented 7 years ago

jmxtrans-agent-1.2.4.jar StatsDOutputWriter is indefinitelly failing with "java.nio.BufferOverflowException". It seems error was triggered by random failed DNS resolution, that caused some buffer (lim=91) to fill up.

2017-03-07 03:19:29.614 SEVERE [jmxtrans-agent-1] org.jmxtrans.agent.StatsDOutputWriter - Could not send stat java.nio.HeapByteBuffer[pos=0 lim=91 cap=1024] to host telegraf:8125
java.io.IOException: Target address not resolved
        at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:425)
        at org.jmxtrans.agent.StatsDOutputWriter.flush(StatsDOutputWriter.java:157)
        at org.jmxtrans.agent.StatsDOutputWriter.doSend(StatsDOutputWriter.java:128)
        at org.jmxtrans.agent.StatsDOutputWriter.writeQueryResult(StatsDOutputWriter.java:118)
        at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.writeQueryResult(OutputWriterCircuitBreakerDecorator.java:90)
        at org.jmxtrans.agent.Query.processAttributeValue(Query.java:274)
        at org.jmxtrans.agent.Query.collectAndExportAttribute(Query.java:239)
        at org.jmxtrans.agent.Query.collectAndExportForObjectName(Query.java:173)
        at org.jmxtrans.agent.Query.collectAndExport(Query.java:167)
        at org.jmxtrans.agent.TimeTrackingCollector.collectIfEnoughTimeHasPassed(TimeTrackingCollector.java:54)
        at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:209)
        at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:131)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

2017-03-07 03:19:29.614 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.StatsDOutputWriter - Given data too big (98bytes) for the buffer size (91bytes), skip it: jmxtransagents.xx...

2017-03-07 03:19:29.615 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.Query - Exception collecting java.lang:type=OperatingSystem#MaxFileDescriptorCount
java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:183)
        at java.nio.ByteBuffer.put(ByteBuffer.java:832)
        at org.jmxtrans.agent.StatsDOutputWriter.doSend(StatsDOutputWriter.java:136)
        at org.jmxtrans.agent.StatsDOutputWriter.writeQueryResult(StatsDOutputWriter.java:118)
        at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.writeQueryResult(OutputWriterCircuitBreakerDecorator.java:90)
        at org.jmxtrans.agent.Query.processAttributeValue(Query.java:274)
        at org.jmxtrans.agent.Query.collectAndExportAttribute(Query.java:239)
        at org.jmxtrans.agent.Query.collectAndExportForObjectName(Query.java:173)
        at org.jmxtrans.agent.Query.collectAndExport(Query.java:167)
        at org.jmxtrans.agent.TimeTrackingCollector.collectIfEnoughTimeHasPassed(TimeTrackingCollector.java:54)
        at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:209)
        at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:131)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

...

2017-03-07 08:26:33.896 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.StatsDOutputWriter - Given data too big (104bytes) for the buffer size (91bytes), skip it: jmxtransagents.xx...

2017-03-07 08:26:33.896 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.Query - Exception collecting java.lang:type=OperatingSystem#SystemCpuLoad
java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:183)
        at java.nio.ByteBuffer.put(ByteBuffer.java:832)
        at org.jmxtrans.agent.StatsDOutputWriter.doSend(StatsDOutputWriter.java:136)
        at org.jmxtrans.agent.StatsDOutputWriter.writeQueryResult(StatsDOutputWriter.java:118)
        at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.writeQueryResult(OutputWriterCircuitBreakerDecorator.java:90)
        at org.jmxtrans.agent.Query.processAttributeValue(Query.java:274)
        at org.jmxtrans.agent.Query.collectAndExportAttribute(Query.java:239)
        at org.jmxtrans.agent.Query.collectAndExportForObjectName(Query.java:173)
        at org.jmxtrans.agent.Query.collectAndExport(Query.java:167)
        at org.jmxtrans.agent.TimeTrackingCollector.collectIfEnoughTimeHasPassed(TimeTrackingCollector.java:54)
        at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:209)
        at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:131)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
cyrille-leclerc commented 7 years ago

Can you please share the configuration of the outputwriter?

I suspect that you combined the host and port (telegraf:8125) in the same config element host instead of using the config element port for the port.

Then we have a bug/fragility in the StatsDOutputWriter that does not handle properly this configuration issue and causes BufferOverflowException.

    <outputWriter class="org.jmxtrans.agent.StatsDOutputWriter">
        <host>telegraf</host>
        <port>8125</port>
        ...
    </outputWriter>
mindkir commented 7 years ago

Here is output writer configuration:

<outputWriter class="org.jmxtrans.agent.StatsDOutputWriter">
    <host>telegraf</host>
    <port>8125</port>
    <metricName>jmxtransagents.${env.host.name:node}.${env.service.name:service}.${env.app.name:appname}.${envtype:poc}</metricName>
</outputWriter>
mindkir commented 7 years ago

Just to clarify this is not a configuration issue. Agent was running fine for few days without restart and started failing with BufferOverflowException after some IOExceptions happened.

cyrille-leclerc commented 7 years ago

@mindkir can you please test https://oss.sonatype.org/content/repositories/snapshots/org/jmxtrans/agent/jmxtrans-agent/1.2.6-SNAPSHOT/jmxtrans-agent-1.2.6-20170316.180546-3.jar ?

mindkir commented 7 years ago

@cyrille-leclerc confirmed that this version is recovering from IO exceptions and does not experience BufferOverflowException.

Test scenario:

  1. Started some JVM with jmxtrans-agent enabled

    $ ./jdk1.8.0_121/bin/java -javaagent:jmxtrans-agent-1.2.6-20170316.180546-3.jar=jmxtrans-agent.xml -Djmxtrans.agent.premain.delay=10 -Dsun.net.inetaddr.ttl=60 EchoServer 4444

  2. Added firewall rule to drop DNS packets and cause java.nio.channels.UnresolvedAddressException failures

    $ iptables -A OUTPUT -p udp -s $SOURCE_IP --sport 1024:65535 --dport 53 -m state --state NEW,ESTABLISHED -j DROP

  3. Checked for BufferOverflowException and monitored java.nio.HeapByteBuffer state. With new version it is stable and doesn't end up with BufferOverflowException.

    2017-03-17 12:46:29.03 SEVERE [jmxtrans-agent-1] org.jmxtrans.agent.StatsDOutputWriter - Could not send stat java.nio.HeapByteBuffer[pos=0 lim=978 cap=1024] to host telegraf:8125 java.nio.channels.UnresolvedAddressException at sun.nio.ch.Net.checkAddress(Net.java:101) at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:429) at org.jmxtrans.agent.StatsDOutputWriter.flush(StatsDOutputWriter.java:158) at org.jmxtrans.agent.StatsDOutputWriter.doSend(StatsDOutputWriter.java:129) at org.jmxtrans.agent.StatsDOutputWriter.writeQueryResult(StatsDOutputWriter.java:119) at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.writeQueryResult(OutputWriterCircuitBreakerDecorator.java:90) at org.jmxtrans.agent.Query.processAttributeValue(Query.java:274) at org.jmxtrans.agent.Query.collectAndExportAttribute(Query.java:216) at org.jmxtrans.agent.Query.collectAndExportForObjectName(Query.java:173) at org.jmxtrans.agent.Query.collectAndExport(Query.java:167) at org.jmxtrans.agent.TimeTrackingCollector.collectIfEnoughTimeHasPassed(TimeTrackingCollector.java:54) at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:209) at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:131) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2017-03-17 12:46:29.031 SEVERE [jmxtrans-agent-1] org.jmxtrans.agent.StatsDOutputWriter - Could not send stat java.nio.HeapByteBuffer[pos=0 lim=1006 cap=1024] to host telegraf:8125 java.nio.channels.UnresolvedAddressException at sun.nio.ch.Net.checkAddress(Net.java:101) at sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:429) at org.jmxtrans.agent.StatsDOutputWriter.flush(StatsDOutputWriter.java:158) at org.jmxtrans.agent.StatsDOutputWriter.postCollect(StatsDOutputWriter.java:104) at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.postCollect(OutputWriterCircuitBreakerDecorator.java:124) at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:214) at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:131) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

  4. Removed firewall rule and verified jmxtrans-agent recovered from failures, which it did.

foygl commented 7 years ago

@cyrille-leclerc Are there plans for making an official release with this fix in it? We are seeing this issue a lot and it would be good to have an official release rather than us relying on a random snapshot build. This has been really useful for us, so keep up the good work! 🥇

cyrille-leclerc commented 6 years ago

Released in 1.2.6