jmxtrans / jmxtrans-agent

Java Agent based JMX metrics exporter.
MIT License
178 stars 110 forks source link

jmxtrans-agent crashes when graphite receiver disappears #77

Closed unixninja92 closed 8 years ago

unixninja92 commented 8 years ago

I have jmxtrans-agent sending graphite to influxdb, but if the influxdb server ever goes away or is switched out, jmxtrans-agent crashes and has to be restarted to start sending data again. It would be nice if it tried reconnecting and possibly even buffering data until it could send graphite data again.

cyrille-leclerc commented 8 years ago

Hello @unixninja92 can you detail more please? What do you mean by "crash", I suspect that it's not killing the JVM, I suspect that you mean that it does not recover. Do you have log messages, can you share the configuration details?

unixninja92 commented 8 years ago

right, it's not recovering. Here is the stacktrace:

at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) 
 at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) 
 at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) 
 at org.jmxtrans.agent.GraphitePlainTextTcpOutputWriter.postCollect(GraphitePlainTextTcpOutputWriter.java:140) 
 at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.postCollect(OutputWriterCircuitBreakerDecorator.java:124) 
 at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:175) 
 at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:85) 
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) 
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
 at java.lang.Thread.run(Thread.java:745) 
2016-05-09 14:27:20.86 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.JmxTransExporter - Ignore exception flushing metrics
java.net.SocketException: Broken pipe 
 at java.net.SocketOutputStream.socketWrite0(Native Method) 
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) 
 at java.net.SocketOutputStream.write(SocketOutputStream.java:159) 
 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) 
 at java.net.SocketOutputStream.socketWrite0(Native Method) 
2016-05-09 14:27:20.858 WARNING [jmxtrans-agent-1] org.jmxtrans.agent.GraphitePlainTextTcpOutputWriter - Exception flushing the stream to HostAndPort{host='influxdb.DOMAIN.com', port=2003} 
java.net.SocketException: Broken pipe 
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) 
 at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) 
 at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) 
 at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) 
 at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) 
 at org.jmxtrans.agent.GraphitePlainTextTcpOutputWriter.postCollect(GraphitePlainTextTcpOutputWriter.java:140) 
 at org.jmxtrans.agent.OutputWriterCircuitBreakerDecorator.postCollect(OutputWriterCircuitBreakerDecorator.java:124) 
 at org.jmxtrans.agent.JmxTransExporter.collectAndExport(JmxTransExporter.java:175) 
 at org.jmxtrans.agent.JmxTransExporter$2.run(JmxTransExporter.java:85) 
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) 
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
 at java.lang.Thread.run(Thread.java:745) 
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
unixninja92 commented 8 years ago

relevant config (minus queries section): `

influxdb.DOMAIN.com
    <port>2003</port>
    <namePrefix>{{Service}}.{{Environment}}.{{VPC}}.${HOSTNAME}.</namePrefix>
</outputWriter>
<collectIntervalInSeconds>10</collectIntervalInSeconds>`
kerlandsson commented 8 years ago

@unixninja92 You are missing the top of the stack trace, could you please post the entire trace?

Also, when you say it is not recovering - what is logged after this error occurs?

Would it be possible to attach the complete log file?

unixninja92 commented 8 years ago

@kerlandsson I'm getting this log from aws cloudwatch, I'll see if I can find the top. And after, it just stops logging from what I can tell.

unixninja92 commented 8 years ago

The line in the log just above the stack trace is: 09 May 2016 14:27:20.564 INFO : Authentication Exception: Failed login attempt for user 'XXXXXX' (Invalid username or password) from XXX.XXX.XXX.XXX

but that's from our program, not from jmxtrans-agent. There is a blank log line (contains only a space) right above that though, so maybe that's where something is supposed to be? But there isn't anything above that for at least 10s that would have anything to do with jmxtrans-agent.

unixninja92 commented 8 years ago

The other instances of the same service all running jmxtrans-agent don't have any stacktraces from jmxtrans-agent, they just stopped reporting to influx.

kerlandsson commented 8 years ago

I did some tests on my end and the GraphitePlainTextTcpOutputWriter reconnects as expected for me. I even added an automatic test to verify it reconnects.

From your stack trace it seems you are not using one of the more recent versions. What version are you using?

I suggest you upgrade to the latest release of jmxtrans-agent. If you still suffer this problem, try enabling trace logging as detailed here: https://github.com/jmxtrans/jmxtrans-agent/wiki/Troubleshooting#enabling-logs and we can investigate further.

unixninja92 commented 8 years ago

We are running 1.2.3 and 1.2.2.

I'm going to turn on logging and see if that helps.

unixninja92 commented 8 years ago

So it turns out that jmxagent-trans is not the issue here, the logs show it continues to send data to influx. That data just gets lost somewhere between jmxagent-trans and influx. Let me know if you have any suggestions on things to try. The reason I assumed it was jmxagent-trans at first was because restarting the instance the java service was running on fixed the issue.