logzio / jmx2graphite

JMX to Graphite every x seconds in one command line (Docker based) (also come in Java Agent flavour)
MIT License
77 stars 23 forks source link

Crude Fix for Broken Pipe Exception #79

Closed ChristophKreutzer23 closed 3 years ago

ChristophKreutzer23 commented 4 years ago

Hi,

I created a crude fix for the issue of being stuck with a broken pipe Exception and not reconnecting. Unfortunately I was not able to recreate the Exception in a test setup, so I just created a dummy implementation and threw an IOException("Broken Pipe") myself. If anybody has an idea of how to recreate this in a test environment, please tell me and I will rework the tests and perhaps the fix itself. I am not an expert when it comes to networking with Java, so all I was able to produce were timeout and connection refused Exceptions.

6

asafm commented 4 years ago

@ChristophKreutzer23 - Thanks for taking time to fix this. I skimmed it, and then thought that before I continue, it's better first to have a test that fails first, right? So I managed to cram a few hours today and created one: https://github.com/logzio/jmx2graphite/tree/broken-pipe-test

I would beautify it of course, but I think you can use that as a base to test your changes - if that works, we can proceed to the proper review, ok?

And sorry for the delay, it was a very busy week

ChristophKreutzer23 commented 4 years ago

Awesome. Thank you. That's exactly what I needed. Since I do not really like my current solution I am going to rework it and test it with your new test class. I hope I can finish it before the end of the week.

ChristophKreutzer23 commented 4 years ago

@asafm I updated the fix using Failsafe as you suggested here https://github.com/logzio/jmx2graphite/issues/6#issuecomment-584498073. Basically I checked for a Broken Pipe Exception and when one is detected I closed the connection to Graphite. The next time it tries to send metrics, the connection is checked and a reconnect is initiated. This of course led to the Test failing due to a connection refused Exception during the time the graphite container is stopped. (Which I think is a reasonable Exception at that point). So I had to adjust the Test Case to incorporate the Connection refused Exception. I did not configure Failsafe, so it uses the default configs, maybe this has to be tweaked.

asafm commented 4 years ago

@ChristophKreutzer23 - Would like to continue on this so we can merge this great fix?

ChristophKreutzer23 commented 4 years ago

@asafm Yes, of course, sorry. Going through some changes job-wise and did not find the time to work on this. I hope I can work on it the coming weekend.

asafm commented 3 years ago

Can be closed right?

ChristophKreutzer23 commented 3 years ago

Yes