Better shut down - Githubissues

cyrille-leclerc commented 8 years ago

Please @YannRobert @johnou feel free to review

YannRobert commented 8 years ago

Thank you Cyrille for creating the PR, it will be easier to comment this way.

For the record, I'm linking to my last comment on the subject https://github.com/jmxtrans/embedded-jmxtrans/issues/99#issuecomment-155385594

It seems to me that this PR is quite similar to PR #94, except that the synchronization logic and the OutputWriter Objects are managed directly by the EmbeddedJmxTrans class in this PR. Following the SOLID principles, I tends to prefer PR #94.

Implementation has a direct impacts on ease of evolution. #97 and #98 were much easier to fix using an intermediate class that handle the Set of OutputWriters.

cyrille-leclerc commented 8 years ago

@johnou I integrated your suggestions, can you please review?

@YannRobert thanks for taking the time to review this PR. I am sorry but, as I said in https://github.com/jmxtrans/embedded-jmxtrans/pull/94#issuecomment-118832584, I prefer to handle the lifecycle and its underlying synchronization in EmbeddedJmxTrans.java instead of introducing an OutputWriterSet as you proposed. I don't know how much SOLID is each implementation but the approach I am following is inspired by what I have seen in Spring Framework components and these components are usually the state of the art of the style of coding that I try to have. I am aware of your work on PR https://github.com/jmxtrans/embedded-jmxtrans/pull/97 "Better Export Batching" and PR https://github.com/jmxtrans/embedded-jmxtrans/pull/98 "Dont discard metrics on exception while exporting" and I am thinking, for these topics, to introduce a sink/source pattern as you have looked at in https://github.com/YannRobert/embedded-jmxtrans/tree/Introduce_SourcesAndSinksAbstraction_TowardEventBasedExport .

YannRobert commented 8 years ago

Are you saying you are going to basically rewrite all those pull requests ? This is going over my head.

Reviewing PRs and proposing changes to them before or after merging would be a much more effective way to collaborate IMHO. Doing so in a timely manner would also help to fix the PR before it starts to be a dependency for other PRs.

So Cyrille, maybe you should state you do not wish nor accept PRs on the project in the first place? so people know they have to describe the problem and wait for you to fix it. I'm afraid people will just keep their changes for them next time.

I already sent some time on those PRs, so I may not be able to spend much time helping you test your own implementations of the same things. Especially since it will be hard and time consuming to merge them with my others changes that I need until you re-implement them.

As my PRs were accidentally closed by the force push on master, do you want me to recreate them?

cyrille-leclerc commented 8 years ago

@johnou please review the updated code with simplified exception handling and better documentation on the way it handles InterruptedException. I re-read IBM's article and many other articles. I feel that we cn stick with re-throwing the exception and keeping the component in a 'stale' state in case of InterruptedException, I explained my reasoning in the code.

@YannRobert

Are you saying you are going to basically rewrite all those pull requests ?

No but I may prefer different implementations for some topics. Regarding the implementation based on OutputWriterSet I commented during the summer that I did not feel comfortable with this approach. I'll try to be more responsive.

As my PRs were accidentally closed by the force push on master, do you want me to recreate them?

If it's not too much work for you, please do recreate them so that we can move forward. Otherwise, I can do a diff from the branches on your repo.

johnou commented 8 years ago

@cyrille-leclerc what about the case where the thread might be interrupted for some reason inbetween the first and second executor? I suspect this might cause the JVM to hang and not exit which is why I was leaning for what they recommend in the ExecutorService javadoc.

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html

   pool.shutdown(); // Disable new tasks from being submitted
   try {
     // Wait a while for existing tasks to terminate
     if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
       pool.shutdownNow(); // Cancel currently executing tasks
       // Wait a while for tasks to respond to being cancelled
       if (!pool.awaitTermination(60, TimeUnit.SECONDS))
           System.err.println("Pool did not terminate");
     }
   } catch (InterruptedException ie) {
     // (Re-)Cancel if current thread also interrupted
     pool.shutdownNow();
     // Preserve interrupt status
     Thread.currentThread().interrupt();
   }

johnou commented 8 years ago

@cyrille-leclerc another thought which just came to mind is to use daemon threads for these executor pools, wdyt?

cyrille-leclerc commented 8 years ago

@johnou thanks for the code snippet, I was not aware of this code snippet. I'll modify the code to integrate it.

another thought which just came to mind is to use daemon threads for these executors, wdyt?

The threads of the executors are already daemon threads, it's the reason why I was not too worried.

https://github.com/jmxtrans/embedded-jmxtrans/blob/696645d866ff681be52d083174e587fca73263b5/src/main/java/org/jmxtrans/embedded/EmbeddedJmxTrans.java#L224

YannRobert commented 8 years ago

We should not have a so complex code to perform the shutdown.

shutdownNow() should be sufficient. There should be no need to awaitTermination(...), because we don't care to wait for those tasks to finish, we just want to make sure that they will not use the OutputWriter(s) after we close them (to avoid the Exception reported in #99). It the task reacts to being interrupted, then it will end quickly and gracefully anyways.

We don't want to hang more than necessary the shutdown of the JVM waiting for the tasks to finish. I guess you don't want your application to take several seconds to finish when (for instance) you CTRL+C in your console, you will want it to be as short as possible. (Imagine the Graphite backend is disconnected in the Pool, eventually unavailable at this time for a new Connection, and you have a long socket timeout).

But in order to use shutdownNow() without awaitTermination(...), you need to synchronize the OutputWriter(s) write(...) with the EmbeddedJmxTrans stop() method. Otherwise you may have the #99 problem sometimes.

But it's hard to synchronize one Object methods with several other Object's other methods. Using an intermediary class would be handy.

An other thing I noticed in your change, is that you have the last collect+export executed before Executors are shutdown. Previously, Executors were shutdown before the last collect+export.

In the event you insist on using awaitTermination(...), maybe you'll then find more appropriate to shutdown() both Executors, then do the last collect+export, then do the awaitTermination(...) on both Executors?

cyrille-leclerc commented 8 years ago

@YannRobert good catch on awaitTermination() it's no longer needed since we invoke collectMetrics()and exportMetrics().

Regarding the synchronization on the outputwritters, the lifecycleLock is intended to ensure it. Invocations to query.exportCollectedMetrics() (and thus to outputWriter.write()) are synchronized by the lifecycleLock and the first action in the lock is to check the status of embeddedJmxTrans.state.

johnou commented 8 years ago

:bee:

YannRobert commented 8 years ago

@cyrille-leclerc looks better now. With my bare eyes, it looks like this could work. Though I didn't run the code.

I think you should add some tests on the behavior when start() fails in the try body. It look like at the moment, the state is set to STARTING, and no further start() or stop() will recover the state of the Object. I think you should remove the temporary State enum values, and have them be {STOPPED, STARTED}. Temporary State values are unreachable from the getState() method anyways, thanks to the read-write-lock.

It may help to avoid catching java.lang.Exception on the entire try body of the start() method (keep java.lang.RuntimeException if you like to log them), but rather nearer to the Exception origin (that probably are query.start() and outputWriter.start()).

johnou commented 8 years ago

@YannRobert :+1:

johnou commented 8 years ago

@cyrille-leclerc hope all is well where you are. I will be testing your branch against our application with the changes @YannRobert suggested https://github.com/johnou/embedded-jmxtrans/commit/4b95c38b49fb7b30068c565d851b3629a6d629d8

johnou commented 8 years ago

@cyrille-leclerc lgtm

2015-11-23 15:21:30,084 INFO  [Thread-0] (com.sulake.common.spring.ClassPathXmlServiceApplicationContext) Closing com.sulake.common.spring.ClassPathXmlServiceApplicationContext@de0a01f: startup date [Mon Nov 23 15:05:32 UTC 2015]; root of context hierarchy
2015-11-23 15:21:30,083 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Stop...
2015-11-23 15:21:30,094 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Unregister shutdown hook
2015-11-23 15:21:30,094 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Shutdown collectScheduledExecutor and exportScheduledExecutor...
2015-11-23 15:21:30,095 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Collect metrics...
2015-11-23 15:21:30,118 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Export metrics...
2015-11-23 15:21:30,123 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Stop queries...
2015-11-23 15:21:30,124 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Stop output writers...
2015-11-23 15:21:30,124 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.output.GraphiteWriter) Stop GraphiteWriter connected to 'HostAndPort{host='xxx', port=2003}' ...
2015-11-23 15:21:30,127 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Set state to STOPPED
2015-11-23 15:21:30,127 INFO  [EmbeddedJmxTransShutdownHook-Thread-77] (org.jmxtrans.embedded.spring.SpringEmbeddedJmxTrans) Stopped

cyrille-leclerc commented 8 years ago

Thanks @johnou I had a very similar change. I also introduced a State.ERROR. Can you re-check, it should be the same.

johnou commented 8 years ago

@cyrille-leclerc identical output! When might I expect this branch to be merged and a new version available in Maven?

YannRobert commented 8 years ago

Here again the ERROR state would be final and cannot be recovered @cyrille-leclerc It would be nice to have the Object be able to retry any transition, and fix it's internal state be it broken.

If you have more than just {STOPPED, STARTED} in enum State

then

if (!State.STOPPED.equals(state))

in method start() better be

if (State.STARTED.equals(state))

then

if (!State.STARTED.equals(state)) {

in method stop() better be

if (State.STOPPED.equals(state)) {

so that a transition method will follow the fast exit path only when the expected state is already the current state.

cyrille-leclerc commented 8 years ago

@YannRobert I thought about this idea to repair the state when it is ERROR. I also imagined to move from start()and stop() to postConstruct() and preDestroy() so that we don't have to analyse all the weird cases where an outputwriter would fail to start or stop or where a thread pool would have issues.

I fill that having a start/stop behavior that is bulletproof to failures is very hard and would not be of interest for most of the users.

Do you use stop & start outside of the default lifecycle?

johnou commented 8 years ago

@YannRobert ping.

YannRobert commented 8 years ago

I was just making a suggestion here, trying to help. I'm not going to argue. Either way it seems it's good to fix the initial issue.

As far as I'm concerned I will be maintaining my fork (including #94), for a number of reasons.

cyrille-leclerc commented 8 years ago

Thanks @YannRobert

jmxtrans / embedded-jmxtrans

Better shut down #110