javaee / grizzly

Writing scalable server applications in the Java™ programming language has always been difficult. Before the advent of the Java New I/O API (NIO), thread management issues made it impossible for a server to scale to thousands of users. The Grizzly NIO framework has been designed to help developers to take advantage of the Java™ NIO API.
https://javaee.github.io/grizzly/
Other
222 stars 60 forks source link

Deadlock between NIOConnection#notifyCloseListeners and SingleEndpoin… #1989

Closed carryel closed 6 years ago

carryel commented 6 years ago

…tPool#detach #1988

carryel commented 6 years ago

@andrelanka Thanks for your comment. I think I should explain more. Overall, I absolutely agree your comments if we concentrate NIOConnection#notifyCloseListeners' code block.

Please see the comment:

I assume that clear() is called because the list is re-used later on or can be filled with other listeners after notifyCloseListeners is called the first time or clear() prevents that listeners are executed multiple times by multiple calls to notifyCloseListeners.

I think multiple notifications will be protected by other closing states rather than by clear().

One could add another listener to the list while line 887 and 888 are executed. The CopyOnWriteArrayList makes sure that the new listener is not called within the loop. Unfortunately the new listener is removed when line 890 is executed. So it will never be called.

After notifyCloseListeners() is called, closeListeners should not be filled because only terminate0() calls notifyCloseListeners() while the connection will be closing after making the connection's closeReason. I think that addCloseListener() will invoke the listener individually without adding listener into the list.

Another possibility is that two calls to notifyCloseListeners are executed concurrently. Then each listener is called at least twice.

Also, I think notifyCloseListeners should not be called twice. When a connection will be closed, the connection will have a closeReason and several state for closing. You can review terminate0().

Minor: As the number of write operations outnumbers the number of expected traversals, CopyOnWriteArrayList is not the first choice. Maybe a concurrent queue could be more appropriate (if a switch of the data structure is necessary at all). Another possibility is javaee#1984

I also agree. (a concurrent queue seems to have a little issue at https://github.com/javaee/grizzly/issues/1962)

carryel commented 6 years ago

@rlubke I don't care any patch(https://github.com/javaee/grizzly/pull/1989 or https://github.com/javaee/grizzly/pull/1984) if we can fix the deadlock issue(I think this issue is so serious and all branches should be fixed). Could you review PRs or consider another solution?

andrelanka commented 6 years ago

@carryel Thanks for your explanations. You're absolutely right, notifyCloseListeners is called only once. By this my fears are groundless. Nevertheless, I tend to #1984 as we copy the listeners at most once (when notifyCloseListeners is called for a non-empty listeners list) and not with each write access (what is usually done at least twice -- during add of a listener and in ServerRuntime$AsyncResponder.resume when the listener is removed in Response.SuspendedContextImpl#markResumed). Maybe not that big issue as the backing arrays are small, but avoidable.

rlubke commented 6 years ago

@carryel Just back from vacation. Will review today or tomorrow.

fsgonz commented 6 years ago

We have a customer that has an app which results in this deadlock (though we don't have the app).

rlubke commented 6 years ago

Opted with the other PR for the time being. Thanks, Bongjae!