hypfvieh / dbus-java

Improved version of java DBus library provided by freedesktop.org (https://dbus.freedesktop.org/doc/dbus-java/)
https://hypfvieh.github.io/dbus-java/
MIT License
185 stars 73 forks source link

Possible race condition when disconnecting #123

Closed infeo closed 3 years ago

infeo commented 3 years ago

Summary

There is a possible race condition when closing the connection to the DBus via the DBusConnection class leading to an java.util.concurrent.RejectedExecutionException.

Description

In https://github.com/swiesend/secret-service/issues/21 a problem is encountered where after a successfully established DBus connection with the DBusConnection class and the (later happend) disconnect, still at least one message is tried to handled. Because the appearance of the problem is intermittent, it may be a race condition. The exception stacktrace:

Exception in thread "DBusConnection" java.util.concurrent.RejectedExecutionException: Task org.freedesktop.dbus.connections.AbstractConnection$3@5a0fc9cb rejected from java.util.concurrent.ThreadPoolExecutor@1f0d4cee[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2057)
    at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:827)
    at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
    at org.freedesktop.dbus.connections.AbstractConnection.executeInWorkerThreadPool(AbstractConnection.java:946)
    at org.freedesktop.dbus.connections.AbstractConnection.handleMessage(AbstractConnection.java:920)
    at org.freedesktop.dbus.connections.AbstractConnection.handleMessage(AbstractConnection.java:709)
    at org.freedesktop.dbus.connections.IncomingMessageThread.run(IncomingMessageThread.java:43)

It seems that the workerThreadPoolExecutor was already terminated when the message should be handled. Looking at docs of the ThreadPoolExecutor, there are three cases where it is shutdown:

  1. explicit call of shutdown()
  2. explicit call of shutdownNow()
  3. it has no remaining threads and is no longer referenced

The first two cases happen only in either in the changeThreadCount() or disconnectInternal() method of the AbstractConnection class. The first one basically transfers all jobs to a new executor in a thread safe manner, hence it should impose no problem.

The second, disconnectInternal(), is therefore the one I looked at. The DBusConnection is an object which recieves, send and handle messages. Currently, first the handling is closed and afterwards the recieving part. https://github.com/hypfvieh/dbus-java/blob/ecb3ad04893f52908ffae1c7251e68357f1e1d3d/dbus-java/src/main/java/org/freedesktop/dbus/connections/AbstractConnection.java#L542-L563

The message source is the IncomingMessageThread. This object is terminated via a variable:

    private boolean                  terminate;

    /*
     *  other stuff
     */

    public void setTerminate(boolean _terminate) {
        terminate = _terminate;
        interrupt();
    }

    @Override
    public void run() {

        Message msg = null;
        while (!terminate) {
            /*
             * do stuff
             */
        }
    }

While I'm not a fan of closing the message source after closing the handling, more critical is that neither the connected variable nor terminate are volatile. This could lead to a cache coherence problem (see https://www.baeldung.com/java-volatile#volatile).

Suggested Fix

Make the variables connected and terminate volatile.

Additionally, i suggest to first close the message source (setting terminate to false) before shutting down the executor.

As a third point, maybe the setTerminate(boolean _terminate) method can be refactored to terminate() to only terminate the connection and not setting it to any value.

hypfvieh commented 3 years ago

I implemented some of the suggestions. Please try if you can reproduce the issue using the latest changes.

swiesend commented 3 years ago

Hey @hypfvieh!

First of all a huge thank you for providing this lib and maintaining it with such quick replies!

I had a chance to test your changes, but unfortunately that didn't play out.

I documented my test here: https://github.com/cryptomator/integrations-linux/issues/2

hypfvieh commented 3 years ago

Are you sure you use the latest git revision?

The line numbers reported in the exception do not match. e.g.AbstractConnection.java:946 is an empty line, AbstractConnection.java:920 is the closing bracket of a catch block ... AbstractConnection.java:709 is a piece of javadoc IncomingMessageThread.java:43 this is the only correct line, but the the only change in that class was to add a volatile to the boolean member, so line numbers are likely the same as before

swiesend commented 3 years ago

Your were absolutely right @hypfvieh. I started a wrong cryptomator build with old dependencies.

I was quite confused yesterday evening as I couldn't observe the RejectedExecutionException in my secret-service tests anymore, but still with the wrong cryptomator build. But in my secret-serivice tests the RejectedExecutionException occurs less frequently, as when integrated with cryptomator.

So your changes actually have fixed the RejectedExecutionException issue. Maybe it would be worth to add a regression test for that @hypfvieh?

But I still have problems on tearing down, as documented here.

swiesend commented 3 years ago

I still can trigger the RejectedExecutionException and the blocked shutdown:

https://github.com/cryptomator/integrations-linux/issues/2#issuecomment-726844687

hypfvieh commented 3 years ago

Seems to be fixed, see cryptomator/integrations-linux#2