Closed n8fr8 closed 2 years ago
@bitmold so is this an issue with OnionService
?
I assume you mean OrbotService
, and I'm not sure yet, looking into this now...
This is still happening with Orbot testing. The lock is not released sometimes, after multiple stops and starts.
We are using "unbindService(torServiceConnection);" to disconnect and stop Tor.
@n8fr8 @bitmold can you provide steps to reproduce?
@akwizgran @grote any ideas here? it seems related to the lock that was added in #59
Looking for places where the thread started by startTorServiceThread() could get stuck while holding the lock, one candidate is the call to controlPortThreadStarted.await(). If controlPortThreadStarted.countDown() is not called for some reason then the thread started by startTorServiceThread() will wait here indefinitely, even if the service instance has been destroyed in the meantime and replaced by a new instance that wants to acquire the lock.
However, I can't see any obvious reasons why controlPortThreadStarted.countDown() would fail to be called. All the control port thread does before counting down the latch is setting the thread priority and starting to watch a file observer. Could either of those throw a RuntimeException that would kill the control port thread before it counts down the latch?
Looking for other candidates that could cause the thread started by startTorServiceThread() to get stuck while holding the lock, the only other one I can see would be runMain() failing to return. This could happen if for some reason onDestroy() wasn't called when unbinding the service (service manually started as well as bound?), or if torControlConnection was null when onDestroy() was called, so that shutdownTor() would fail to call torControlConnection.shutdownTor().
torControlConnection is initialised by the control port thread, after it counts down controlPortThreadStarted latch and does some other potentially exception-throwing stuff. So I have two theories to suggest:
In either case, shutdownTor() finds that torControlConnection is null, so it doesn't call torControlConnection.shutdownTor(). Tor isn't told to shut down, so runMain() doesn't return. The lock remains held and the new service instance waits forever to acquire it.
To test these theories I'd suggest the following:
If the problem still happens then my theories were wrong but we made some harmless clean-ups. :-)
Ah ha, I had forgotten about this issue on the other thread in Orbot.
There are crude steps to reproduce this issue here: https://github.com/guardianproject/orbot/issues/621 ~ But for what it's worth this still occurs in Orbot
Even though @n8fr8 reported it in his stack trace, it was a bit surprising to me to see that Java_org_torproject_jni_TorService_prepareFileDescriptor
will, in native code throw an IOException
// ...
jclass io_exception = (*env)->FindClass(env, "java/io/IOException");
if (io_exception == NULL) {
return NULL;
}
if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
char buf[1024];
snprintf(buf, 1023, "%s open: %s", filename, strerror(errno));
(*env)->ThrowNew(env, io_exception, buf);
return NULL;
}
// ...
It seems like when this is caught in java TorService
stops itself with runLock
locked
} catch (IOException | ArrayIndexOutOfBoundsException | InterruptedException e) {
e.printStackTrace();
broadcastStatus(TorService.this, STATUS_STOPPING);
TorService.this.stopSelf();
}
Maybe we shouldn't call controlPortThreadStarted.countDown()
until after the native call to prepareFileDescriptor
. I'm going to look into this line of thinking tomorrow...
@akwizgran sadly your insights weren't immediately helpful in fixing this bug, I appreciate them though they have definitely been helpful in figuring out this problem
How about something like:
} catch (IllegalStateException | IllegalArgumentException | InterruptedException e) {
e.printStackTrace();
broadcastError(context, e);
broadcastStatus(context, STATUS_STOPPING);
TorService.this.stopSelf();
if (runLock.isLocked()) {
Log.i(TAG, "Releasing lock");
runLock.unlock();
}
}
nevermind, I was looking at old code. Is there repro instructions somewhere? Ideally, there would be a test case that reproduces it, that should make it easy to fix.
I agree with @akwizgran's ideas in https://github.com/guardianproject/tor-android/issues/67#issuecomment-1043258065 Here is a diff that implements 1 and 2. @bitmold did you try those? I think 3 would also be worth trying.
--- a/tor-android-binary/src/main/java/org/torproject/jni/TorService.java
+++ b/tor-android-binary/src/main/java/org/torproject/jni/TorService.java
@@ -171,7 +171,7 @@ public class TorService extends Service {
private long torConfiguration = -1;
private int torControlFd = -1;
- private TorControlConnection torControlConnection;
+ private volatile TorControlConnection torControlConnection;
/**
* This lock must be acquired before calling createTorConfiguration() and
@@ -261,7 +261,6 @@ public class TorService extends Service {
}
};
controlPortFileObserver.startWatching();
- controlPortThreadStarted.countDown();
countDownLatch.await(10, TimeUnit.SECONDS);
controlPortFileObserver.stopWatching();
File controlSocket = new File(observeDir, CONTROL_SOCKET_NAME);
@@ -281,6 +280,8 @@ public class TorService extends Service {
socksPort = getPortFromGetInfo("net/listeners/socks");
httpTunnelPort = getPortFromGetInfo("net/listeners/httptunnel");
+ controlPortThreadStarted.countDown();
+
} catch (IOException | ArrayIndexOutOfBoundsException | InterruptedException e) {
e.printStackTrace();
broadcastStatus(TorService.this, STATUS_STOPPING);
@@ -289,7 +290,7 @@ public class TorService extends Service {
}
};
- private CountDownLatch controlPortThreadStarted;
+ private volatile CountDownLatch controlPortThreadStarted;
private int getPortFromGetInfo(String key) {
final String value = getInfo(key);
I did try those, to no avail.
It seems like you could move controlPortThreadStarted.countDonw()
down to after httpTunnelPort
's assignment, but that actually won't work. The reason is that in the thread named "tor"
the tor binary won't actually run until controlPortThreadStarted
is done waiting. Since controlPortThreadStarted
in your example above only counts down after the control port is authenticated, events are set, etc and this only happens after tor is up and running.
I'm currently investigating a solution with two CountDownLatch
's. One for starting the control port thread and basically waiting for the ContorlSocket
file to be created (which means you can run the tor binary program), and a second one after httpTunnelPort
is assigned. When both of these latches are over and done with, perhaps then we can unlock runLock
...
sounds like a good idea. If you implemented @akwizgran's #3 idea, could you share it with me? It seems that it would be good to have in TorService anyway.
tor 0.4.6.8
integrated with Orbot, on various cycles of start/stop, I can generate this error:
2021-12-21 12:48:02.856 7322-7622/org.torproject.android W/System.err: java.io.IOException: /data/user/0/org.torproject.android/app_TorService/data/ControlSocket open: Connection refused 2021-12-21 12:48:02.856 7322-7622/org.torproject.android W/System.err: at org.torproject.jni.TorService.prepareFileDescriptor(Native Method) 2021-12-21 12:48:02.856 7322-7622/org.torproject.android W/System.err: at org.torproject.jni.TorService.access$300(TorService.java:43) 2021-12-21 12:48:02.856 7322-7622/org.torproject.android W/System.err: at org.torproject.jni.TorService$2.run(TorService.java:267)
after that happens, I can no longer start tor, because it doesn't proceed past this error: 2021-12-21 12:49:09.738 7322-7718/org.torproject.android I/TorService: Waiting for lock
even if unbind, and stop the OrbotService it still happens. Only a force stop of the app resolved the issue.