owntracks / android

OwnTracks Android App
http://owntracks.org
Eclipse Public License 1.0
1.34k stars 474 forks source link

Connection to HiveMQ dropping with an error #1623

Open nicholasmparker opened 6 months ago

nicholasmparker commented 6 months ago

Anyone experiencing this? From the debug logs 2024-03-01 10:47:36.005 E MessageProcessorEndpointMqtt: connectionLost error (32109) - java.io.EOFException at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(Unknown Source:213) at java.lang.Thread.run(Thread.java:1012) Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:275) at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(SourceFile:14) at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(Unknown Source:64) ... 1 more

jpmens commented 6 months ago

Under the assumption that this error is on our Android app, I'll transfer this issue.

growse commented 6 months ago

Difficult to know what's going on here. Could be the network, could be the broker not being responsive, could be the device not waking the app to do keepalives....

da-mkay commented 5 months ago

Hi,

I have exactly the same issue:

2024-03-18 22:11:13.042 E MessageProcessorEndpointMqtt: connectionLost error
2024-03-18 22:11:13.042 E MessageProcessorEndpointMqtt: (32109) - java.io.EOFException
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(Unknown Source:213)
    at java.lang.Thread.run(Thread.java:1012)
2024-03-18 22:11:13.042 E MessageProcessorEndpointMqtt: Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:275)
    at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(SourceFile:14)
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(Unknown Source:64)
2024-03-18 22:11:13.042 E MessageProcessorEndpointMqtt: ... 1 more

The Android phone (Samsung Galaxy S21 FE) connects to the broker, keeps the connection for "exactly" 1 minute, then disconnects. Then I see the error above.\ But then it immediately reconnects. Thus, in my broker logs I see an LWT message and reconnect every minute:

[2024-03-18 22:10:12.523] moe connected
[2024-03-18 22:10:12.540] moe subscribed topics: owntracks/+/+ owntracks/+/+/event owntracks/+/+/info owntracks/+/+/waypoints owntracks/moe/phone/cmd
[2024-03-18 22:11:12.585] moe published to owntracks/moe/phone :
[2024-03-18 22:11:12.585] {"_type":"lwt","tst":1710796212}
[2024-03-18 22:11:12.585] moe disconnected

[2024-03-18 22:11:13.454] moe connected
[2024-03-18 22:11:13.470] moe subscribed topics: owntracks/+/+ owntracks/+/+/event owntracks/+/+/info owntracks/+/+/waypoints owntracks/moe/phone/cmd
[2024-03-18 22:12:13.516] moe published to owntracks/moe/phone :
[2024-03-18 22:12:13.516] {"_type":"lwt","tst":1710796273}
[2024-03-18 22:12:13.516] moe disconnected

[2024-03-18 22:12:14.231] moe connected
[2024-03-18 22:12:14.242] moe subscribed topics: owntracks/+/+ owntracks/+/+/event owntracks/+/+/info owntracks/+/+/waypoints owntracks/moe/phone/cmd
[2024-03-18 22:13:14.255] moe published to owntracks/moe/phone :
[2024-03-18 22:13:14.255] {"_type":"lwt","tst":1710796334}
[2024-03-18 22:13:14.255] moe disconnected

It doesn't matter if the phone uses WiFi or mobile network. Since it happens almost exactly each minute and the immediate reconnect works, it doesn't look to me like some network or server issue.

My iPhone stays connected all the time if I leave the app open. Running in the background it connects on demand (pushing location) and closes the connection afterwards.

Btw, I use MQTT over websockets. The MQTT broker sits behind an nginx reverse proxy.\ Connection between app and nginx is encrypted and secured via client certificate.\ The same is true for the connection between nginx and the broker.

ckrey commented 5 months ago

Probably nginx terminates the connection after 60 seconds of inactivity (https://nginx.org/en/docs/http/websocket.html) Either increase timeout values or investigate if nginx can do keepalive

da-mkay commented 5 months ago

Yep, it looks like increasing the timeouts in nginx solved it for me 😉

jpmens commented 5 months ago

@nicholasmparker can you confirm this solves the issue for you likewise?

da-mkay commented 5 months ago

After increasing the timeouts to 1 day in nginx I monitored the reconnects a few days. Sometimes the phone kept the connection for 2h, sometimes for 1h or 30 min or 22 min. The reconnect usually happened within 0 to 10 minutes. And when the connection is lost I see the error from the first post. Maybe these are "normal" connection losses due to ... Android stuff 😉

But today I switched the MQTT broker to Mosquitto, without changing the nginx config. Now I see the error log every exact 5 minutes while keep alive is set to 900s in the app. Reconnect happens immediately according to mosquitto logs. I do not see this behavior when opening the iOS app for let's say 10 minutes. It stays connected all the time.

Any ideas where this new magic 5 minutes come from? This was not the case with the last broker I used (aedes based).

growse commented 5 months ago

You've not said what device you're using, and that 5 mins could be a device thing. Do you see anything in the OT log about why it dropped the connection?

da-mkay commented 5 months ago

It is a Galaxy S21 FE. The OT log shows the error from the first posts here (connectionLost error).

A few days ago I switched back from Mosquitto to my other broker, and the connection losses every exact 5 minutes are gone. Now I am back to the behavior where the Android app can sometimes keep the connection for 2 hours, sometimes for 20 minutes and so on, which I guess is normal.

I don't know what the problem is between Android app and Mosquitto, because my other broker works well. I also noticed some problems between iOS app and Mosquitto that I do not have with the other broker: When I opened the app I always saw an old outdated location of a friend. I always had to restart the app to see the current location.

I can solve both problems by replacing Mosquitto (v2.0.18 btw) which is fine for me. But it would still be good to know what's the problem with Mosquitto. Are there any known issues or incompatibilities with Owntracks and Mosquitto?

jpmens commented 5 months ago

I'm having a hard time following. What is "my other broker", is that HiveMQ?

We're not aware of issues with Android and Mosquitto over MQTT, in fact most of us have that combination.

Can you get rid of the websocket connection and use plain MQTT (port 1883, or 8883 with TLS)?

ckrey commented 5 months ago

When switching between differen MQTT backends make sure to empty queues on the broker end by either

da-mkay commented 5 months ago

I'm having a hard time following. What is "my other broker", is that HiveMQ?

It's my own broker based on aedes (will be released soon).

We're not aware of issues with Android and Mosquitto over MQTT, in fact most of us have that combination. Can you get rid of the websocket connection and use plain MQTT (port 1883, or 8883 with TLS)?

That's interesting. Unfortunately I don't have permanent access to the Android phone, so testing that scenario is a bit difficult. Maybe I can find an old Android phone here and reproduce the issue.

When switching between differen MQTT backends make sure to empty queues on the broker end by either

  • resetting broker (mosquitto: delete mosquitto persistence db)
  • connect once with CleanSession flag set (modify settings in OwnTracks App or use another MQTT client using the same clientId)

I did the latter by setting the clean flag in the app and restarting the app, then switched back to clean=false and restarted the app again.

github-actions[bot] commented 3 months ago

Marking issue with request for more data as stale, due to no updates.

compuguy commented 3 months ago

Any updates?

jpmens commented 3 months ago

@compuguy updates for what? We're awaiting a response from OP.

compuguy commented 2 months ago

@compuguy updates for what? We're awaiting a response from OP.

Apologies for that. I'm not sure if it is the same issue (I use the free HiveMQ MQTT service), but I've had similar connection timeout/disconnect issues with the latest beta build of Owntracks (Beta 3). I'm connecting via a websocket though. I've redacted/changed certain information for privacy reasons:

2024-06-29 23:24:54.710 I [DefaultDispatcher-worker-2] MQTTReconnectWorker: MQTT reconnect worker job started
2024-06-29 23:24:54.818 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: Connecting to wss://mqtt.MYDOMAIN.pw:443?# timeout = 30s
2024-06-29 23:24:55.920 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Connected. Subscribing to [owntracks/+/+, owntracks/+/+/event, owntracks/+/+/info, owntracks/ME/MYDEVICE/cmd]
2024-06-29 23:24:56.023 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Subscribed
2024-06-29 23:24:56.025 I [DefaultDispatcher-worker-2] MQTTReconnectWorker: MQTT reconnect worker job completed, status Success {mOutputData=Data {}}
2024-06-29 23:24:56.055 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE1 with id=25080088
2024-06-29 23:24:56.058 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE with id=e28ba878
2024-06-29 23:24:56.060 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE with id=d8d711ec
2024-06-29 23:24:56.061 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE with id=4ef12aa5
2024-06-29 23:27:04.197 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLwt on owntracks/REDACTED/DEVICE with id=0
2024-06-29 23:39:51.694 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE with id=cb9512b8
2024-06-29 23:40:39.612 I [MQTT Call: MYDEVICE] MessageProcessor: Received incoming message: MessageLocation on owntracks/REDACTED/DEVICE with id=63b43d0c
2024-06-29 23:59:09.345 E [main] MQTTMessageProcessorEndpoint$mqttCallback: Connection Lost
Timed out waiting for a response from the server (32000)
    at kotlin.ExceptionsKt.createMqttException(SourceFile:10)
    at org.eclipse.paho.client.mqttv3.internal.ClientState.checkForActivity(SourceFile:141)
    at org.owntracks.android.net.mqtt.MQTTPingAlarmReceiver.onReceive(SourceFile:53)
    at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1943)
    at android.app.LoadedApk$ReceiverDispatcher$Args.$r8$lambda$gDuJqgxY6Zb-ifyeubKeivTLAwk(Unknown Source:0)
    at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
    at android.os.Handler.handleCallback(Handler.java:958)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:257)
    at android.os.Looper.loop(Looper.java:368)
    at android.app.ActivityThread.main(ActivityThread.java:8848)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:572)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1049)
growse commented 2 months ago

@compuguy Is there any pattern to the periodicity of the disconnects?

The bottom line with a lot of this stuff is that relying on (or expecting) long-term TCP connections to never disconnect is highly optimistic, especially in a mobile context where changes to the connected network interface, the mobile OS, the whims of the other party and the internet all conspire to drop / ignore packets. TCP's particularly susceptible to lost packets, as the default fall-back error recovering mechanism is just to tear down the connection and start again.

ninjazx06 commented 1 month ago

I'm not sure this is the same issue but I am noticing this series of errors every hours or so. The symptom I am seeing is that the app frequently is not reporting the current locations of Friends (users). My Home Assistant is also subscribed to the queue and seems to be successfully updating. Is there a setting in the Owntracks app and/or HiveMQ I need to adjust?

Owntracks Version 2.5.0 Android version 14 (Android security update November 5, 2023) Device: Google Pixel 5

Also seeing this with Pixel 2XL and Pixel 3XL

2024-07-27 06:45:32.518 E [MQTT Rec: pixel5pixel5] MQTTMessageProcessorEndpoint$mqttCallback: Connection Lost
Connection lost (32109) - java.net.SocketException: Software caused connection abort
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(SourceFile:248)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
    at java.lang.Thread.run(Thread.java:1012)
Caused by: java.net.SocketException: Software caused connection abort
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:118)
    at java.net.SocketInputStream.read(SocketInputStream.java:173)
    at java.net.SocketInputStream.read(SocketInputStream.java:143)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(Unknown Source:24)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(Unknown Source:172)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readUntilDataAvailable(Unknown Source:0)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.read(SourceFile:3)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.read(SourceFile:1)
    at java.io.DataInputStream.readByte(DataInputStream.java:273)
    at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(SourceFile:14)
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(SourceFile:77)
    ... 6 more
2024-07-27 06:45:32.520 D [MQTT Rec: pixel5pixel5] Scheduler: Scheduled ONETIME_TASK_MQTT_RECONNECT job
2024-07-27 06:45:42.540 I [DefaultDispatcher-worker-1] MQTTReconnectWorker: MQTT reconnect worker job started
2024-07-27 06:45:42.542 D [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$disconnect: MQTT client already not connected
2024-07-27 06:45:42.543 D [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$disconnect: Unregistered ping alarm receiver
2024-07-27 06:45:42.543 D [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$disconnect: MQTT disconnected in 1.466771ms
2024-07-27 06:45:42.546 D [DefaultDispatcher-worker-1] AlarmPingSender: Initializing MQTT keepalive AlarmPingSender
2024-07-27 06:45:42.675 I [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$connectToBroker: Connecting to ssl://a556ae9d9dba41159ebabaf82b32762b.s1.eu.hivemq.cloud:8883?# timeout = 30s
2024-07-27 06:45:44.932 D [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$connectToBroker: Registered ping alarm receiver
2024-07-27 06:45:44.932 I [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Connected. Subscribing to [owntracks/+/+, owntracks/+/+/event, owntracks/+/+/info, owntracks/PIXEL5/PIXEL5/cmd]
2024-07-27 06:45:45.042 I [DefaultDispatcher-worker-1] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Subscribed
2024-07-27 06:45:45.043 D [DefaultDispatcher-worker-1] MessageProcessor: Resetting message send loop wait.

...


2024-07-27 07:45:46.106 E [main] MQTTMessageProcessorEndpoint$mqttCallback: Connection Lost
Timed out waiting for a response from the server (32000)
    at kotlin.ExceptionsKt.createMqttException(SourceFile:10)
    at org.eclipse.paho.client.mqttv3.internal.ClientState.checkForActivity(SourceFile:141)
    at org.owntracks.android.net.mqtt.MQTTPingAlarmReceiver.onReceive(SourceFile:53)
    at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1802)
    at android.app.LoadedApk$ReceiverDispatcher$Args.$r8$lambda$gDuJqgxY6Zb-ifyeubKeivTLAwk(Unknown Source:0)
    at android.app.LoadedApk$ReceiverDispatcher$Args$$ExternalSyntheticLambda0.run(Unknown Source:2)
    at android.os.Handler.handleCallback(Handler.java:958)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:205)
    at android.os.Looper.loop(Looper.java:294)
    at android.app.ActivityThread.main(ActivityThread.java:8177)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:552)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:971)

2024-07-27 07:45:46.108 D [main] Scheduler: Scheduled ONETIME_TASK_MQTT_RECONNECT job
2024-07-27 07:46:02.144 I [DefaultDispatcher-worker-2] MQTTReconnectWorker: MQTT reconnect worker job started
2024-07-27 07:46:02.146 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: MQTT client already not connected
2024-07-27 07:46:02.147 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: Unregistered ping alarm receiver
2024-07-27 07:46:02.147 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: MQTT disconnected in 1.275781ms
2024-07-27 07:46:02.149 D [DefaultDispatcher-worker-2] AlarmPingSender: Initializing MQTT keepalive AlarmPingSender
2024-07-27 07:46:02.405 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: Connecting to ssl://a556ae9d9dba41159ebabaf82b32762b.s1.eu.hivemq.cloud:8883?# timeout = 30s
2024-07-27 07:46:03.748 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: Registered ping alarm receiver
2024-07-27 07:46:03.749 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Connected. Subscribing to [owntracks/+/+, owntracks/+/+/event, owntracks/+/+/info, owntracks/PIXEL5/PIXEL5/cmd]
2024-07-27 07:46:03.853 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Subscribed
2024-07-27 07:46:03.854 D [DefaultDispatcher-worker-2] MessageProcessor: Resetting message send loop wait.

...

2024-07-27 08:46:05.407 E [MQTT Rec: pixel5pixel5] MQTTMessageProcessorEndpoint$mqttCallback: Connection Lost
Connection lost (32109) - java.net.SocketException: Connection reset
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(SourceFile:248)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
    at java.lang.Thread.run(Thread.java:1012)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:191)
    at java.net.SocketInputStream.read(SocketInputStream.java:143)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(Unknown Source:24)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(Unknown Source:172)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readUntilDataAvailable(Unknown Source:0)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.read(SourceFile:3)
    at org.conscrypt.ConscryptEngineSocket$SSLInputStream.read(SourceFile:1)
    at java.io.DataInputStream.readByte(DataInputStream.java:273)
    at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(SourceFile:14)
    at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(SourceFile:77)
    ... 6 more

2024-07-27 08:46:05.431 D [MQTT Rec: pixel5pixel5] Scheduler: Scheduled ONETIME_TASK_MQTT_RECONNECT job
2024-07-27 08:46:37.051 I [DefaultDispatcher-worker-4] MQTTReconnectWorker: MQTT reconnect worker job started
2024-07-27 08:46:37.058 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: MQTT client already not connected
2024-07-27 08:46:37.061 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: Unregistered ping alarm receiver
2024-07-27 08:46:37.062 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$disconnect: MQTT disconnected in 3.748698ms
2024-07-27 08:46:37.075 D [DefaultDispatcher-worker-2] AlarmPingSender: Initializing MQTT keepalive AlarmPingSender
2024-07-27 08:46:38.055 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: Connecting to ssl://a556ae9d9dba41159ebabaf82b32762b.s1.eu.hivemq.cloud:8883?# timeout = 30s
2024-07-27 08:46:39.610 D [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: Registered ping alarm receiver
2024-07-27 08:46:39.610 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Connected. Subscribing to [owntracks/+/+, owntracks/+/+/event, owntracks/+/+/info, owntracks/PIXEL5/PIXEL5/cmd]
2024-07-27 08:46:39.712 I [DefaultDispatcher-worker-2] MQTTMessageProcessorEndpoint$connectToBroker: MQTT Subscribed
2024-07-27 08:46:39.712 D [DefaultDispatcher-worker-2] MessageProcessor: Resetting message send loop wait.
compuguy commented 3 weeks ago

@compuguy Is there any pattern to the periodicity of the disconnects?

The bottom line with a lot of this stuff is that relying on (or expecting) long-term TCP connections to never disconnect is highly optimistic, especially in a mobile context where changes to the connected network interface, the mobile OS, the whims of the other party and the internet all conspire to drop / ignore packets. TCP's particularly susceptible to lost packets, as the default fall-back error recovering mechanism is just to tear down the connection and start again.

I agree. Once a prelease or an actual release of 2.5.2 is cut, I can confirm that this isn't an issue.