hannesa2 / paho.mqtt.android

Kotlin MQTT client for Android
410 stars 94 forks source link

Unexplained websocket disconnection precisely 5 minutes after the connection is established #609

Open James-Stokes opened 4 months ago

James-Stokes commented 4 months ago

Tested using Paho version 4.2.4 on Andorid 11 and Android 7 I have been connecting to a mosquito (v2) broker using the mqtt protocol on port 1883 and everything is fine, but exactly the same setup using web sockets on port 9001 causes a disconnection exactly 5 minutes after connection every time (the auto reconnect then does its thing and the connection is re-established again almost instantly). It seems very easy to reproduce, I am using the default configs for mosquitto and paho and it happens every time. I have tried setting the keepalive interval to both 10 and 100 seconds, I see the PINGREQ and PINGRESP messages in the broker logs and that seems to be working fine. I think that this is an issue with the client and not the broker because the Paho python client does not have the same issue with the same configuration and broker (also using websockets). I have tried bot SSL and plaintext, and brokers hosted in a k8s cluster and in my local testing environment, always a disconnection after 5 minutes.


Client initialisation:
mqttClient = MqttAndroidClient(this, BROKER_URL, clientId).also {
    it.setCallback(DefaultCallback())
}

Client connection:
val options = MqttConnectOptions()
options.isCleanSession = false
options.isAutomaticReconnect = true
options.userName = mqttClient.clientId
options.password = mPassword.toCharArray()
options.maxInflight = MAX_IN_FLIGHT
options.setWill(getScopedTopic(TOPIC_ONLINE), "0".toByteArray(), QOS_LEVEL, false)
Log.i(LOG_TAG, "Connecting as ${options.userName}")
val connectionCallback = object : IMqttActionListener {
    override fun onSuccess(token: IMqttToken?) {
        Log.i(LOG_TAG, "Connection success")
        val disconnectedBufferOptions = DisconnectedBufferOptions()
        disconnectedBufferOptions.isBufferEnabled = true
        disconnectedBufferOptions.bufferSize = BUFFER_SIZE
        disconnectedBufferOptions.isPersistBuffer = true
        disconnectedBufferOptions.isDeleteOldestMessages = true
        mqttClient.setBufferOpts(disconnectedBufferOptions)
    }

    override fun onFailure(token: IMqttToken?, exception: Throwable?) {
        Log.e(LOG_TAG, "Connection failed", exception)
    }
}
mqttClient.connect(options, this, connectionCallback)

Disconnection callback logs in Paho:
Connection lost (32109) - java.io.EOFException 
at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:197)
at java.lang.Thread.run(Thread.java:761) Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:270)
at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:92)
at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:137)
at java.lang.Thread.run(Thread.java:761)

Mosquitto broker logs:
1714144958: Sending PUBREC to faba9c974541 (m30, rc0)
1714144959: Received PUBREL from faba9c974541 (Mid: 30)
1714144959: Sending PUBCOMP to faba9c974541 (m30)
1714145019: Received PINGREQ from faba9c974541 
1714145019: Sending PINGRESP to faba9c974541 
1714145041: Client faba9c974541 closed its connection.
hannesa2 commented 3 months ago

This https://github.com/hannesa2/paho.mqtt.android/pull/631 tries to test it

hannesa2 commented 3 months ago

I made an own tab to observer ping behavior https://github.com/hannesa2/paho.mqtt.android/pull/644

and when I try a reconnect of connection lost https://github.com/hannesa2/paho.mqtt.android/pull/646 I see that

  1. ping was first working,
  2. then a ping was canceled (Question 1: why ?)
  3. then a connection lost appears
  4. an automatic retry failed, because
  5. an UnknowHostException appears

I've no explanation. does someone understand what's going on here ?

(PingWorker.kt:19)           info.mqtt.android.extsample D  .doWork(): Sending Ping at: 2024-06-03 19:33:03Z
(AlarmPingSender.kt:46)      info.mqtt.android.extsample D  .schedule(): Schedule next alarm at 2024-06-03 19:36:23Z
(PingWorker.kt:24)           info.mqtt.android.extsample D  $doWork.onSuccess(): Success null
WM-WorkerWrapper             info.mqtt.android.extsample I  Work [ id=636a1c87-c85e-45d4-9129-e1efff1985f1, tags={ info.mqtt.android.service.ping.PingWorker } ] was cancelled (Ask Gemini)
                                                            java.util.concurrent.CancellationException: Task was cancelled.
                                                                at androidx.work.impl.utils.futures.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1183)
                                                                at androidx.work.impl.utils.futures.AbstractFuture.getDoneValue(AbstractFuture.java:513)
                                                                at androidx.work.impl.utils.futures.AbstractFuture.get(AbstractFuture.java:474)
                                                                at androidx.work.impl.WorkerWrapper$2.run(WorkerWrapper.java:316)
                                                                at androidx.work.impl.utils.SerialExecutorImpl$Task.run(SerialExecutorImpl.java:96)
                                                                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                                                                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
                                                                at java.lang.Thread.run(Thread.java:1012)
(AlarmPingSender.kt:46)      info.mqtt.android.extsample D  .schedule(): Schedule next alarm at 2024-06-03 19:33:55Z
(MqttAndroidClient.kt:884)   info.mqtt.android.extsample V  .onReceive(): Bundle[{.exceptionStack=Connection lost (32109) - java.net.SocketException: Software caused connection abort (Ask Gemini)
                                                                at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:197)
                                                                at java.lang.Thread.run(Thread.java:1012)
                                                            Caused by: java.net.SocketException: Software caused connection abort
                                                                at java.net.SocketInputStream.socketRead0(Native Method)
                                                                at java.net.SocketInputStream.socketRead(SocketInputStream.java:118)
                                                                at java.net.SocketInputStream.read(SocketInputStream.java:173)
                                                                at java.net.SocketInputStream.read(SocketInputStream.java:143)
                                                                at java.net.SocketInputStream.read(SocketInputStream.java:205)
                                                                at java.io.DataInputStream.readByte(DataInputStream.java:273)
                                                                at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:92)
                                                                at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:137)
                                                                at java.lang.Thread.run(Thread.java:1012) 
                                                            , .errorMessage=Connection lost, .exception=Connection lost (32109) - java.net.SocketException: Software caused connection abort, .callbackAction=onConnectionLost, .clientHandle=tcp://broker.hivemq.com:1883:ExtendedSampleClient:info.mqtt.android.extsample, .callbackStatus=OK}]
(MqttCallbackHandler.kt:26)  info.mqtt.android.extsample W  .connectionLost(): MqttException Connection lost
(MqttCallbackHandler.kt:37)  info.mqtt.android.extsample I  .connectionLost(): Try to reconnect
TrafficStats                 info.mqtt.android.extsample D  tagSocket(90) with statsTag=0xffffffff, statsUid=-1
(PingWorker.kt:19)           info.mqtt.android.extsample D  .doWork(): Sending Ping at: 2024-06-03 19:33:55Z
WM-WorkerWrapper             info.mqtt.android.extsample I  Worker result FAILURE for Work [ id=ce7563fc-fd26-4c0f-8dff-5d07b8f82832, tags={ info.mqtt.android.service.ping.PingWorker } ]
TrafficStats                 info.mqtt.android.extsample D  tagSocket(90) with statsTag=0xffffffff, statsUid=-1
(MqttAndroidClient.kt:884)   info.mqtt.android.extsample V  .onReceive(): Bundle[{.invocationContext=null, .errorMessage=MqttException, .exception=MqttException (0) - java.net.UnknownHostException: broker.hivemq.com, .activityToken=2, .callbackAction=connect, .clientHandle=tcp://broker.hivemq.com:1883:ExtendedSampleClient:info.mqtt.android.extsample, .callbackStatus=ERROR}]
(ActionListener.kt:100)      info.mqtt.android.extsample E  .onFailure(): CONNECT token=info.mqtt.android.service.MqttTokenAndroid@a2cdbbe (Ask Gemini)
                                                            MqttException (0) - java.net.UnknownHostException: broker.hivemq.com
                                                                at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38)
                                                                at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738)
                                                                at java.lang.Thread.run(Thread.java:1012)
                                                            Caused by: java.net.UnknownHostException: broker.hivemq.com
                                                                at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:208)
                                                                at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:436)
                                                                at java.net.Socket.connect(Socket.java:646)
                                                                at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74)
                                                                at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724)
                                                                at java.lang.Thread.run(Thread.java:1012) 
(ActionListener.kt:157)      info.mqtt.android.extsample E  .connect(): Client failed to connect MqttException
mistralaix commented 1 month ago

We are also experiencing the same problem: we are connecting to a mqtt broker through websockets and it is disconnecting after exactly 5 minutes too.

Did someone figured out a solution or temporary fix?

hannesa2 commented 1 month ago

Currently: no, it ends up with a java.net.UnknownHostException which is previous known I've no clue why

FJEagle commented 4 days ago

We are also experiencing the same problem: we are connecting to a mqtt broker through websockets and it is disconnecting after exactly 5 minutes too.

Did someone figured out a solution or temporary fix?

Could it be telecom network heartbeat problem? As I know, some telecom operator would disconnect the long-lived connection at certain duration. To confirm it: try to connect to a server in LAN, would it disappear?