Open kervel opened 2 years ago
Hello, unfortunately, in the meantime I also managed to reproduce the problem using mqttv5. It seems to be stuck here:
"MQTT Call: dm_source_id" #45 prio=5 os_prio=0 cpu=6.39ms elapsed=151.10s tid=0x00007faa94017000 nid=0x42 in Object.wait() [0x00007faabdd96000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(java.base@11.0.11/Native Method)
        - waiting on <0x00000000b7a6de10> (a java.lang.Object)
        at java.lang.Object.wait(java.base@11.0.11/Object.java:328)
        at org.eclipse.paho.mqttv5.client.internal.Token.waitForResponse(Token.java:177)
        - waiting to re-lock in wait() <0x00000000b7a6de10> (a java.lang.Object)
        at org.eclipse.paho.mqttv5.client.internal.Token.waitForCompletion(Token.java:130)
        at org.eclipse.paho.mqttv5.client.MqttToken.waitForCompletion(MqttToken.java:76)
        at org.eclipse.paho.mqttv5.client.MqttClient.subscribe(MqttClient.java:530)
        at org.eclipse.paho.mqttv5.client.MqttClient.subscribe(MqttClient.java:510)
        at org.eclipse.paho.mqttv5.client.MqttClient.subscribe(MqttClient.java:503)
        at com.datamountaineer.streamreactor.connect.mqtt.source.MqttManager.connectComplete(MqttManager.scala:163)
        at org.eclipse.paho.mqttv5.client.internal.ConnectActionListener.onSuccess(ConnectActionListener.java:175)
        at org.eclipse.paho.mqttv5.client.internal.CommsCallback.fireActionEvent(CommsCallback.java:358)
        at org.eclipse.paho.mqttv5.client.internal.CommsCallback.handleActionComplete(CommsCallback.java:285)
        - locked <0x00000000b784b4a8> (a org.eclipse.paho.mqttv5.client.MqttToken)
And ... I think I found something that works reliably. I now call:
client.setTimeToWait(1000)
before connecting the client. This way, the .subscribe() call never blocks for more than 1 second. The subscription itself succeeds (it always does, it's just Paho that doesn't see it), so even if the call times out, the connector continues fine.
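To make the idea concrete, here is a minimal sketch of a bounded wait using only the standard library. It is not Paho code: `subscribeWithTimeout`, `subscribeOrContinue` and the latch are hypothetical stand-ins for the blocking subscribe, and the sketch assumes the blocking call signals a timeout by throwing.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit, TimeoutException}
import scala.util.{Failure, Success, Try}

object TimeToWaitSketch {
  // Hypothetical stand-in for a blocking subscribe: waits for an
  // acknowledgement and throws if it is not observed in time.
  def subscribeWithTimeout(ack: CountDownLatch, timeoutMs: Long): Unit =
    if (!ack.await(timeoutMs, TimeUnit.MILLISECONDS))
      throw new TimeoutException(s"no acknowledgement within $timeoutMs ms")

  // Treats a timeout as non-fatal: returns true if the acknowledgement was
  // observed, false if the wait timed out. Other failures are rethrown.
  def subscribeOrContinue(ack: CountDownLatch, timeoutMs: Long): Boolean =
    Try(subscribeWithTimeout(ack, timeoutMs)) match {
      case Success(_)                   => true
      case Failure(_: TimeoutException) => false
      case Failure(other)               => throw other
    }

  def main(args: Array[String]): Unit = {
    val neverObserved = new CountDownLatch(1) // simulates an ack the waiter never sees
    val observed      = new CountDownLatch(0) // simulates an ack that already arrived
    println(s"timed-out wait returned: ${subscribeOrContinue(neverObserved, 100L)}")
    println(s"acknowledged wait returned: ${subscribeOrContinue(observed, 100L)}")
  }
}
```

The point is only that the caller gets control back after the timeout instead of hanging; whether continuing is safe depends on the subscription really having been accepted by the broker.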
The issue seems to still exist in version 6.0.0.
With client.setTimeToWait set, a call that times out will throw an MqttException.
Could we check whether this is a reconnect with clean session = false, and not subscribe in that case? client.subscribe will time out and throw on the first connection, but proceed on reconnection.
--- a/kafka-connect-mqtt/src/main/scala/io/lenses/streamreactor/connect/mqtt/source/MqttManager.scala
+++ b/kafka-connect-mqtt/src/main/scala/io/lenses/streamreactor/connect/mqtt/source/MqttManager.scala
@@ -50,6 +50,7 @@ class MqttManager(
client.setCallback(this)
logger.info(s"Connecting to ${settings.connection}")
+ client.setTimeToWait(5000)
client.connect(options)
logger.info(s"Connected to ${settings.connection} as ${settings.clientId}")
@@ -165,9 +166,10 @@ class MqttManager(
val topic = sourceToTopicMap.keySet.toArray
val qos = Array.fill(sourceToTopicMap.keySet.size)(settings.mqttQualityOfService)
- if (reconnect)
+ if (reconnect && !options.isCleanSession())
logger.warn(s"Reconnected. Resubscribing to topic $topic...")
- client.subscribe(topic, qos)
+ else client.subscribe(topic, qos)
+
if (reconnect)
logger.warn(s"Resubscribed to topic $topic with QoS $qos")
else logger.info(s"Subscribed to topic $topic with QoS $qos")
Is there a cleaner method?
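One way to make the intent of that condition explicit is to pull the decision into a small pure function. The sketch below is hypothetical (`shouldSubscribe` is not a name from the connector) and assumes the broker really did keep the session; MQTT brokers report this in the CONNACK "session present" flag, which this sketch does not look at.

```scala
object SubscribeDecision {
  // Hypothetical helper: always subscribe on the first connect; on a
  // reconnect, subscribe only when the session was clean, i.e. when the
  // broker is not expected to have kept the earlier subscriptions.
  def shouldSubscribe(reconnect: Boolean, cleanSession: Boolean): Boolean =
    !reconnect || cleanSession

  def main(args: Array[String]): Unit =
    for {
      reconnect    <- Seq(false, true)
      cleanSession <- Seq(false, true)
    } println(
      s"reconnect=$reconnect cleanSession=$cleanSession -> " +
        s"subscribe=${shouldSubscribe(reconnect, cleanSession)}"
    )
}
```

Only the combination reconnect = true with cleanSession = false skips the subscribe, which matches the condition in the diff.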
I could solve the issue by wrapping client.subscribe and the logging that follows in a Future to avoid blocking the connectComplete callback.
--- a/kafka-connect-mqtt/src/main/scala/io/lenses/streamreactor/connect/mqtt/source/MqttManager.scala
+++ b/kafka-connect-mqtt/src/main/scala/io/lenses/streamreactor/connect/mqtt/source/MqttManager.scala
@@ -28,6 +28,9 @@ import java.util
import java.util.Base64
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
+
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.concurrent.Future
import scala.jdk.CollectionConverters.ListHasAsScala
class MqttManager(
@@ -167,9 +170,14 @@ class MqttManager(
if (reconnect)
logger.warn(s"Reconnected. Resubscribing to topic $topic...")
- client.subscribe(topic, qos)
- if (reconnect)
- logger.warn(s"Resubscribed to topic $topic with QoS $qos")
- else logger.info(s"Subscribed to topic $topic with QoS $qos")
+
+ Future {
+ client.subscribe(topic, qos)
+ if (reconnect)
+ logger.warn(s"Resubscribed to topic $topic with QoS $qos")
+ else logger.info(s"Subscribed to topic $topic with QoS $qos")
+ }
+
+ return
}
}
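One caveat with a fire-and-forget Future is that an exception thrown inside it goes unreported unless something observes the result. Below is a minimal, standard-library-only sketch of the same pattern with the outcome reported either way. The names `runDetached` and `report` are hypothetical, `Thread.sleep` stands in for the blocking client.subscribe call, and the dedicated single-thread executor is my own assumption (the diff uses the global execution context).

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object AsyncSubscribeSketch {
  // Hypothetical helper: runs a blocking action off the calling thread and
  // reports success or failure once it finishes. The returned Future
  // completes only after the report has been made.
  def runDetached(blockingAction: () => Unit, report: String => Unit)(
      implicit ec: ExecutionContext
  ): Future[Unit] =
    Future(blockingAction()).transform { outcome =>
      outcome match {
        case Success(_) => report("subscribed")
        case Failure(e) => report(s"subscribe failed: ${e.getMessage}")
      }
      outcome
    }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    // Thread.sleep simulates a subscribe call that blocks for a while.
    val pending = runDetached(() => Thread.sleep(100), msg => println(msg))
    println("callback returned without waiting for the subscribe")

    Await.ready(pending, 2.seconds) // only so the demo exits cleanly
    pool.shutdown()
  }
}
```

Running blocking calls on their own executor keeps them from occupying threads of the shared global pool, at the cost of having to shut the executor down when the task stops.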
I believe we're running into this very issue: with connect.mqtt.clean=false the connector occasionally stops working and does not recover. While "Resubscribing" is logged, "Resubscribed" is not. According to the broker logs, messages are being sent.
What version of the Stream Reactor are you reporting this issue for?
3.0.1
Are you running the correct version of Kafka/Confluent for the Stream reactor release?
Running on v2.7 (I adapted build.gradle to match my Kafka version).
What is the expected behaviour?
What was observed?
I found various similar (but not exactly the same) bug reports logged against Paho mqttv3, so I modified lenses to use mqttv5 instead, and I can no longer reproduce the issue.
The modifications are rather trivial, apart from clean session now being called clean start, but I can share them if needed.
What is your connector properties configuration (my-connector.properties)?
mosquitto config
Please provide full log files (redact any sensitive information)
this is on the first start:
this is after a restart:
Here, the last log message (subscribed) is missing because execution never reaches that line of code. Note that I added an "okay starting subscriptions" print statement at the beginning of the connectComplete function.