Open sagIoTPower opened 2 weeks ago
Hi,
we set `max_queued_messages` and `max_queued_bytes` to `0` (no limitation).
With the Mosquitto Bridge, persistence worked perfectly. After about a quarter of an hour and about 300 KB of persisted data, we stopped the test.
With the Rust Bridge, persistence stopped after approx. 5 minutes and approx. 215 KB of persisted data. At the same time, we received the following output in the MQTT Broker log:
`Client tedge-mapper-bridge-c8y has exceeded timeout, disconnecting.`
The persisted messages are not rolling, i.e. when reconnecting, only the data directly after the disconnect is sent and not the data shortly before the reconnect.
There are two independent issues here:
* `Outgoing messages are being dropped for client tedge-mapper-bridge-c8y`.
  * `max_queued_messages` and `max_queued_bytes` set to `0` (no limit).
  * `max_queued_messages 0` fixes the main issue.
* `Client tedge-mapper-bridge-c8y has exceeded timeout, disconnecting.` is due to the builtin bridge failing to send a keep-alive message in time.

Using this script to publish easy-to-check data, I checked the following:
* If neither `max_queued_messages` nor `max_queued_bytes` is reached, then no message is lost.
* If `max_queued_messages` or `max_queued_bytes` is reached, then one can observe in the mosquitto log that `Outgoing messages are being dropped for client tedge-mapper-bridge-c8y` and, when the bridge is re-established, all the messages sent between the two events are lost.
* With `max_queued_messages` and `max_queued_bytes` set to `0` (i.e. no limit), the messages are properly queued when the connection is lost and properly published to the cloud when the connection is back.
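
To make the three cases above easier to reason about, here is a minimal toy model in Rust. It is only a sketch of the observed behaviour, not mosquitto's actual implementation: the `OfflineQueue` type and the `simulate_outage` helper are hypothetical names, only the message-count limit is modelled (not `max_queued_bytes`), and the one assumption taken from the observations is that `0` means "no limit" and that messages arriving after the limit is hit are the ones that get lost.

```rust
use std::collections::VecDeque;

/// Toy model of a broker-side queue for a disconnected client.
/// `max_queued_messages == 0` means "no limit", as in the settings discussed above.
struct OfflineQueue {
    max_queued_messages: usize,
    queued: VecDeque<u32>,
    dropped: usize,
}

impl OfflineQueue {
    fn new(max_queued_messages: usize) -> Self {
        OfflineQueue {
            max_queued_messages,
            queued: VecDeque::new(),
            dropped: 0,
        }
    }

    /// Queue a message while the client is offline; drop it once the limit is reached.
    fn publish(&mut self, msg: u32) {
        let unlimited = self.max_queued_messages == 0;
        if unlimited || self.queued.len() < self.max_queued_messages {
            self.queued.push_back(msg);
        } else {
            // Corresponds to "Outgoing messages are being dropped" in the broker log.
            self.dropped += 1;
        }
    }

    /// Hand over everything that was queued once the client reconnects.
    fn reconnect(&mut self) -> Vec<u32> {
        self.queued.drain(..).collect()
    }
}

/// Publish `count` sequential messages during an outage, then reconnect.
/// Returns (messages delivered after reconnect, number of messages dropped).
fn simulate_outage(max_queued_messages: usize, count: u32) -> (Vec<u32>, usize) {
    let mut queue = OfflineQueue::new(max_queued_messages);
    for msg in 0..count {
        queue.publish(msg);
    }
    let delivered = queue.reconnect();
    (delivered, queue.dropped)
}
```

Calling `simulate_outage(3, 5)` delivers only the three oldest messages and drops the two newest, while `simulate_outage(0, 5)` delivers all five. That matches what was seen here: the old messages arrive after the reconnect and the ones published after the limit was reached are gone.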
I did a first experiment with a one-hour disconnection, then a second one 10 hours long.
In both cases, no error messages were observed (neither on mosquitto nor on the builtin bridge).
In the 1-hour case, all the messages were actually received on the cloud.
For the 10-hour case, things are a bit more difficult to assess as C8Y is aggregating older messages.
The aggregated values indicate that the messages have been received for the whole period,
* The [second comment](https://github.com/thin-edge/thin-edge.io/issues/3083#issuecomment-2309913209) is about something different.
* The error [`Client tedge-mapper-bridge-c8y has exceeded timeout, disconnecting.`](https://github.com/eclipse/mosquitto/issues/2124#issuecomment-794452620) is due to the builtin bridge failing to send a keep-alive message in time.
* => We have to address this. If the builtin bridge fails for some reason to respond on time and is disconnected by mosquitto, it should be able to properly reconnect to the local broker _even_ if disconnected at that time from the cloud.
I guess this means we need the bridge to process forwarding messages in a separate task, so we can poll the event loop in the meantime? If I understand correctly, mosquitto will only send up to 100 messages (or however many we configure in practice) before it receives an ack, in which case this should be reasonably possible to implement.
Alternative solution: can we just increase the channel size to be greater than the maximum number of in-flight messages, so we're never blocked publishing an event? That would be a really simple solution if it works, and may mean we can avoid `tokio::spawn` when calling subscribe too.
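
A rough sketch of the two options, using only the Rust standard library so it runs without extra crates. This is not the bridge's code: threads and `std::sync::mpsc` stand in for tokio tasks and channels, and both function names are hypothetical. The first function shows that a bounded channel accepts every message without blocking only when its capacity is at least the number of in-flight messages; the second shows the loop handing messages to a separate forwarder so it stays free to keep polling.

```rust
use std::sync::mpsc::{self, sync_channel, TrySendError};
use std::thread;

/// Option 2: try to enqueue `in_flight` messages into a bounded channel that
/// nobody is reading from, and return how many were accepted before the
/// channel was full. A blocking `send` would stall at that point.
fn accepted_without_blocking(capacity: usize, in_flight: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for msg in 0..in_flight {
        match tx.try_send(msg) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Option 1: forward on a separate thread so the calling loop never waits on
/// the forwarder. Returns (loop iterations completed, messages forwarded).
fn forward_on_separate_thread(messages: Vec<u32>) -> (usize, Vec<u32>) {
    let (to_forwarder, inbox) = mpsc::channel::<u32>();

    // Stand-in for the task that publishes to the other broker.
    let forwarder = thread::spawn(move || inbox.into_iter().collect::<Vec<u32>>());

    let mut iterations = 0;
    for msg in messages {
        // Sending on an unbounded channel does not block the loop.
        to_forwarder.send(msg).expect("forwarder thread is alive");
        iterations += 1; // the real loop would poll the event loop here
    }

    drop(to_forwarder); // closing the channel lets the forwarder finish
    let forwarded = forwarder.join().expect("forwarder thread panicked");
    (iterations, forwarded)
}
```

With a capacity of 128 and 100 in-flight messages (the figure mentioned above), `accepted_without_blocking` accepts all 100; with a capacity of 10 it stops at 10, which is where a blocking send would hold up the loop and could delay the keep alive.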
Hi,
we see mosquitto stop persisting messages when the tedge_mqtt_bridge connection to the cloud is interrupted (see screenshot tedge_mqtt_bridge_disconnect):
`2024-08-23T06:42:30.112741403Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Last pingreq isn't acked`
The persistence store continues to grow, as expected, until 6:48:35 AM on 08/23/2024 (see screenshots).
But then it stops growing, which is strange.
We use the Rust bridge and thin-edge 1.2.
This error does not happen when we switch to the mosquitto bridge. Is it a bug, or do we have to configure something?
When the connection is finally restored, the previously persisted messages are successfully transmitted. But these are the old messages; messages received after persistence_store_grow_end are discarded.