thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

mosquitto stops persisting messages when rust bridge disconnects #3083

Open sagIoTPower opened 2 weeks ago

sagIoTPower commented 2 weeks ago

Hi,

we see that mosquitto stops persisting messages when the tedge_mqtt_bridge connection to the cloud is interrupted (see screenshot tedge_mqtt_bridge_disconnect):

2024-08-23T06:42:30.112741403Z ERROR tedge_mqtt_bridge::health: MQTT bridge failed to connect to cloud broker: Mqtt state: Last pingreq isn't acked

The persistence store continues to grow, as expected, until 6:48:35 AM on 08/23/2024.

See screenshots persistence_store_grow_start_github and persistence_store_grow_end_github

But then it stops growing, which is strange.

We use the Rust bridge and thin-edge 1.2.

This error does not happen when we switch to the mosquitto bridge. Is it a bug, or do we have to configure something?

When the connection is finally restored, the previously persisted messages are transmitted successfully. But these are only the old messages; messages received after persistence_store_grow_end are discarded.

sagIoTPower commented 2 weeks ago

Hi,

we set max_queued_messages and max_queued_bytes to 0 (no limit).

With the Mosquitto Bridge, persistence worked perfectly. After about a quarter of an hour and about 300 KB of persisted data, we stopped the test.


With the Rust Bridge, persistence stopped after approx. 5 minutes and approx. 215 KB of persisted data. At the same time, we received the following output in the MQTT broker log:

Client tedge-mapper-bridge-c8y has exceeded timeout, disconnecting.

The persisted messages are not rolling, i.e. when reconnecting, only the data directly after the disconnect is sent and not the data shortly before the reconnect.

didier-wenzek commented 2 weeks ago

There are two independent issues here:


Using this script to publish easy-to-check data, I checked the following:

jarhodes314 commented 2 weeks ago
* The [second comment](https://github.com/thin-edge/thin-edge.io/issues/3083#issuecomment-2309913209) is about something different.

  * The error [`Client tedge-mapper-bridge-c8y has exceeded timeout, disconnecting.`](https://github.com/eclipse/mosquitto/issues/2124#issuecomment-794452620) is due to the builtin bridge failing to send a keep-alive message in time.
  * => We have to address this. If the builtin bridge fails for some reason to respond on time and is disconnected by mosquitto,
    it should be able to properly reconnect to the local broker _even_ if disconnected at that time from the cloud.

I guess this means we need the bridge to process forwarding messages in a separate task, so we can poll the event loop in the meantime? If I understand correctly, mosquitto will only send up to 100 messages (or however many we configure in practice) before it receives an ack, in which case this should be reasonably possible to implement.
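To make the idea of forwarding in a separate task concrete, here is a minimal sketch. It uses plain `std` threads and channels instead of tokio tasks, and the `Event` enum and `run_event_loop` function are hypothetical names for illustration, not part of the tedge_mqtt_bridge code. The point is only that the loop handing off publishes stays free to answer keep-alive pings while a worker does the slow forwarding.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

/// Hypothetical events seen by the local event loop.
enum Event {
    Publish(u32), // a message that must be forwarded to the cloud
    Ping,         // a keep-alive that must be answered promptly
}

/// Processes `events`, handing publishes to a worker thread.
/// Returns the number of pings answered and the messages forwarded, in order.
fn run_event_loop(events: Vec<Event>) -> (usize, Vec<u32>) {
    let (tx, rx) = channel::<u32>();

    // Worker: forwards messages one by one; the sleep simulates a slow cloud link.
    let worker = thread::spawn(move || {
        let mut forwarded = Vec::new();
        for msg in rx {
            thread::sleep(Duration::from_millis(1));
            forwarded.push(msg);
        }
        forwarded
    });

    // Event loop: never waits on forwarding, so pings are handled immediately.
    let mut pings_answered = 0;
    for event in events {
        match event {
            Event::Publish(id) => tx.send(id).expect("worker is alive"),
            Event::Ping => pings_answered += 1,
        }
    }

    drop(tx); // closing the channel lets the worker finish
    let forwarded = worker.join().expect("worker panicked");
    (pings_answered, forwarded)
}

fn main() {
    let events = vec![Event::Publish(1), Event::Ping, Event::Publish(2), Event::Ping];
    let (pings, forwarded) = run_event_loop(events);
    println!("answered {} pings, forwarded {:?}", pings, forwarded);
}
```

The sketch uses an unbounded channel for brevity; a real bridge would presumably want a bounded one so that memory use stays limited while the cloud is unreachable.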

Alternatively, can we just increase the channel size to be greater than the maximum number of in-flight messages, so we're never blocked publishing an event? That would be a really simple solution if it works, and may mean we can avoid tokio::spawn when calling subscribe too.
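The channel-size argument can be checked with a small experiment. This is a sketch using `std::sync::mpsc::sync_channel` rather than the bridge's actual channel type; `MAX_INFLIGHT` and `accepted_without_blocking` are hypothetical names, and the value 100 is just the figure mentioned above, not a verified broker setting. With no consumer draining the channel, it counts how many messages can be enqueued before a send would block.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Assumed upper bound on unacknowledged messages the broker will send.
const MAX_INFLIGHT: usize = 100;

/// Enqueues up to `count` messages while nothing reads from the channel and
/// returns how many were accepted before the channel was full.
fn accepted_without_blocking(capacity: usize, count: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected but is never drained.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for i in 0..count {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // A blocking `send` would stall here; we stop counting instead.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    let small = accepted_without_blocking(10, MAX_INFLIGHT);
    let large = accepted_without_blocking(MAX_INFLIGHT + 1, MAX_INFLIGHT);
    println!("capacity 10: {} of {} accepted", small, MAX_INFLIGHT);
    println!("capacity {}: {} of {} accepted", MAX_INFLIGHT + 1, large, MAX_INFLIGHT);
}
```

If the broker really never has more than `MAX_INFLIGHT` unacknowledged messages outstanding, a channel with larger capacity accepts all of them without blocking, which is the property the simpler solution relies on.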