thorrak / tiltbridge

Tilt Hydrometer to WiFi Bridge
http://www.tiltbridge.com/

MQTT client does not always start sending properly #256

Closed fkroepfl closed 3 months ago

fkroepfl commented 4 months ago

It can take several power cycles before an ESP32 board running the TiltBridge firmware with MQTT enabled starts transmitting completely. Current status after 3 reboots and a wait of approx. 30 minutes after the last reboot:

image

as mentioned in https://github.com/thorrak/tiltbridge/issues/247#issuecomment-1952533150

thorrak commented 4 months ago

How frequently do you currently have it set to send to MQTT? When you say it isn’t sending, is it not sending at all or is it that only some of the announcements get sent?

fkroepfl commented 4 months ago

Not sending at all ... 🤷‍♂️

thorrak commented 4 months ago

Damn. Alright. Looks like I have a project for this evening then.

fkroepfl commented 4 months ago

In the meantime, I have been able to determine 3 possible states after booting:

a) Tiltbridge is not visible in the network, i.e. cannot be detected with LanScan or Fing => reboot required

b) if visible in the network, the MQTT-Config messages are usually sent, but no payload messages => reboot required

c) visible in the network, MQTT-Config messages are sent and payload messages are then sent after a short delay

Once c) is reached, transmission continues at the set frequency until it is interrupted. After an interruption, either transmission resumes after a pause of unpredictable length, or the stalled state persists until the next restart. This is related to #247.
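
For anyone logging repeated boots, the three states a) to c) described above could be recorded with something like the following minimal sketch. The names `BootState`, `classify_boot` and `needs_reboot` are made up for illustration and are not part of TiltBridge; combinations of observations that the description does not cover are simply treated as b).

```python
from enum import Enum


class BootState(Enum):
    NOT_ON_NETWORK = "a"  # device cannot be found by a network scan
    CONFIG_ONLY = "b"     # on the network, but no payload messages arrive
    FULLY_SENDING = "c"   # config messages and payload messages both seen


def classify_boot(on_network: bool, config_seen: bool, payload_seen: bool) -> BootState:
    """Map the observations from one boot to state a), b) or c)."""
    if not on_network:
        return BootState.NOT_ON_NETWORK
    if config_seen and payload_seen:
        return BootState.FULLY_SENDING
    # Assumption: any other combination on a visible device counts as b).
    return BootState.CONFIG_ONLY


def needs_reboot(state: BootState) -> bool:
    """States a) and b) both required a reboot in the observations above."""
    return state is not BootState.FULLY_SENDING
```

Keeping the classification in one place makes it easier to tally how often each state occurs across many power cycles.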

thorrak commented 4 months ago

I took some time this weekend to get an MQTT server set up and dig into the firmware a bit. I was able to track down a handful of bugs that impact MQTT initialization & connection handling. Although I know that the bugs I found were squished, I'm not entirely sure that these are the same as this bug and #247.

I've pushed out v1.2.2-beta5 which contains these fixes. If you have a few minutes to give it a shot I'd appreciate the feedback!

fkroepfl commented 4 months ago

I'm currently on the road, which means I don't have direct access to the hardware. After my return at the end of the week, I will be able to try out the new version.

With the previous version, I got the impression that at least one of the causes probably lies in the connection handling. The behavior does not depend on which ESP board is used: sometimes one board works with relatively few problems, sometimes the other.

Question about this: How is connection handling organized if, for example, there are WiFi dropouts? Does the transmission then start again automatically or does it remain blocked?

The good news I have is that the ESP board with the OLED display seems to have reached a stable state after a few restarts and has now been running for several days with very few dropouts.

The second ESP board hung after a short time, but with different timestamps for red and blue, which points to a problem specific to each Tilt rather than to the board as a whole. The occasional per-color dropouts on the other board suggest the same.

More details when I have access to the devices again 😉

thorrak commented 4 months ago

No worries at all - and no rush! Enjoy the trip!

The changes that I made in the most recent version should mean that every step of the MQTT processing checks if the controller is connected to the MQTT server and attempts to reconnect if it is not. Unlike the handling for other connections, the expectation is that the MQTT connection is persistent with reconnect attempts made in the main loop if a loss of connection is detected.

That said, there is no specific handling for a WiFi outage in that code, and I do not know how the MQTT library reacts if the WiFi connection is disconnected when a reconnect attempt is made, so this sounds like a plausible cause of this issue. Adding connection checking is pretty straightforward - I’ll see if I can get that in before you get home.
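
The idea of checking WiFi before attempting an MQTT reconnect could look roughly like the sketch below. It is a simulation in Python, not the TiltBridge firmware: `FakeWifi`, `FakeMqttClient`, `mqtt_step` and the topic name are all hypothetical stand-ins, and the real MQTT library's behavior on a reconnect without WiFi is still unknown, as noted above.

```python
class FakeWifi:
    """Stand-in for the WiFi stack; it only tracks a connected flag."""

    def __init__(self, connected: bool = True):
        self.connected = connected


class FakeMqttClient:
    """Stand-in for an MQTT client (hypothetical interface, not a real library)."""

    def __init__(self):
        self.connected = False
        self.connect_attempts = 0
        self.published = []

    def connect(self) -> bool:
        self.connect_attempts += 1
        self.connected = True
        return True

    def publish(self, topic: str, payload: str) -> None:
        if not self.connected:
            raise RuntimeError("publish called while disconnected")
        self.published.append((topic, payload))


def mqtt_step(wifi, client, topic: str, payload: str) -> str:
    """One main-loop pass: check WiFi, then the MQTT connection, then publish."""
    if not wifi.connected:
        # WiFi is down: treat MQTT as lost and skip the reconnect attempt.
        client.connected = False
        return "wifi_down"
    if not client.connected and not client.connect():
        return "mqtt_down"
    client.publish(topic, payload)
    return "sent"


wifi = FakeWifi(connected=False)
client = FakeMqttClient()
first = mqtt_step(wifi, client, "tiltbridge/example", "{}")   # no reconnect attempted
wifi.connected = True
second = mqtt_step(wifi, client, "tiltbridge/example", "{}")  # reconnects, then sends
```

The point of the ordering is that a reconnect is never attempted while WiFi is down, so the MQTT library is never asked to do something whose outcome is undefined.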

fkroepfl commented 3 months ago

About TiltBridge: v1.2.2 [no_std_strings] (918a64a)

I now have the previously mentioned version running on the ESP32 board without OLED display.

Coincidence or not, this time fewer resets were needed before the data stream started. That applies not only to the MQTT messages but also to the display of the BLUE Tilt, which did not appear immediately.

For comparison, an iPhone running the Tilt app as a receiver, placed right next to the ESP32, recognized both Tilts immediately.

The board has been running error-free for 2 hours since the last reset, which is already longer than some attempts with earlier versions.

image

image

image

fkroepfl commented 3 months ago

@thorrak

Intermediate status at Uptime=91459 (see graphic)

I think you are on the right track. 🤗

image

thorrak commented 3 months ago

Glad to hear it! I'm going to leave this open for another day or so, but will close it when I merge #246 if you don't run into any issues.

Thanks for your help debugging all this -- I'm glad we've made progress!

fkroepfl commented 3 months ago

@thorrak

Although the data stream now looks clean once it starts, I still have the problem that sending does not start reliably.

I have now built a test setup with an automatic power cycle that is triggered every 5 minutes. (I hope 5 minutes is long enough for all processes to run as planned.)

What is strange:

Any ideas?

image image image
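
A test series like the 5-minute power cycle above could be evaluated with a small script along these lines. This is only a sketch under assumptions: the timestamps are invented seconds since the start of the test, and `cycles_with_payload` and `success_rate` are hypothetical helper names.

```python
def cycles_with_payload(message_times, cycle_seconds=300, n_cycles=None):
    """Return one boolean per power cycle: True if any payload message arrived in it."""
    if n_cycles is None:
        n_cycles = (int(max(message_times)) // cycle_seconds + 1) if message_times else 0
    seen = [False] * n_cycles
    for t in message_times:
        idx = int(t) // cycle_seconds
        if 0 <= idx < n_cycles:
            seen[idx] = True
    return seen


def success_rate(seen):
    """Fraction of power cycles in which at least one payload message arrived."""
    return sum(seen) / len(seen) if seen else 0.0


# Invented example: payload arrival times (seconds) over five 5-minute cycles.
times = [40, 45, 650, 700, 1210]
seen = cycles_with_payload(times, cycle_seconds=300, n_cycles=5)
rate = success_rate(seen)
```

Comparing this rate between firmware versions or power supplies gives a number to discuss instead of an impression.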

thorrak commented 3 months ago

Are you running the latest beta (202037d) or the one before it (918a64a)? The changes in 202037d should have helped with this significantly.

fkroepfl commented 3 months ago

About TiltBridge: v1.2.2 [no_std_strings] (202037d)

I am using the latest version with the same set of cables. The only difference is whether my MacBook sits between the power supply and the USB adapter. With the MacBook in between, it works most of the time; without it, it never really works. That lets us rule out cable defects or an insufficient cable cross-section as the cause of, for example, a short-term power shortage.

In various forums I have found references to the "Brownout Detector" in connection with ESP32 boards that fail to boot. Several posts mention that the power demand is quite high when WiFi is activated, and some people therefore disable the "Brownout Detector" during this boot phase.

Unfortunately, this is beyond my knowledge and judgment 😉

fkroepfl commented 3 months ago

There seems to be evidence that confirms the influence of the power supply unit:

As a consequence, and in line with the recommendation, I am now using a Raspberry Pi power supply. Yes, it's better, but not quite good yet 😉

What is nevertheless noticeable is that even with this setup, it can take up to three power cycles before data from both Tilts is transmitted. Until then, only data from one Tilt is transmitted and, as I understand it, that cannot be explained by a power supply problem.

There may be different causes at play here, right?

Please see the screenshots below.

[Screenshots: 2024-03-03 at 11:47:31, 11:47:49 and 11:48:02]

thorrak commented 3 months ago

That's interesting, and unfortunately it's not something I anticipate being able to resolve with software. That said, the one thing I can try is to add a delay between initializing the LCD and WiFi to hopefully give any capacitors on board time to recharge.

I've added this delay to v1.2.2-beta7 which is now available on BrewFlasher. Let me know if that helps at all and I'll get it merged in.

fkroepfl commented 3 months ago

It feels like it has become less reliable. It took me eight attempts on the computer's USB port before I could configure the TiltBridge, and until now the computer's USB port had caused the fewest problems, needing only the occasional reboot.

But let's wait and see what the test series with the 5-minute cycles shows; I have now started it again.

thorrak commented 3 months ago

Unfortunately, from research on my side, it seems like the only real solutions are to either improve the cable quality/decrease the length, improve the power supply quality, or swap out the ESP32 module for one with better power management.

With that in mind, I think that makes the beta6 release the one that will become the next full release of TiltBridge. I'll get that promoted shortly.

fkroepfl commented 3 months ago

> With that in mind, I think that makes the beta6 release the one that will become the next full release of TiltBridge. I'll get that promoted shortly.

Yes, this version is actually more reliable in my current environment than the last version. 🤗