lumapu / ahoy

Various tools, examples, and documentation for communicating with Hoymiles microinverters
https://ahoydtu.de

[Bug] wifi connection very unstable since 0.6.9 on esp8266 #901

Closed sstidl closed 8 months ago

sstidl commented 1 year ago

Platform

ESP8266

Assembly

I did the assembly myself

nRF24L01+ Module

No response

Antenna

circuit board

Power Stabilization

Elko (~100uF)

Connection picture

Version

0.6.9

Github Hash

230419_ahoy_0.6.9_15ec6a0_esp8266.bin

Build & Flash Method

AhoyDTU Webinstaller

Setup

Nothing special. No display.

Debug Serial Log output

No response

Error description

I had a stable experience with 0.6.0. After updating to 0.6.9 the WiFi connection is unstable. A ping test shows 5-6 successful pings after a reboot, then 20 lost pings, then a reboot, then some pings again, and so on.

When I touch the antenna or the cover of the esp the ping rate is stable. This way I was able to downgrade to 0.6.0

Very strange behavior.

dortmund50 commented 1 year ago

I have exactly the same problem. The WIFI connection is very bad with the new version, every few seconds there is no connection.

tastendruecker123 commented 1 year ago

Are you using MQTT? Is it more stable with the MQTT host left empty?

nexulm commented 1 year ago

Oops, for me the ESP8266 is dead now. A power-on reset doesn't solve or improve the issue. No ping reply for the last few minutes after upgrading from 0.6.0 to 0.6.9. :-( It seems that I'll downgrade to 0.6.0 via USB connection again and stay on this stable version for the next months. ;-)

tastendruecker123 commented 1 year ago

Are you using MQTT?

nexulm commented 1 year ago

Yes, MQTT is the most important feature for me. ;-)

sstidl commented 1 year ago

I'm using MQTT too.

sstidl commented 1 year ago

MQTT is sending data, but the WebUI is not usable and many pings are lost.

But as I said, the most important question is: why does touching the antenna make a difference?

tastendruecker123 commented 1 year ago

Does the ping situation improve when you set an empty MQTT host? I'm trying to find out whether the problem is related to the MQTT code. I don't know why touching the antenna makes a difference.

nexulm commented 1 year ago

After deleting the MQTT host settings I didn't observe any WiFi/ping issues. The "Zeitüberschreitung" (timeout) entries are the missing replies during the restart (flash process 0.6.0 => 0.6.9). [image]

Strange: after configuring the MQTT part again with the same values, Ahoy is reacting as expected and I can access the UI. :-) Please keep in mind that I didn't have any issue with touching the antenna, as my ESP8266 is in a housing. ;-)

onemorename commented 1 year ago

Same here: the problem disappears when disabling MQTT, and the DTU keeps working fine when MQTT is re-activated afterwards. Still monitoring; I haven't found out 100% what's going on yet. I hope we learn more here soon, as this is a showstopper.

tastendruecker123 commented 1 year ago

At this point I'm not sure why it's behaving that way, but I have seen a few reports linking it to MQTT, which is why I asked about disabling it. Your results seem to confirm that it has something to do with the MQTT code. Glancing over the commits leading up to 0.6.9 I only found something related to an MQTT subscribe action.

If I were to guess, I'd say the MQTT code gets stuck in a loop somehow and eats up most of the CPU time.

onemorename commented 1 year ago

I did some further testing here as well, after already reporting WiFi issues in the #882 thread during development. I have now moved the DTU closer to the HMs (5 m) and changed the sending power from Max to Low. TX retransmits are still very high, about the same as before (3985 retransmits for a TX count of 4514), so this is much worse than with 0.6.0; the same was noticed in #906.

I also moved a WiFi repeater closer to the DTU, so the DTU now has an RSSI of -43. Voilà, the DTU has now been running rock solid on 0.6.9 for 24 hours, sending data via MQTT and handling limit changes via MQTT on the fly.

Based on the comments from @lumapu and @beegee3 in #882, my assumption is that the ESP is busy with retransmits on the NRF and doesn't have much time left to handle WiFi. If WiFi isn't in a 'perfect' state and requires retransmits as well, the ESP runs wild because it can no longer handle the buffered data for MQTT. I am still running MQTT and still running 5 inverters, so the setup hasn't changed on that side.

So the underlying reason for the 'hiccups' could really be something that has changed in the NRF24 area and is clogging everything else. To be confirmed by the experts; I will let you know if anything changes here.
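To put the two counters quoted above in proportion, here is a minimal Python sketch. It assumes the retransmit count and the TX count can be divided directly; the function name `retransmit_ratio` is made up for this illustration and is not part of AhoyDTU.

```python
def retransmit_ratio(retransmits: int, tx_count: int) -> float:
    """Return the retransmit count as a fraction of the TX count."""
    if tx_count <= 0:
        raise ValueError("tx_count must be positive")
    return retransmits / tx_count


# Counter values as reported in the comment above.
ratio = retransmit_ratio(3985, 4514)
print(f"retransmits relative to TX count: {ratio:.1%}")
```

Comparing the same ratio before and after a change (power level, polling interval, distance) gives a rough way to tell whether the radio link actually improved.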

ncqgm commented 1 year ago

Two logs (minicom -D /dev/ttyUSB0) that show the unstable WiFi:

The DTU always sits in the same spot and the RSSI is between -40 and -60. Sometimes the connection holds for hours, sometimes none is established at all.

I also noticed that an AP is usually brought up with the SSID AhoyDTU, but sometimes as ESP_ (the ESP32 is, however, configured to connect to my WLAN).

ahoy.log ahoydtu.log

Maybe this helps somehow.

derTillus commented 1 year ago

I would also like to support you with my observations, because I have the same problem. ESP32, external antenna; the Ahoy-DTU is about 1.5 m away from the Fritzbox and about 4 m from the inverter. Connection drops to MQTT, WebUI and inverter. The behavior occurred sporadically with 0.5.66; with 0.6.12 no consistent communication is possible anymore. My AhoyDTU is plugged into a switchable socket (smart plug). I noticed that the normal consumption of 0.7-0.8 W is very constant once the Ahoy has established the connection. During connection drops the consumption rises to 1.3-1.4 W; I suspect it goes to full load.

The AhoyDTU project is really great, many thanks for the great work. I hope my observations help with troubleshooting. IMG_1104 IMG_5041 IMG_1098

onemorename commented 1 year ago

Try turning the NRF power down to Low and set the inverter polling interval to 5; that solved the problem for me. Ahoy has been up and running for 4 days without dropouts, with MQTT etc.

derTillus commented 1 year ago

Very cool tip, thanks a lot.
That really made a difference right away. MQTT is now connected without interruption.

derTillus commented 1 year ago

It worked stably until this morning. Without any changes, the connection problem is back. In some cases several attempts were needed before the AHOY-DTU started again. Does anyone have an idea what could be causing this?

nexulm commented 1 year ago

Unfortunately, because of the instabilities I had to take the step of downgrading. Since Tuesday I have been happily using 0.6.0 again, which had previously run for more than 4 weeks without dropouts or anomalies.

derTillus commented 1 year ago

@nexulm: does 0.6.0 really run more stably? Do you also have limit control via MQTT active? 0.6.12 has now managed 5 days and now the problem is back. As if a buffer were filling up…

nexulm commented 1 year ago

@derTillus: As I wrote, 0.6.0 ran stably for weeks (>4) for me, without dropouts or intervention, until the update to 0.6.9. That version never ran stably for any longer period (<=2 days) for me, even when following some of the tips from this ticket. Since I now use a WiFi router near the Ahoy-DTU to further improve my WLAN signal quality, I updated to 0.6.12 yesterday. Background: the WiFi router is always switched off overnight, since the Ahoy-DTU neither receives nor sends inverter data then anyway. Yesterday morning, however, the DTU on 0.6.0 did not manage a WLAN reconnect, so a power-on reset had to be performed. With 0.6.12 it worked this morning. Let's see whether in 5 days I fare the same as you did with 0.6.12!?!

AND: No, I have no limit control via MQTT. The PV should deliver everything it can. ;-)

kiu77 commented 1 year ago

My observation: I flashed 0.6.9 onto several "virgin" ESP32 and ESP8266 boards and they worked right away and stably. I updated one ESP32 from 0.6.0 to 0.6.9 and it went completely haywire. Re-flashing via cable did not help either. After erasing it first and resetting to factory defaults, 0.6.9 via cable then worked. This was the board on which I had been playing around with MQTT.

My guess: somewhere during the update a few bytes remain in flash which are then used by the updated firmware, and if these had been written before (with MQTT settings?), the update runs into problems.

400g-Hammer commented 1 year ago

Hello,

first of all, I want to express my honest respect for this work. Please don't take this as a complaint but as a field report or bug report. Hardware: Wemos D1 mini and NRF24L01+ PA, later enhanced with a 1.3" OLED.

I started some weeks ago with 0.5.66 (I suppose it was the first half of April), installed via the Web-Installer. No display back then; it ran rock solid. No connection issues noticed.

A few days back I added the OLED and therefore went to 0.6.9. Here the issues started.

Misc/Trivia

Let me know if you need log files (and please tell me how to access them while the connection is unreliable). Otherwise I will downgrade to 0.6.0 now and observe.

Bobbimus commented 1 year ago

Hi, I think I have the same problem. The DTU runs well during the day, but at some point it stops communicating with the inverter and only sends the status (0), the daily yield, total yield, IP, etc. via MQTT, but no more power data, even though the system is still producing... After a reboot of the DTU everything works again.

HeadCrash66 commented 1 year ago

Same problem: with MQTT enabled there is no ping and no WebUI. This affects v0.6.9 and v.0.7.2. Data is still delivered via MQTT, though; I can see that in openHAB.

400g-Hammer commented 1 year ago

Since my last post (https://github.com/lumapu/ahoy/issues/901#issuecomment-1549262622) I run 0.6.9 without any entries for MQTT. Much more stable, over days no problem. Nevertheless, at least once (indeed after several days of runtime) I had to perform a cold reset.

onemorename commented 1 year ago

Mine runs perfectly now with 0.6.9 incl. MQTT and adjusting the limits on all 5 inverters during the day

image

Bobbimus commented 1 year ago

onemorename wrote: "Mine runs perfectly now with 0.6.9 incl. MQTT and adjusting the limits on all 5 inverters during the day"

How often do you send via MQTT? Did you change any pins? Is the NRF power on low? And lastly, ESP8266 or ESP32? Sorry, many questions...

onemorename commented 1 year ago

MQTT interval: 0 (whenever there is a change in the values)
No pins changed
NRF power is Min
ESP32

Please see my comments above, where I had the same issues as you before. My game changer was to set the polling interval of the inverters to 10 s and the NRF power to Low. Since then I'm fine!

Bobbimus commented 1 year ago

onemorename wrote: "MQTT interval 0 (whenever there is a change in the values); no pins changed; NRF power is Min; ESP32. Please see my comments above, where I had the same issues as you before. My game changer was to set the polling interval of the inverters to 10 s and the NRF power to Low. Since then I'm fine!"

Okay thanks, I am going to try! Currently polling is at 30 s and I am running an ESP8266...

m-kloeckner commented 1 year ago

I had the same issue with 0.6.9. Very unstable on a Wemos D1 Mini esp8266, no pings, when MQTT was enabled. It ran fine when disabling MQTT.

I flashed 0.7.3 this morning and it ran fine the whole day with MQTT enabled. So some change in 0.7.3 seems to fix the issue.

Might be in commit https://github.com/lumapu/ahoy/commit/4e54bcf2994fe3ccfccfd9dab15e935bb7337bdf (fix MqTT publishing only updated values https://github.com/lumapu/ahoy/issues/982).

HeadCrash66 commented 1 year ago

I tried 0.7.6 and the problem is still there. When I disconnect the NRF from my NodeMCU it is reachable via ping/WebUI, but with the NRF connected it gets stuck.

0.6.0 runs fine; the ping time also seems to be lower than with 0.7.6.

ldrolez commented 1 year ago

With my esp8266 + 3 inverters, mqtt now works with 0.7.5

stefanstidlffg commented 1 year ago

Sorry for not writing so long. I tried to go to 0.7.22 via web update on my esp8266 today and lost communications again. It's the same thing as I described in the initial post. I tried to disable mqtt by deleting the IP of the server but it didn't change anything. Tried to downgrade to stable 0.6.9. Now it's dead. Need to flash via serial tomorrow.

Best regards, Stefan

sstidl commented 1 year ago

OK, it looks good now... What I did was:

Using esptool.py (https://github.com/espressif/esptool) I erased the ESP (you have to put it in bootloader mode by holding buttons on the ESP board):
python3 esptool.py erase_flash

Then flash it:
python3 esptool.py write_flash 0x0 230804_ahoy_0.7.23_3a944d1_esp8266.bin

Watch what it does with:
screen /dev/ttyUSB0 115200
(exit screen with Ctrl-a k)

Connect to the access point AHOY_DTU, password esp_8266.

reconfigure it

now let's see if it's running stable
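For anyone who wants to repeat the erase-then-flash steps above from a script, here is a small Python sketch that only builds the two esptool command lines. It assumes esptool.py sits in the current directory, as in the commands above; the helper name `flash_commands` and the optional `port` argument are additions for this illustration.

```python
import shlex
from typing import List, Optional


def flash_commands(firmware: str, port: Optional[str] = None) -> List[List[str]]:
    """Build the erase and write commands in the order they must run."""
    base = ["python3", "esptool.py"]
    if port:
        # --port is optional; without it esptool tries to find the port itself.
        base += ["--port", port]
    return [
        base + ["erase_flash"],                    # wipe the whole flash first
        base + ["write_flash", "0x0", firmware],   # then write the image at offset 0x0
    ]


for cmd in flash_commands("230804_ahoy_0.7.23_3a944d1_esp8266.bin"):
    print(shlex.join(cmd))
```

The sketch prints the commands rather than running them; passing each list to `subprocess.run(cmd, check=True)` would execute them, with the board in bootloader mode as described above.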

sstidl commented 1 year ago

I can give a short heads-up: watchdog and exception reboots. I had a serial logger attached, but the logs didn't show any useful information.

I upgraded to 0.7.26 in the evening (no sun). This morning the device didn't respond to pings anymore and I had to disconnect power.

Reboot because of the hardware watchdog at about 15:30.

So no, it's not stable on my ESP8266.

@lumapu can I get you any info? I have logged heap fragmentation and everything else that MQTT delivers in InfluxDB.

lumapu commented 1 year ago

To dig deeper into the problem please answer the following questions:

sstidl commented 1 year ago

To dig deeper into the problem please answer the following questions:

  • do you have a capacitor right next to the NRF module? +Yes. As close as possible

  • did you solder or pin the connections? +Soldered

  • what is your power-level setting? +Min

  • which interval did you set? +30s

  • do you use power-limit control? +No

  • have you connected and configured a display? +No

ldrolez commented 1 year ago

sstidl wrote: "Upgraded to 0.7.26 in the evening (no sun) Today in the morning device didn't ping anymore, had to disconnect power"

Could you check 0.7.5? It's the most stable version for me on the ESP8266.

sstidl commented 1 year ago

Now with 0.7.5: flashed at 23:50, worked until 6:30. Now serial says:

I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: (#0) enqueued cmd failed/timeout
I: (#0) resetPayload
I: (#0) Requesting Inv SN xxxxxxxxxx
I: (#0) enqueCommand: 0x0B
I: (#0) prepareDevInformCmd 0x0B
15 pid: 80
I: TX 27B Ch3 | xx xx xx xx xx xx 14 68 33 80 0B 00 64 E3 0B BF 00 00 00 01 00 00 00 00 10 B0 C3
I: (#0) nothing received
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: (#0) enqueued cmd failed/timeout
I: (#0) resetPayload
I: (#0) Requesting Inv SN 114181805153
I: (#0) enqueCommand: 0x0B
I: (#0) prepareDevInformCmd 0x0B
15 pid: 80
I: TX 27B Ch23 | xx xx xx xx xx xx 14 68 33 80 0B 00 64 E3 0B DD 00 00 00 01 00 00 00 00 72 01 72 
I: (#0) nothing received
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect

So maybe it lost WiFi, but it didn't try to reconnect. I moved it near the router without rebooting, but still no reconnect.

The MQTT server is up and WiFi is up; I checked that.
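A log like the one above is easier to compare across firmware versions when the recurring lines are counted. The following Python sketch does that for a few message types. The regular expressions are written only from the lines shown in this comment, so treat them as assumptions about the log format rather than a specification of it; the TX line in the sample is shortened.

```python
import re
from collections import Counter

# Patterns derived from the serial output quoted above (assumed format).
PATTERNS = {
    "mqtt_disconnect": re.compile(r"^I: MQTT disconnected, reason: .+$"),
    "cmd_timeout": re.compile(r"^I: \(#\d+\) enqueued cmd failed/timeout$"),
    "nothing_received": re.compile(r"^I: \(#\d+\) nothing received$"),
    "tx": re.compile(r"^I: TX \d+B Ch\d+ \|"),
}


def summarize(log_text: str) -> Counter:
    """Count how often each known message type occurs in the log text."""
    counts = Counter()
    for raw in log_text.splitlines():
        line = raw.strip()
        for name, pattern in PATTERNS.items():
            if pattern.match(line):
                counts[name] += 1
                break  # each line is counted at most once
    return counts


sample = """\
I: MQTT disconnected, reason: TCP disconnect
I: MQTT disconnected, reason: TCP disconnect
I: (#0) enqueued cmd failed/timeout
I: (#0) resetPayload
I: TX 27B Ch3 | xx xx xx xx xx xx 14 68 33 80 0B
I: (#0) nothing received
"""

for name, count in summarize(sample).items():
    print(f"{name}: {count}")
```

Feeding it a capture from screen or minicom, read from a file, gives a quick count of MQTT disconnects versus unanswered inverter requests per session.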

sstidl commented 1 year ago

Also, it doesn't get a response from the inverter anymore (if I'm reading the log right).

After pressing the reset button, everything works again. The inverter responds immediately.

lumapu commented 1 year ago

You're still on an ESP8266? Have you also tried the most recent development versions? How does the DTU behave if MQTT isn't configured?

sstidl commented 1 year ago

Yes, still on esp8266. I resoldered some connections to the nrf24, so now communication to the inverter is back.

But this is what uptime looks like with version 0.7.5: Screenshot_20230823_074028_Home Assistant

At least it doesn't get stuck like it does with the stable version 0.7.26.

Heap and RSSI: Screenshot_20230823_074705_Home Assistant

sstidl commented 1 year ago

Disabled mqtt, now up with no issues.

Is MQTT improved in newer versions?

sstidl commented 1 year ago

I have now been on 0.7.40 for 5 days. The hardware watchdog reboots it several times during the day. Sometimes the TCP stack / WiFi connection gets stuck and there are no pings anymore. I ticked the "reboot at midnight" option so it comes back at midnight.

Wouldn't it be possible to reboot on lost WLAN or lost MQTT?
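What such a "reboot on lost connection" rule could look like is sketched below in Python, purely to illustrate the logic. This is not AhoyDTU code: the class `ConnectivityWatchdog`, the 300-second timeout and the way connectivity is reported are all hypothetical. On the actual firmware the reboot step would be a restart call of the Arduino core, such as `ESP.restart()`.

```python
from dataclasses import dataclass


@dataclass
class ConnectivityWatchdog:
    """Signal a reboot after WiFi or MQTT has been down for too long."""

    timeout_s: float        # hypothetical threshold, not an AhoyDTU setting
    last_ok_s: float = 0.0  # time of the last check with both links up

    def update(self, now_s: float, wifi_ok: bool, mqtt_ok: bool) -> bool:
        """Record the current state; return True if a reboot is due."""
        if wifi_ok and mqtt_ok:
            self.last_ok_s = now_s
            return False
        return (now_s - self.last_ok_s) >= self.timeout_s


wd = ConnectivityWatchdog(timeout_s=300)
print(wd.update(0, wifi_ok=True, mqtt_ok=True))      # both links up
print(wd.update(120, wifi_ok=False, mqtt_ok=False))  # down, but for less than 300 s
print(wd.update(310, wifi_ok=True, mqtt_ok=False))   # MQTT still down after 310 s
```

A timeout rather than an immediate reboot avoids restart loops while the router or broker itself is briefly unavailable.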

sstidl commented 1 year ago

I am back on version 0.6.0 now, which has been stable for a week.

Is there any chance to get a stable 0.7 version for esp8266?

lumapu commented 1 year ago

my ESP8266 runs stable with each version - currently I'm on 0.7.50. MqTT is enabled and one inverter is registered.

sstidl commented 1 year ago

This is great for you but it isn't helping... My ESP8266 has now been running for a week straight without any issues on version 0.6.0. Something must have changed as of 0.6.9 that makes it unstable. Maybe your WiFi or MQTT connection is more stable than mine, but that shouldn't be a problem for a DTU.

lumapu commented 1 year ago

Can you check, once you have upgraded Ahoy, that the heap fragmentation is low (0-10)? For me, after upgrading via OTA the heap fragmentation is around 25. Then I need to reboot the ESP again using the reboot button in the WebUI. After that I have a heap fragmentation of around 2. I hope that helps more than my last answer. It is also very worthwhile to have more stable hardware as a base; check #1083 for that. My ESP8266 is powered by an external DC-DC (5 V to 3.3 V) power supply and not by the on-board regulator.
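The rule of thumb above (fragmentation of 0-10 is fine, around 25 after an OTA update calls for another reboot) can be written down as a tiny check. This is a Python sketch; the function name `needs_reboot` and the way the value is obtained are assumptions, and only the numbers come from the comment.

```python
def needs_reboot(heap_fragmentation: int, threshold: int = 10) -> bool:
    """Return True if fragmentation is above the 0-10 range named above."""
    return heap_fragmentation > threshold


# Values from the comment: about 25 right after OTA, about 2 after a reboot.
for label, frag in [("after OTA update", 25), ("after WebUI reboot", 2)]:
    verdict = "reboot recommended" if needs_reboot(frag) else "ok"
    print(f"{label}: fragmentation {frag} -> {verdict}")
```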

sstidl commented 11 months ago

I am using the current stable now: GIT SHA: ba218ed :: 0.7.36

I checked free heap and heap fragmentation. Fragmentation is as low as 2 most of the time; free heap is 17,000.

It resets 2-3 times a day. This wouldn't be a problem, as it comes back up most of the time. But sometimes it stops working without a reset. Then it no longer responds to pings or sends MQTT messages.

Now I have tried to improve the WLAN signal. It was -73 dBm before. Now it's around -60 dBm.

Maybe that gives us a hint.

So long S

sstidl commented 11 months ago

Okay, the DTU has been offline since 16:15. No ping. Does it have to do with sunset? Tomorrow I'll try to disable night mode at sunset.