lumapu / ahoy

Various tools, examples, and documentation for communicating with Hoymiles microinverters
https://ahoydtu.de
Other

Memory leak - Out of Memory Exception #644

Closed Argafal closed 1 year ago

Argafal commented 1 year ago

Exception recorded in 0.5.78:

Unhandled C++ exception: OOM

last failed alloc caller: 0x40223f2e

Decoding stack results
0x40242de8: tcp_input at core/tcp_in.c line 943
0x40248091: ip4_input at core/ipv4/ip4.c line 1467
0x4023fb9d: mem_malloc at core/mem.c line 210
0x4023f261: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x4023f070: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x40267bb9: ethernet_input at glue-esp/lwip-esp.c line 365
0x40267bcb: ethernet_input at glue-esp/lwip-esp.c line 373
0x402392be: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 720
0x40239282: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 708
0x4023aa76: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x40239569: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023ad4c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x40235861: __cvt at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 102
0x4023ad4c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023ac88: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x402363a1: _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 194
0x40235da5: _printf_float at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 330
0x40236400: _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 209
0x4023ac88: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x4023b188: _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 658
0x4023ec5d: glue2esp_linkoutput at glue-esp/lwip-esp.c line 301
0x4023ee8b: new_linkoutput at glue-lwip/lwip-git.c line 272
0x4023f2ee: ethernet_output at netif/ethernet.c line 312
0x4024686c: etharp_output_to_arp_index at core/ipv4/etharp.c line 769
0x40246940: etharp_output_LWIP2 at core/ipv4/etharp.c line 885
0x402482b0: ip4_output_if_opt_src at core/ipv4/ip4.c line 1764
0x4023fb9d: mem_malloc at core/mem.c line 210
0x4024025e: pbuf_alloc_LWIP2 at core/pbuf.c line 284
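The maintainer's reply further down says that these frames all belong to libraries outside Ahoy. One quick way to check that for any decoded trace is to group the frames by source path. The Python sketch below does this. Its path-prefix-to-component table is an assumption built only from the paths in this issue: lwIP sources under `core/` and `netif/`, the ESP8266 lwIP glue under `glue-esp/` and `glue-lwip/`, and newlib under `/workdir/repo/newlib/`. Note that an ESP8266 decoded stack dump can include stale return addresses, so the list is not necessarily a clean call chain. `classify` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# One decoded frame per line, e.g.
# "0x40242de8: tcp_input at core/tcp_in.c line 943"
FRAME_RE = re.compile(r"^(0x[0-9a-fA-F]+): (\S+) at (\S+) line (\d+)$")

# Assumed prefix -> component mapping, taken only from the paths that
# appear in the traces in this issue (not an exhaustive list).
COMPONENTS = [
    ("/workdir/repo/newlib/", "newlib (C library)"),
    ("core/", "lwIP"),
    ("netif/", "lwIP"),
    ("glue-lwip/", "lwIP glue"),
    ("glue-esp/", "lwIP glue"),
]


def classify(path: str) -> str:
    """Map a source path from a decoded frame to a component name."""
    for prefix, name in COMPONENTS:
        if path.startswith(prefix):
            return name
    return "other (possibly application code)"


def summarize(trace: str) -> Counter:
    """Count decoded frames per component; non-frame lines are ignored."""
    counts = Counter()
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            counts[classify(m.group(3))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
0x40242de8: tcp_input at core/tcp_in.c line 943
0x4023fb9d: mem_malloc at core/mem.c line 210
0x4023f070: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x402392be: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 720
"""
    for component, n in summarize(sample).most_common():
        print(f"{n:3d}  {component}")
```

Pasting a whole "Decoding stack results" block into `summarize` gives a per-component count; any frame landing in the "other" bucket would be the place to look for application code.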
Argafal commented 1 year ago

OOM within two minutes of booting. Running 0.5.88 on ESP8266.

14:54:14.757 > I: (#0) Requesting Inv SN 1141XXX9
14:54:14.763 > I: (#0) prepareDevInformCmd
14:54:14.763 > I: TX 27B Ch3 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 EB 92 86 00 00 00 00 00 00 00 00 78 BA C5
14:54:14.888 > I: RX 27B Ch61 | 95 72 22 17 79 72 22 17 79 01 00 01 01 65 00 27 00 8B 01 5E 00 26 00 84 00 04 A4
14:54:14.896 > I: RX 27B Ch23 | 95 72 22 17 79 72 22 17 79 02 B2 B3 00 04 93 E1 02 07 01 63 09 43 13 85 01 03 59
14:54:14.905 > I: RX 23B Ch23 | 95 72 22 17 79 72 22 17 79 83 00 00 00 0B 03 E8 00 C6 00 34 10 E2 F6
14:54:14.913 > I: procPyld: cmd:  0xb
14:54:14.913 > I: procPyld: txid: 0x95
14:54:14.916 > I: Payload (42): 00 01 01 65 00 27 00 8B 01 5E 00 26 00 84 00 04 B2 B3 00 04 93 E1 02 07 01 63 09 43 13 85 01 03 00 00 00 0B 03 E8 00 C6 00 34
14:54:14.930 > I: alarm ID incremented to 52
14:54:14.932 > I: (#0) enqueuedCmd: 0x11
14:54:20.180 >
14:54:20.181 > User exception (panic/abort/assert)
14:54:20.183 > --------------- CUT HERE FOR EXCEPTION DECODER ---------------
14:54:20.189 >
14:54:20.189 > Unhandled C++ exception: OOM
14:54:20.193 >
14:54:20.193 > >>>stack>>>
14:54:20.193 >

Decoding stack results
0x40244395: tcp_input at core/tcp_in.c line 501
0x402492f9: ip4_input at core/ipv4/ip4.c line 1467
0x40240e05: mem_malloc at core/mem.c line 210
0x402404c9: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x402402d8: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x40268e19: ethernet_input at glue-esp/lwip-esp.c line 365
0x40268e2b: ethernet_input at glue-esp/lwip-esp.c line 373
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023bcde: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023a7d1: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023bcde: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023bfb4: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023a7d1: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023bfb4: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
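The RX lines in the 0.5.88 log above show how the 42-byte payload is assembled from three radio frames. The Python sketch below reproduces that reassembly. The frame layout in it is inferred from these log lines only, not taken from the Ahoy source: byte 9 is the frame number, with bit 0x80 marking the last frame; the bytes between the 10-byte header and the final byte are payload data; the final byte of each frame equals the XOR of all preceding bytes; and the last two bytes of the concatenated data are not part of the logged payload (presumably a CRC-16, which this sketch does not verify). `xor8` and `reassemble` are hypothetical names.

```python
# Sketch only: the layout below is inferred from the RX lines in this issue.
HEADER_LEN = 10          # 0x95, 4 ID bytes, the same 4 ID bytes, frame number
LAST_FRAME_FLAG = 0x80   # set on the frame number of the final frame


def xor8(data: bytes) -> int:
    """XOR of all bytes; equals the last byte of every RX frame logged here."""
    acc = 0
    for b in data:
        acc ^= b
    return acc


def reassemble(frames) -> bytes:
    """Join frame data in frame-number order and drop the 2 trailing bytes."""
    chunks = {}
    last = None
    for raw in frames:
        if xor8(raw[:-1]) != raw[-1]:
            raise ValueError(f"check byte mismatch in frame {raw[9]:#04x}")
        number = raw[9] & 0x7F
        if raw[9] & LAST_FRAME_FLAG:
            last = number
        chunks[number] = raw[HEADER_LEN:-1]
    if last is None:
        raise ValueError("last frame not received")
    missing = [n for n in range(1, last + 1) if n not in chunks]
    if missing:
        raise ValueError(f"missing frames: {missing}")
    data = b"".join(chunks[n] for n in range(1, last + 1))
    return data[:-2]  # presumably a CRC-16 over the payload; not verified here


if __name__ == "__main__":
    rx = [bytes.fromhex(s) for s in (
        "95 72 22 17 79 72 22 17 79 01 00 01 01 65 00 27 00 8B 01 5E 00 26 00 84 00 04 A4",
        "95 72 22 17 79 72 22 17 79 02 B2 B3 00 04 93 E1 02 07 01 63 09 43 13 85 01 03 59",
        "95 72 22 17 79 72 22 17 79 83 00 00 00 0B 03 E8 00 C6 00 34 10 E2 F6",
    )]
    payload = reassemble(rx)
    print(f"Payload ({len(payload)}): {payload.hex(' ').upper()}")
```

Run as a script, it prints the same `Payload (42): ...` bytes as the log. Keying chunks by frame number rather than arrival order means frames that arrive out of order are still joined correctly.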
Argafal commented 1 year ago

With 0.5.89, this happened at the exact moment I pressed refresh on the website.

16:29:41.183 > I: (#1) Requesting Inv SN 1161XXX4
16:29:41.189 > I: (#1) enqueuedCmd: 0xb
16:29:41.189 > I: (#1) prepareDevInformCmdI: TX 27B Ch40 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 EE 4B E6 00 00 00 01 00 00 00 00 F0 EA BC 
16:29:41.370 > I: RX 27B Ch75 | 95 74 40 42 54 74 40 42 54 02 00 04 6A 04 01 8C 01 8E 01 38 00 0E 00 0E 00 2B ED 
16:29:41.378 > I: RX 27B Ch75 | 95 74 40 42 54 74 40 42 54 03 00 2B 00 04 5C AE 00 04 58 49 01 8C 01 8C 09 49 1E 
16:29:41.386 > I: RX 27B Ch61 | 95 74 40 42 54 74 40 42 54 84 13 86 00 A4 00 DC 00 07 02 56 00 8A 00 01 FA 97 49 
16:29:41.395 > W: Frame 1 missing: Request Retransmit
16:29:41.397 > I: TX 11B Ch61 | 15 74 40 42 54 86 99 51 75 81 8D 
16:29:42.639 > I: RX 27B Ch3 | 95 74 40 42 54 74 40 42 54 01 00 01 01 39 00 0E 00 0E 00 2B 00 2B 00 04 68 8B 4A 
16:29:42.647 > I: procPyld: cmd:  0xb
16:29:42.647 > I: procPyld: txid: 0x95
16:29:42.650 > I: Payload (62): 00 01 01 39 00 0E 00 0E 00 2B 00 2B 00 04 68 8B 00 04 6A 04 01 8C 01 8E 01 38 00 0E 00 0E 00 2B 00 2B 00 04 5C AE 00 04 58 49 01 8C 01 8C 09 49 13 86 00 A4 00 DC 00 07 02 56 00 8A 00 01 
16:29:42.711 > 
16:29:42.711 > User exception (panic/abort/assert)
16:29:42.714 > --------------- CUT HERE FOR EXCEPTION DECODER ---------------
16:29:42.719 > 
16:29:42.719 > Unhandled C++ exception: OOM
16:29:42.722 > 
16:29:42.722 > >>>stack>>>

Decoding stack results
0x402486d3: ip4_input at core/ipv4/ip4.c line 1290
0x40248363: ip4_input at core/ipv4/ip4.c line 1240
0x40243d61: tcp_input at core/tcp_in.c line 501
0x402407d1: mem_malloc at core/mem.c line 210
0x40240830: do_memp_malloc_pool at core/memp.c line 255
0x40248cc5: ip4_input at core/ipv4/ip4.c line 1467
0x402407d1: mem_malloc at core/mem.c line 210
0x4023fe95: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x4023fca4: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x402687e1: ethernet_input at glue-esp/lwip-esp.c line 365
0x402687f3: ethernet_input at glue-esp/lwip-esp.c line 373
0x40239718: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 352
0x4023b71a: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023a20d: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023b71a: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023a20d: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x40236581: __cvt at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 102
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023b92c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x4023be2c: _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 658
0x40236ac5: _printf_float at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 330
0x4023f891: glue2esp_linkoutput at glue-esp/lwip-esp.c line 301
0x4023fada: new_linkoutput at glue-lwip/lwip-git.c line 277
0x4023ff22: ethernet_output at netif/ethernet.c line 312
0x402474a0: etharp_output_to_arp_index at core/ipv4/etharp.c line 769
0x40247574: etharp_output_LWIP2 at core/ipv4/etharp.c line 885
0x40248ee4: ip4_output_if_opt_src at core/ipv4/ip4.c line 1764
0x40240830: do_memp_malloc_pool at core/memp.c line 255
0x40248f4c: ip4_output_if_opt at core/ipv4/ip4.c line 1572
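The 0.5.89 log shows the retransmit path: frames 2, 3 and 4 arrive, frame 1 is reported missing, an 11-byte request is sent, and once frame 1 arrives the 62-byte payload is assembled. A minimal Python sketch of the missing-frame step follows. It assumes the request layout visible in the `TX 11B` line, inferred from the log and not from the Ahoy source: the byte 0x15, the four inverter-ID bytes, the four DTU-ID bytes `86 99 51 75` that appear in every TX line in this issue, 0x80 OR'ed with the missing frame number, and a final XOR check byte. `missing_frames` and `retransmit_request` are hypothetical helpers, not Ahoy's API.

```python
# Sketch only: the frame/request layout is inferred from the log above.
REQUEST_BYTE = 0x15      # first byte of every TX line in this issue
LAST_FRAME_FLAG = 0x80   # set on the frame number of the final frame


def xor8(data: bytes) -> int:
    """XOR of all bytes; matches the trailing check byte in these logs."""
    acc = 0
    for b in data:
        acc ^= b
    return acc


def missing_frames(frames):
    """Frame numbers still missing, or None while the last frame is unknown."""
    seen = set()
    last = None
    for raw in frames:
        number = raw[9] & 0x7F
        seen.add(number)
        if raw[9] & LAST_FRAME_FLAG:
            last = number
    if last is None:
        return None
    return [n for n in range(1, last + 1) if n not in seen]


def retransmit_request(inverter_id: bytes, dtu_id: bytes, frame_number: int) -> bytes:
    """Build the 11-byte request: 0x15, inverter ID, DTU ID, 0x80|n, XOR check."""
    body = bytes([REQUEST_BYTE]) + inverter_id + dtu_id + bytes([LAST_FRAME_FLAG | frame_number])
    return body + bytes([xor8(body)])


if __name__ == "__main__":
    received = [bytes.fromhex(s) for s in (
        "95 74 40 42 54 74 40 42 54 02 00 04 6A 04 01 8C 01 8E 01 38 00 0E 00 0E 00 2B ED",
        "95 74 40 42 54 74 40 42 54 03 00 2B 00 04 5C AE 00 04 58 49 01 8C 01 8C 09 49 1E",
        "95 74 40 42 54 74 40 42 54 84 13 86 00 A4 00 DC 00 07 02 56 00 8A 00 01 FA 97 49",
    )]
    for n in missing_frames(received):
        # Inverter ID taken from bytes 1..4 of a received frame.
        req = retransmit_request(received[0][1:5], bytes.fromhex("86995175"), n)
        print(f"frame {n} missing -> TX {req.hex(' ').upper()}")
```

With the three frames logged before the retransmit, the sketch prints `frame 1 missing -> TX 15 74 40 42 54 86 99 51 75 81 8D`, which is byte for byte the `TX 11B` line in the log.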
lumapu commented 1 year ago

These are all libraries outside of Ahoy; I'm not familiar with this. Since you seem to be the only one with this error, I could imagine a hardware defect. Is there some kind of self-check, or could you flash something else onto it?

humus2002 commented 1 year ago

I completely reset a WEMOS D1 mini (ESP8266) by installing a blank.bin and then updated to 0.5.89. Although access via 192.168.4.1 is possible, I could not successfully save the configuration data (especially the WiFi settings). The WiFi scan was not stable even though the access point is close, and some elements on the configuration pages were missing (such as the additional DNS settings and the pinout settings). After multiple attempts to save the WiFi settings, the AhoyDTU still does not connect to my WiFi (a standard Fritz 7590).

I have now downgraded to 0.5.17 and everything works fine again.

So I am convinced that there is still some kind of memory leak or similar bug in version 0.5.89.

Argafal commented 1 year ago

I think this description deserves its own issue. It might be a separate problem, or even several problems, since many things are mentioned at once. However, I don't see a connection to the OOM I documented with a stack trace above.

@humus2002 Would you please make this a separate new issue? Could you also provide details in that new issue of what you flashed and how you flashed it, and what the exact symptoms of "could not save successfully" were? Let's continue here with the OOM stack trace documented above, okay? Thanks :)

@lumapu Fair enough. Let me dig a little bit more into it and maybe swap out the ESP8266. I just thought it was worth documenting it nevertheless, in case someone else sees the same stack trace on their end.

lumapu commented 1 year ago

@Argafal do you see these OOM exceptions with the latest dev versions, starting from 0.5.93?

Argafal commented 1 year ago

When I opened this issue, the OOMs were frequent and came soon after boot-up, i.e., they made AHOY hard to use productively.

lumapu commented 1 year ago

I think it works fine now.