warmcat / lws-esp32-factory

Libwebsockets ESP32 Factory Application
Other
80 stars 24 forks source link

Illegal back-to-back write error #26

Closed FredrikFornstad closed 6 years ago

FredrikFornstad commented 6 years ago

I am not sure what happened here, but I was using the test-demo with one Win10/Edge-browser connected with a websocket open to do the "shared drawing". Only one computer connected. I did a "stress test" = intesive drawing with the mouse, and got this error after about 20 seconds of drawing:

?[35;1m[2018/02/02 17:44:37:0035] NOTICE: heap :165664 (-1472) ?[31;1m[2018/02/02 17:44:38:3798] ERR: ?[31;1m[2018/02/02 17:44:38:3810] ERR: 0000: 81 1A 45 23 64 20 23 30 30 30 30 66 66 20 33 33 ..E#d #0000ff 33 ?[31;1m[2018/02/02 17:44:38:3838] ERR: 0010: 388 20 39 32 20 33 34 33 20 38 38 3B 8 92 343 88; ?[31;1m[2018/02/02 17:44:38:3939] ERR: ?[31;1m[2018/02/02 17:44:38:3979] ERR: ** 0x3ffd5bc4: vh: station, prot: lws-meta, Illegal back-to-back write of 28 detec ted... ?[35;1m[20018/02/02 17:44:39:0251] NOTICE: heap :186156 (+20492)

I have noticed that the esp32 runs pretty hot and that the watchdog triggers (complaining that main on CPU 0 have not reset the watchdog in time) frequently when a browser connects, so the CPU is probably fairly busy...

lws-team commented 6 years ago

The test demo is a bit of a worst-case for ESP32, with two windows open it can want up to 8 simultaneous TLS connections open.

mbedtls wants 32KBytes buffer space per connection, there simply isn't enough memory in the chip to do it. So until now it uses a trick multiplexing ws protocol to bundle three of the connections in one.

Over Christmas I completed patches on mbedtls that change its buffer allocation to be done dynamically. This reduces the memory cost of a TLS connection down to a few KB. It doesn't change the cost in time of opening each tls tunnel, which is about 1s each.

I have changed both -factory and -test-server-demos to use this patched mbedtls, and removed lws_meta in the demo case so it makes individual connections, and you can see for yourself the effect on free memory.

As a side effect, you shouldn't see this problem any more, please give it a go. Notice again you will need to bring in the new mbedtls submodule with git submodule update --init --recursive or so.

FredrikFornstad commented 6 years ago

Tested now with the new version of lws (Feb 19th) including submodule update.

I had one Edge-browser connected for maybe 5 minutes with the counter counting, then changed to the mirror-demo and did some drawing. After a short while I connected an Android phone (Chrome browser), after a short while I changed to mirror-demo and did some more drawing from the Edge browser and just watching on the Android. Then on the Android I changed to Close testing and pressed "Open" . That resulted in a back-to-back write error, see below:

?[35;1m[20018/02/19 17:26:44:0268] NOTICE: heap :125764 (-4) ?[35;1m[2018/02/19 17:26:45:0367] NOTICE: heap :127324 (+1560) ?[35;1m[2018/02/19 17:26:46:0055] NOTICE: heap :125768 (-1556) ?[35;1m[2018/02/19 17:26:47:01171] NOTICE: heap :127324 (+1556) ?[35;1m[2018/02/19 17:26:49:0167] NNOTICE: heap :125768 (-1556) ?[35;1m[2018/02/19 17:26:50:0367] NOTICE: heap :127324 (+1556) ?[35;1m[2018/02/19 17:26:51:3292] NNOTICE: _realloc: size 552: new server wsi (free heap 124680) ?[35;1m[2018/02/19 17:26:51:3331] NOTICE: ssl_pm_handshake ?[35;1m[2018/02/19 17:26:51:3338] NOTICE: mbedtls_handshake: ssl ret -0 state 1 ?[35;1m[2018/02/19 17:26:51:3440] NOTICE: mbedtls_handshake: ssl ret -0 state 2 ?[35;1m[2018/02/19 17:26:51:3531] NOTICE: mbedtls_handshake: ssl ret -0 state 33 ?[35;1m[2018/02/19 17:26:51:3613] NOTICE: mbedtls_handshake: ssl ret -0 state 4 ?[35;1m[2018/02/19 117:26:52:4273] NOTICE: mbedtls_handshake: ssl ret -0 state 5 ?[35;1m[2018/02/19 17:26:52:4293] NOTICE: mbedtls_handshake: ssl ret -0 state 6 ?[35;1m[2018/02/119 17:26:52:4361] NOTICE: mbedtls_handshake: ssl ret -0 state 7 ?[35;1m[2018/02/19 17:26:52:4401] NOTICE: mbedtls_handshake: ssl ret -0 state 8 ?[35;1m[2018/02/19 17:26:52:4480] NOTICE: mbedtls_handshake: ssl ret -69000 state 8 ?[35;1m[2018/02/19 17:26:52:4553] NOTICE: _realloc: size 996: ah struct (free heap 109524) ?[35;1m[2018/02/19 17:26:52:4634] NOTICE: _realloc: size 900: ah data (free heap 108620) ?[35;1m[2018/02/19 17:26:52:4721] NOTICE: heap :108616 (-18708) ?[35;1m[2018/02/19 17:26:52:5345] NOTICE: ssl_pm_handsshake ?[35;1m[2018/02/19 17:26:52:95561] NOTICE: mbedtls_handshake: ssl ret -0 state 9 ?[35;1m[2018/02/19 17:26:52:9568] NOTICE: mbedtls_handshake: ssl ret -0 state 10 ?[35;1m[2018/02/19 17:26:52:9611] NOTICE: mbedtls_handshake: ssl ret -0 state 11 ?[35;1m[2018/02/19 17:26:52:9704] NOTICE: mbedtls_handdshake: ssl ret -0 state 12 ?[35;1m[2018/02/19 17:26:52:9793] NOTICE: mbedtls_handshake: ssl ret -0 state 13 ?[35;1m[2018/02/19 17:26:52:9884] NOTICE: mbedtls_handshake: ssl ret -0 state 14 ?[35;1m[2018/02/19 17:26:52:9903] NOTICE: mbedtls_handshake: ssl ret -0 state 15 ?[35;1m[2018/02/19 17:26:52:9985] NOTICE: mbedtls_handshake: ssl ret -0 state 16 ?[35;1m[2018/02/19 17:26:53:0058] NOTICE: heap :114708 (+6092) ?[31;1m[2018/02/19 17:26:53:0223] ERR: ?[31;1m[2018/02/19 17:26:53:0232] ERR: 0000: 81 04 32 31 30 332 ..2102

?[31;1m[2018/02/19 17:26:53:0276] ERR: ?[31;1m[2018/02/19 17:26:53:0319] ERR: ** 0x3ffd6ab8: vh: station, prot: dumb-increment-protocol, Illegal back-to -back write of 6 detected... ?[31;1m[2018/02/19 17:26:53:0443] ERR: ERROR 4 writing to di socket ?[35;1m[2018/02/19 17:26:53:0506] NOTICE: lws_service_fd: closing ?[35;1m[2018/02/19 17:26:53:0646] NOTICE: _realloc: size 552: new server wsi (freee heap 121296) ?[35;1m[2018/02/19 17:26:53:0697] NOTICE: ssl_pm_handshake ?[35;1m[2018/02/19 17:26:53:0712] NOTICE: mbedtls_handshake: ssl ret -0 state 1 ?[35;1m[2018/02/19 17:26:53:0818] NOTICE: mbedtls_handshake: ssl ret -0 state 2 ?[35;1m[2018/02/19 17:26:53:0913] NOTICE: mbedtls_handshake: ssl ret -0 state 3 ?[35;1m[2018/02/19 17:26:53:0986] NOTICE: mbedtls_handshake: ssl ret -0 state 4 ?[35;1m[2018/02/19 17:26:54:1881] NOTICE: mbedtls_handshake: ssl ret -0 state 5 ?[35;1m[2018/02/199 17:26:54:1888] NOTICE: mbedtls_handshake: ssl ret -0 state 6 ?[35;1m[2018/02/19 17:26:54:1953] NOTICE: mbedtls_handshake: ssl ret -0 state 7 ?[35;1m[2018/02/19 17:26:54::1998] NOTICE: mbedtls_handshake: ssl ret -0 state 8 ?[35;1m[2018/02/19 17:26:54:2077] NOTICE: mbedtls_handshake: ssl ret -6900 state 8 ?[35;1m[2018/02/19 17:26:54:2149] NOOTICE: _realloc: size 996: ah struct (free heap 111248) ?[35;1m[2018/02/19 17:26:54:2231] NOTICE: _realloc: size 900: ah data (free heap 110344) ?[35;1m[2018/02/19 17:26:544:2472] NOTICE: heap :118016 (+3308) ?[35;1m[2018/02/19 17:26:54:2485] NOTICE: ssl_pm_handshake ?[35;1m[2018/02/19 17:26:54:6951] NOTICE: mbedtls_handshake: ssl reet -0 state 9 ?[35;1m[2018/02/19 17:26:54:6958] NOTICE: mbedtls_handshake: ssl ret -0 state 10 ?[35;1m[2018/02/19 17:26:54:7002] NOTICE: mbedtls_handshake: ssl ret -0 state 11 ?[35;1m[2018/02/19 17:26::54:7094] NOTICE: mbedtls_handshake: ssl ret -0 state 12 ?[35;1m[2018/02/19 17:26:54:7184] NOTICE: mbedtls_handshake: ssl ret -0 state 13 ?[35;1m[2018/02/19 17:26:54:7269] NOTICE: mbedtls_handshake: ssl ret -0 state 14 ?[35;1m[2018/02/19 17:26:54:7300] NOTICE: mbedtls_handshake: ssl ret -0 state 15 ?[35;1m[2018/02/19 17:26:54:7378] NOTICE: mbedtls_handshake: ssl ret -0 statee 16 ?[35;1m[2018/02/19 17:26:54:7642] NOTICE: _realloc: size 192: ws struct (free heap 125216) ?[35;1m[2018/02/19 17:26:54:7649] NOTICE: _realloc: size 4: user space (free heap 125208) ?[35;1m[2018/02/19 17:26:54:7751] NOTICE: _realloc: size 148: truncated send alloc (free heap 123500) ?[35;1m[2018/02/19 17:26:54:7798] NOTICE: _realloc: size 30: rx_ubuf (free heap 123470) ?[35;1m[2018/02/19 17:26:55:0366] NOTICE: heap :126936 (+8920) ?[35;1m[2018/02/19 17:26:57:00023] NOTICE: heap :123392 (-3544) ?[35;1m[2018/02/19 17:26:58:0267] NOTICE: heaap :126936 (+3544) ?[35;1m[2018/02/19 17:26:59:0052] NOTICE: heap :125168 (-1768) ?[35;1m[2018/02/19 17:27:00:0168] NNOTICE: heap :126936 (+1768) ?[35;1m[2018/02/19 17:27:01:0051] NOTICE: heap :125168 (-1768) ?[35;1m[2018/02/19 17:27:02:0168] NOTTICE: heap :126936 (+1768)

lws-team commented 6 years ago

I found the root cause for this... the timers in ESP32 run in another thread. There is double-checking in lws for async threads calling lws_callback_on_writable(), but the action it took when it detected it led to the problem. This is fixed in lws now and both -factory and -test-server-demos updated.

FredrikFornstad commented 6 years ago

I have not seen this anymore, so the fix seems to be ok.