me-no-dev / ESPAsyncWebServer

Async Web Server for ESP8266 and ESP32
Too many concurrent connections --> Fatal exception 29(StoreProhibitedCause): #157

Closed TheOriginalMrWolf closed 4 years ago

TheOriginalMrWolf commented 7 years ago

Hi,

First, thank you so much for a wonderful set of libraries!

Unfortunately, having a bit of a problem.

Have made a web server (using master branch of all your libraries downloaded today 11-Apr-17, plus ditto Arduino core) which responds to basic REST requests.

Works perfectly for many hours/days servicing thousands of requests until the server gets too many concurrent open requests. Appears that sometimes the client (running in chrome, firefox, or IE) gets a little excited and sends multiple concurrent requests very quickly to an endpoint (happens if, for example, there's a brief slowdown in the WiFi network, or the browser gets a bit busy for a while, & the requests stack up).

Tried to narrow things down with logging & wireshark, and it seems that if I get something like more than 5 concurrent requests made within approx 1 ms (ie they're all still open and not serviced by the web server within that time) then the ESP crashes with a Fatal exception 29(StoreProhibitedCause): epc1=0x40221dd7, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000000, depc=0x00000000.

Sorta looks like a baby syn flood attack...

I've made some test code which is very efficient & fast (& useless because all it does is send back a fixed string), but this still happens.

Do you see this as a bug, or have any suggestions as to how to code around this (eg - just drop syn packets until we have less than 5 concurrent connections???)?

Is there anything I can do to capture more information to help you?

Thanks again!!!!

TheOriginalMrWolf commented 7 years ago

Also, FYI, tried upping the number of allowed simultaneous TCP connections in the hope that a bit more space would help, but this didn't really seem to work. Am I on the wrong track?

extern "C"
{
    #include "espconn.h"
    uint8 espconn_tcp_get_max_con(void);
    sint8 espconn_tcp_set_max_con(uint8 num);
};

...and then in setup():

LOGDEBUG(F("SETUP"), F("Default Max TCP sockets: %u"), espconn_tcp_get_max_con());
LOGDEBUG(F("SETUP"), F("Changing Max TCP Sockets to 10, result: %d"), espconn_tcp_set_max_con(10));
LOGDEBUG(F("SETUP"), F("Current Max TCP sockets: %u"), espconn_tcp_get_max_con());

The above does, indeed, report the additional sockets, and more heap is used, but still the same crasharoonee... :(

TheOriginalMrWolf commented 7 years ago

2 more interesting points:

Thanks!

me-no-dev commented 7 years ago

Oh, I thought I had answered you... The espconn_X functions have nothing to do with this library, so whatever you change there will not propagate to it. We use the raw TCP/UDP API, and espconn is built on top of that. When you get an exception, you can decode it using: https://github.com/me-no-dev/EspExceptionDecoder

TheOriginalMrWolf commented 7 years ago

No worries! Yeah... I finally managed to work out that you were using raw api...

So, here's a stack and the dialog (Nice tool!!!!!). Any hints on how to interpret? (Attachment: stacktrace for me-no-dev.txt)

Just out of interest, thought I'd play around changing tcp_listen to tcp_listen_with_backlog (variety of numbers for the backlog).

lwipopts.h: #define TCP_LISTEN_BACKLOG              1
ESPAsyncTCP.cpp:
replace:
    tcp_pcb* listen_pcb = tcp_listen(pcb);
with:
    tcp_pcb* listen_pcb = tcp_listen_with_backlog(pcb, 1);

(changed the backlog around from 0 to 10)

Didn't really see much change in behaviour, but surprised that I wasn't seeing RST on SYN when I had backlog of 1 & multiple SYNs outstanding. Wonder what I did wrong, or am not understanding???

Thanks again!!

me-no-dev commented 7 years ago

In order to see changes made to LWIP, you need to select the Core Dev Board from the boards menu and then, for LwIP variant, choose OpenSource. The error looks interesting... Could you be running out of heap? I see SSL stuff in the log. Are you using SSL?

TheOriginalMrWolf commented 7 years ago

Not using SSL, and haven't enabled it (as far as I can tell).

ESP runs steady-state serving many queries per second quite happily for hours. Issue seems to be triggered when I start to ramp up the query rate such that we start getting a few queries open in parallel. Once I get something like 3 to 4 parallel tcp sockets serving a simple REST request (possibly coupled with some other concurrent activity), I run out of heap and BOOM.

If I'm only ever servicing one or two queries concurrently then the thing runs perfectly forever.

This is why I'm searching for how to limit the number of concurrent connections.

In theory, this should be achievable by changing tcp_listen to tcp_listen_with_backlog, and enabling TCP_LISTEN_BACKLOG, right? According to LWIP docco, this should send an RST to SYN whenever we go above the backlog level. Setting this to 1 should mean I see quite a few RSTs with Wireshark.... but I don't see any!

I'm trying to troubleshoot by putting some pragmas into lwip's tcp.h & lwipopts.h. These fire properly when I'm compiling.

I am, however, a bit suspicious about tcp.c & tcp_in.c - specifically I don't think tcp_listen_input is actually sending an RST if pcb->accepts_pending >= pcb->backlog (it appears that tcp_listen_input just sorta craps out at that point)... so I made some changes to tcp_in.c to test, but for some reason that's driving me CRAZY, PlatformIO is NOT recompiling this file!!! Aaaaargh...

BTW - I'm using PlatformIO IDE, so I guess adding -DLWIP_OPEN_SRC to build flags in platformio.ini should do it, right?

Thanks!

TheOriginalMrWolf commented 7 years ago

After thinking about it overnight, I really see two problems - first, being able to easily limit the number of concurrent requests being serviced (which I think should ideally be managed in lwip - as it's already got 99% of that in code), and second (bit of a facepalm for me), making sure that I have enough heap to service my REST requests. That part should actually be done in my code...

What I'm doing now as a trial is, for my handler objects, checking free heap before I respond to canHandle. Currently just falling through to a 404 if not enough heap, eg:

bool timerRestRequestHandler::canHandle(AsyncWebServerRequest *request){
  if (request->url().equals(_uri) && (ESP.getFreeHeap() > MIN_FREE_HEAP_TO_SERVICE_REST)) {
    return true;
  }
  return false;
}

May tweak that depending on how the client responds (ie deal with it in the handler itself (though that probably needs more heap), having a generic catch-all handler that sits on the API root, etc).

Either way, already improved stability dramatically.
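
For anyone wanting to try the same check off-device, here is a minimal sketch that pulls the decision out of the handler so it can be tested on a desktop compiler. The names `HeapGuard` and `shouldHandle` are hypothetical and not part of ESPAsyncWebServer, and the thread does not give a value for MIN_FREE_HEAP_TO_SERVICE_REST, so the threshold is left to the caller. On the ESP the free-heap argument would come from ESP.getFreeHeap().

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper, not part of ESPAsyncWebServer: decides whether a
// handler should claim a request, given the free heap reported by the caller.
struct HeapGuard {
    std::string uri;        // endpoint this handler serves
    uint32_t minFreeHeap;   // threshold in bytes, chosen by the application

    HeapGuard(std::string u, uint32_t minHeap)
        : uri(std::move(u)), minFreeHeap(minHeap) {
        assert(!uri.empty());  // a handler without a URI makes no sense
    }

    // Mirrors the canHandle logic above: the URL must match exactly and the
    // free heap must be strictly greater than the threshold.
    bool shouldHandle(const std::string& url, uint32_t freeHeap) const {
        return url == uri && freeHeap > minFreeHeap;
    }
};
```

Inside canHandle this would reduce to something like `return guard.shouldHandle(url, ESP.getFreeHeap());`, with the URL taken from the request. Keeping the threshold strict (greater than, not greater or equal) matches the original comparison.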

What do you reckon - something useful to comment on in your user guides?

Also - given that free heap is so critical, would it make sense to build some sort of 'only accept connection if more than x free heap or less than y concurrent connections, otherwise do z' concept into the API?
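
A rough sketch of what such a policy could look like, purely as an illustration. Nothing here exists in ESPAsyncTCP: `AdmissionGate`, `Admission`, `onConnect` and `onDisconnect` are made-up names, and the sketch assumes the intent is to require both conditions (enough heap and a free connection slot) before accepting. What "do z" means on rejection is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy object; none of these names exist in ESPAsyncTCP.
enum class Admission { Accept, Reject };

class AdmissionGate {
public:
    AdmissionGate(uint32_t minFreeHeap, uint8_t maxConnections)
        : minFreeHeap_(minFreeHeap), maxConnections_(maxConnections) {}

    // Called when a new connection arrives. The caller supplies the current
    // free heap (on the ESP8266 this would be ESP.getFreeHeap()).
    Admission onConnect(uint32_t freeHeap) {
        if (freeHeap <= minFreeHeap_ || active_ >= maxConnections_) {
            return Admission::Reject;  // caller decides how to refuse
        }
        ++active_;
        return Admission::Accept;
    }

    // Called once for every accepted connection when it closes.
    void onDisconnect() {
        assert(active_ > 0);  // must pair with a previous Accept
        --active_;
    }

    uint8_t active() const { return active_; }

private:
    uint32_t minFreeHeap_;
    uint8_t maxConnections_;
    uint8_t active_ = 0;  // connections currently admitted
};
```

The important design point is that every Accept must eventually be matched by an onDisconnect call, otherwise the counter only ever grows and the gate ends up rejecting everything.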

Cheers!

me-no-dev commented 7 years ago

One thing you can do is to combine as many files as you can (JS+CSS+HTML, then minify and gzip), so the browser requests fewer files when loading. Starting WS/EventSource in the browser after page load is also something that can help. The ESP cannot handle more than 4 or 5 simultaneous connections.

TheOriginalMrWolf commented 7 years ago

Yes, good point, however the issue is (currently) not so much web pages as servicing concurrent REST requests.

I feel that ESPAsyncTCP/WebServer has a part to play in helping manage limited resources in a simple way.

Insofar as concurrent connections is concerned, I believe the functionality built into lwip to manage this is broken. I have identified the code in question & written a quick fix to test - however am currently struggling to work out how to get PlatformIO to recompile the library vs using the prebuilt binary (hints welcome!!!).

TheOriginalMrWolf commented 7 years ago

@me-no-dev Ok, quick update:

Made my ESPAsyncXYZ-based server SUPER robust & performant by doing the following (surprisingly simple solution):

And that's it - simples!

Between them, these two simple changes let me hammer the ESP with > 30 requests per second for hours without any issue whatsoever. Requests were a mix of REST requests (mainly managed by the canHandle logic - though sometimes also by the lwip layer), and http page requests (served off SPIFFS) which were nicely throttled by the lwip layer. Previously couldn't do more than about 5-8 without the ESP rebooting!

What do you reckon @me-no-dev - worth considering folding into your lib?

Cheers!

me-no-dev commented 7 years ago

Sure thing! Please make a PR with the changes :)

TheOriginalMrWolf commented 7 years ago

Never done that before, so that'll be fun :)

Just working out now how to 'properly' recompile lwip, modify PIO to link in the new binary etc.

I guess the question for me is that I can make a PR with ESPAsyncTCP changes - but this actually needs lwip to be recompiled with a different switch setting, and then the code to be linked against that, otherwise everything breaks.

I guess I could just document that and manage the rest through #defines...

MoegerleStephan commented 7 years ago

Hi there, thanks for your great work! :+1: I'm sadly having the same issue. It's quite hard to recompile lwip under Windows; I can't get mingw32 to work. Is there any documentation or tutorial for this? Or would it be possible for you to upload your recompiled lwip? :)

Thanks in advance!

TheOriginalMrWolf commented 7 years ago

> any docu or tutorial to manage this? Or would it be possible that you can upload your recompiled lwip

No worries, here it is: liblwip_src.zip

Note, you'll need to invoke tcp_accepted(_pcb) for every connection otherwise the backlog doesn't decrement and you hit the limit (in my case, I customised ESPAsyncTCP as per my comments above).
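
To illustrate why the tcp_accepted call matters, here is a toy model of the backlog counter. It is not lwIP code: `ListenModel`, `onSyn` and `onAccepted` are hypothetical names, and the fields only echo the accepts_pending and backlog values mentioned earlier in this thread. Whether a refused SYN is dropped or answered with an RST is deliberately not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a listening socket's backlog counter. This is NOT lwIP code;
// the field names merely echo accepts_pending and backlog from the thread.
struct ListenModel {
    uint8_t backlog;             // maximum connections awaiting acknowledgement
    uint8_t acceptsPending = 0;  // connections admitted but not yet acknowledged

    explicit ListenModel(uint8_t limit) : backlog(limit) {}

    // A SYN arrives: admit it only while the pending count is below the limit.
    bool onSyn() {
        if (acceptsPending >= backlog) {
            return false;  // refused; drop-vs-RST is outside this model
        }
        ++acceptsPending;
        return true;
    }

    // Stand-in for the application acknowledging a connection, which is the
    // role tcp_accepted plays in the note above.
    void onAccepted() {
        assert(acceptsPending > 0);
        --acceptsPending;
    }
};
```

With a limit of 2 and no onAccepted calls, the third onSyn is refused and stays refused forever. Calling onAccepted after each admitted connection keeps the counter at zero, so new connections keep getting through.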

You might still struggle to convince your dev environment to use it. What IDE etc are you using?

If you want to roll it yourself, the process I used (specific to PlatformIO) was:

Install MinGW

  1. Download and install MinGW (http://www.mingw.org/wiki/getting_started) from https://sourceforge.net/projects/mingw/files/latest/download?source=files (click the "download installer" button)
  2. Run the installer, and then the installer that the installer installs…
  3. Select mingw-developer-toolkit, mingw32-base and mingw32-gcc-g++ for installation, then menu: installation: apply changes.
  4. Choose the default installation directory (C:\MinGW)
  5. Once installation is finished, change to C:\MinGW\msys\1.0\etc and edit (wordpad) fstab to ensure it contains at least the line: C:/MinGW /mingw
  6. Also make sure that there is at least one full blank line at the bottom of the file.

Patch & use

Prerequisites:

  1. git
  2. Python 2.7

Initial setup:

  1. Start a command prompt
  2. Change to C:\Users\yourname\.platformio\packages\framework-arduinoespressif8266\tools
  3. Run > python get.py
  4. Once all tools are installed, exit command prompt

Modify lwipopts.h

  1. Open the file C:\Users\yourname\.platformio\packages\framework-arduinoespressif8266\tools\sdk\lwip\include\lwipopts.h
  2. Enable the backlog option for tcp listen pcb – find and change

    #define TCP_LISTEN_BACKLOG 0

    To

    #define TCP_LISTEN_BACKLOG 1

Recompile lwip & copy result to library directory

  1. Start C:\MinGW\msys\1.0\msys.bat from Windows Explorer
  2. cd /c/Users/yourname/.platformio/packages/framework-arduinoespressif8266/tools/sdk/lwip/src
  3. run make clean
  4. run make install
  5. This will recompile the lwip library, taking in the new TCP_LISTEN_BACKLOG directive, write the binary object file, and copy the file to /c/Users/yourname/.platformio/packages/framework-arduinoespressif8266/tools/sdk/lib/liblwip_src.a (the file normally linked is liblwip_gcc.a)
  6. Check to ensure the file was successfully written

Configure PlatformIO to use the new binary

  1. Edit the file C:\Users\yourname\.platformio\platforms\espressif8266_stage\builder\frameworks\arduino.py

  2. In the LIBS=[ section, remove the lwip_gcc entry. This should change

    "hal", "phy", "pp", "net80211", "lwip_gcc", "wpa", "crypto",

    To

    "hal", "phy", "pp", "net80211", "wpa", "crypto",

  3. In platformio.ini, add -llwip_src to build flags. Eg:

    build_flags = -DLWIP_OPEN_SRC -Wl,-Tesp8266.flash.4m.ld -llwip_src

    NOTE: you can now switch back to the original binary library by simply changing the build_flags entry to -llwip_gcc.

Hope this helps!!

MoegerleStephan commented 7 years ago

Thanks for your great support! In the meantime I set up Linux and used the "Core Development Module" in Arduino. After fixing the new problems caused by the new compiler, I was able to customize the lwip core and recompile it correctly.

After that I wrote my own SYN flooding software (without sending RST back) and attacked the ESP with over 100k requests at 20 requests per second without any crash. Without this patch it crashes directly after a few SYN packets.

@TheOriginalMrWolf Thanks for the great work! That helped me a lot for creating a tough server. :+1:

And of course thanks for the great support of the AsyncWebServer as well as the AsyncTCP! :+1:

me-no-dev commented 7 years ago

We might need to raise this question to the arduino repo as well. That could maybe improve things on all clients, but will need testing.

TheOriginalMrWolf commented 7 years ago

> SYN flooding software ... attacked the esp with over 100k requests with 20 requests per second without any crash. Without this patch it crashes directly after some SYN packets

Awesome! Truly "industrial strength" :)

The more I use and learn about this fabulous platform and set of libraries, the more impressed I am!

Pajn commented 7 years ago

I have a similar problem if I refresh quickly but the stacktrace differs a bit. Is it the same error or something else?

Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 25 results
0x402102d0: std::function ::function(std::function  const&) at /home/rasmus/.arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2439
0x4020bef8: AsyncWebServerRequest::arg(String const&) const at /home/rasmus/Arduino/libraries/ESPAsyncWebServer/src/WebRequest.cpp line 740
0x4020eace: String::concat(char const*, unsigned int) at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/WString.cpp line 519
0x4020bf1a: AsyncWebServerRequest::arg(String const&) const at /home/rasmus/Arduino/libraries/ESPAsyncWebServer/src/WebRequest.cpp line 740
0x40206e3f: _M_manager at /home/rasmus/Arduino/libraries/FastLED/controller.h line 170
0x401006ec: cont_run at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/cont.S line 79
0x40230000: wpa_auth_sm_event at ?? line ?
0x402326b1: eapol_txcb at ?? line ?
0x40231aad: wpa_auth_uses_mfp at ?? line ?
0x4020f135: ~DirImpl at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/FSImpl.h line 56
:  (inlined by) fs::DirImpl::~DirImpl() at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/FSImpl.h line 56
0x401057ea: __divsf3 at /home/wjg/Repo/esp-open-sdk/crosstool-NG/.build/src/gcc-4.8.2/libgcc/config/xtensa/ieee754-sf.S line 1006
0x402124f7: _strtod_r at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/strtod.c line 801
0x402015bc: __pinMode at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring_digital.c line 51
0x402268f4: ieee80211_ht_updateparams at ?? line ?
0x4022691a: ieee80211_ht_updateparams at ?? line ?
0x4020f0cf: user_init at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 57

universam1 commented 7 years ago

Any chance to get this into arduino core?

stale[bot] commented 4 years ago

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

gnalbandian commented 4 years ago

Sorry to reopen, but has this been merged? I think this fix would also help when the main HTML page pulls in multiple scripts/CSS files. I have had issues that could only be solved by merging all JavaScript files into one, and the same with CSS. Others have solved these issues by lazy-loading JS/CSS. I believe this fix would avoid all those impractical workarounds.