Closed TheOriginalMrWolf closed 4 years ago
Also, FYI, tried upping the number of allowed simultaneous TCP connections in the hope that a bit more space would help, but this didn't really seem to work. Am I on the wrong track?
extern "C"
{
#include "espconn.h"
uint8 espconn_tcp_get_max_con(void);
sint8 espconn_tcp_set_max_con(uint8 num);
};
...and then in setup():
LOGDEBUG(F("SETUP"), F("Default Max TCP sockets: %u"), espconn_tcp_get_max_con());
LOGDEBUG(F("SETUP"), F("Changing Max TCP Sockets to 10, result: %d"), espconn_tcp_set_max_con(10));
LOGDEBUG(F("SETUP"), F("Current Max TCP sockets: %u"), espconn_tcp_get_max_con());
The above does, indeed, report the additional sockets, and more heap is used, but still the same crasharoonee... :(
2 more interesting points:
Thanks!
Oh, I really thought I had answered you... The espconn_X functions have nothing to do with this library, so whatever you change there will not propagate to it. We use the raw TCP/UDP API, and espconn is built on top of that. When you get an exception, you can decode it using: https://github.com/me-no-dev/EspExceptionDecoder
No worries! Yeah... I finally managed to work out that you were using raw api...
So, here's a stack and the dialog (Nice tool!!!!!). Any hints on how to interpret? stacktrace for me-no-dev.txt
Just out of interest, thought I'd play around changing tcp_listen to tcp_listen_with_backlog (variety of numbers for the backlog).
lwipopts.h: #define TCP_LISTEN_BACKLOG 1
ESPAsyncTCP.cpp:
replace:
tcp_pcb* listen_pcb = tcp_listen(pcb);
with:
tcp_pcb* listen_pcb = tcp_listen_with_backlog(pcb, 1);
(changed the backlog around from 0 to 10)
Didn't really see much change in behaviour, but surprised that I wasn't seeing RST on SYN when I had backlog of 1 & multiple SYNs outstanding. Wonder what I did wrong, or am not understanding???
Thanks again!!
In order to see changes done to LWIP, you need to select the Core Dev Board from the boards menu and then for LwIP variant -> OpenSource.
The error looks interesting... could you be running out of heap? I see SSL stuff in the log. Are you using SSL?
Not using SSL, and haven't enabled it (as far as I can tell).
ESP runs steady-state serving many queries per second quite happily for hours. Issue seems to be triggered when I start to ramp up the query rate such that we start getting a few queries open in parallel. Once I get something like 3 to 4 parallel tcp sockets serving a simple REST request (possibly coupled with some other concurrent activity), I run out of heap and BOOM.
If I'm only ever servicing one or two queries concurrently then the thing runs perfectly forever.
This is why I'm searching for how to limit the number of concurrent connections.
In theory, this should be achievable by changing tcp_listen to tcp_listen_with_backlog, and enabling TCP_LISTEN_BACKLOG, right? According to LWIP docco, this should send an RST to SYN whenever we go above the backlog level. Setting this to 1 should mean I see quite a few RSTs with Wireshark.... but I don't see any!
I'm trying to troubleshoot by putting some pragmas into lwip's tcp.h & lwipopts.h. These fire properly when I'm compiling.
I am, however, a bit suspicious about tcp.c & tcp_in.c - specifically I don't think tcp_listen_input is actually sending an RST if pcb->accepts_pending >= pcb->backlog (it appears that tcp_listen_input just sorta craps out at that point)... so I made some changes to tcp_in.c to test, but for some reason that's driving me CRAZY, PlatformIO is NOT recompiling this file!!! Aaaaargh...
BTW - I'm using PlatformIO IDE, so I guess adding -DLWIP_OPEN_SRC to build flags in platformio.ini should do it, right?
Thanks!
After thinking about it overnight, I really see two problems - first, being able to easily limit the number of concurrent requests being serviced (which I think should ideally be managed in lwip - as it's already got 99% of that in code), and second (bit of a facepalm for me), making sure that I have enough heap to service my REST requests. That part should actually be done in my code...
What I'm doing now as a trial is, for my handler objects, checking free heap before I respond to canHandle. Currently just falling through to a 404 if not enough heap, eg:
bool timerRestRequestHandler::canHandle(AsyncWebServerRequest *request){
  if (request->url().equals(_uri) && (ESP.getFreeHeap() > MIN_FREE_HEAP_TO_SERVICE_REST)) {
    return true;
  }
  return false;
}
May tweak that depending on how the client responds (ie deal with it in the handler itself (though that probably needs more heap), having a generic catch-all handler that sits on the API root, etc).
Either way, already improved stability dramatically.
What do you reckon - something useful to comment on in your user guides?
Also - given that free heap is so critical, would it make sense to build some sort of 'only accept connection if more than x free heap or less than y concurrent connections, otherwise do z' concept into the API?
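To make the 'x free heap / y concurrent connections' idea concrete, here is a minimal host-side sketch of such a gate. Everything in it (AdmissionPolicy, Admission, admit) is a hypothetical name made up for illustration, not part of ESPAsyncTCP or ESPAsyncWebServer, and it assumes both conditions must hold before a connection is accepted. What to do on rejection (the 'z') is deliberately left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical policy knobs; the values are chosen by the application.
struct AdmissionPolicy {
    std::uint32_t minFreeHeapBytes;  // "x": heap that must stay free
    std::size_t   maxConcurrent;     // "y": connections served at once
};

enum class Admission { Accept, Reject };

// Decide whether one more connection may be taken on.
// On the device, freeHeapBytes would come from ESP.getFreeHeap();
// activeConnections is a counter the server would have to maintain itself.
inline Admission admit(const AdmissionPolicy& policy,
                       std::uint32_t freeHeapBytes,
                       std::size_t activeConnections) {
    assert(policy.maxConcurrent > 0);  // a limit of zero would reject everything
    if (freeHeapBytes <= policy.minFreeHeapBytes) return Admission::Reject;
    if (activeConnections >= policy.maxConcurrent) return Admission::Reject;
    return Admission::Accept;
}
```

A handler's canHandle could then call admit(...) with the current heap and connection count instead of hard-coding the heap comparison.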
Cheers!
One thing you can do is to combine as many files as you can (JS+CSS+HTML, then minify and gzip), so the browser requests fewer files when loading. Starting WS/EventSource in the browser after page load is also something that can help. The ESP cannot handle more than 4 or 5 simultaneous connections.
Yes, good point, however the issue is (currently) not so much web pages as servicing concurrent REST requests.
I feel that ESPAsyncTCP/WebServer has a part to play in helping manage limited resources in a simple way.
Insofar as concurrent connections are concerned, I believe the functionality built into lwip to manage this is broken. I have identified the code in question & written a quick fix to test - however am currently struggling to work out how to get PlatformIO to recompile the library vs using the prebuilt binary (hints welcome!!!).
@me-no-dev Ok, quick update:
Made my ESPAsyncXYZ-based server SUPER robust & performant by doing the following (surprisingly simple solution):
And that's it - simples!
Between them, these two simple changes let me hammer the ESP with > 30 requests per second for hours without any issue whatsoever. Requests were a mix of REST requests (mainly managed by the canHandle logic - though sometimes also by the lwip layer), and http page requests (served off SPIFFS) which were nicely throttled by the lwip layer. Previously couldn't do more than about 5-8 without the ESP rebooting!
What do you reckon @me-no-dev - worth considering folding into your lib?
Cheers!
Sure thing! Please make a PR with the changes :)
Never done that before, so that'll be fun :)
Just working out now how to 'properly' recompile lwip, modify PIO to link in the new binary etc.
I guess the question for me is that I can make a PR with ESPAsyncTCP changes - but this actually needs lwip to be recompiled with a different switch setting, and then the code to be linked against that, otherwise everything breaks.
I guess I could just document that and manage the rest through #defines...
Hi there, thanks for your great work! :+1: I'm sadly having the same issue. It's quite hard to recompile lwip under Windows; I can't get mingw32 to work. Is there any documentation or tutorial for this? Or would it be possible for you to upload your recompiled lwip? :)
Thanks in advance!
any documentation or tutorial for this? Or would it be possible for you to upload your recompiled lwip
No worries, here it is: liblwip_src.zip
Note, you'll need to invoke tcp_accepted(_pcb) for every connection otherwise the backlog doesn't decrement and you hit the limit (in my case, I customised ESPAsyncTCP as per my comments above).
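The accounting behind that note can be illustrated with a toy model. This is not lwIP code: ListenBacklog, onSyn and accepted are hypothetical names, and the class only mimics the idea that a pending counter is compared against the backlog and is decremented when the application acknowledges a connection (the role tcp_accepted plays).

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a listening socket's backlog counter (illustrative only).
class ListenBacklog {
public:
    explicit ListenBacklog(std::uint8_t backlog) : backlog_(backlog) {}

    // A SYN arrives. Returns false when the connection would be refused
    // because the number of pending connections has reached the backlog.
    bool onSyn() {
        if (pending_ >= backlog_) return false;
        ++pending_;
        return true;
    }

    // Stands in for tcp_accepted(): the application acknowledges one
    // connection, which frees a slot in the backlog.
    void accepted() {
        assert(pending_ > 0);
        --pending_;
    }

    std::uint8_t pending() const { return pending_; }

private:
    std::uint8_t backlog_;
    std::uint8_t pending_ = 0;
};
```

With a backlog of 2, two calls to onSyn() succeed and a third is refused until accepted() is called. If accepted() is never called, every later connection is refused, which matches the 'you hit the limit' behaviour described above.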
You might still struggle to convince your dev environment to use it. What IDE etc are you using?
If you want to roll it yourself, the process I used (specific to PlatformIO) was:
Download and install mingw (http://www.mingw.org/wiki/getting_started) from https://sourceforge.net/projects/mingw/files/latest/download?source=files (click the “download installer” button).
Run the installer, and then the installer that the installer installs…
Select mingw-developer-toolkit, mingw32-base and mingw32-gcc-g++ for installation, then menu: installation: apply changes.
Choose the default installation directory (C:\MinGW).
Once installation is finished, change to C:\MinGW\msys\1.0\etc and edit (wordpad) fstab to ensure it contains at least the line:
C:/MinGW /mingw
Also make sure that there is at least one full blank line at the bottom of the file.
To
Edit the file C:\Users\yourname\.platformio\platforms\espressif8266_stage\builder\frameworks\arduino.py
In the LIBS=[ section, remove the lwip_gcc entry. This should change
"hal", "phy", "pp", "net80211", "lwip_gcc", "wpa", "crypto",
to
"hal", "phy", "pp", "net80211", "wpa", "crypto",
In platformio.ini, add -llwip_src to build flags, e.g.:
build_flags = -DLWIP_OPEN_SRC -Wl,-Tesp8266.flash.4m.ld -llwip_src
NOTE: you can now switch back to the original binary library by simply changing the build_flags entry to -llwip_gcc.
Hope this helps!!
Thanks for your great support! In the meantime I've set up Linux and used the "Core Development Module" in Arduino. After fixing the new problems caused by the new compiler, I was able to customize the lwip core and recompile it correctly.
After all that, I wrote my own SYN flooding software (without sending RST back) and attacked the ESP with over 100k requests at 20 requests per second without any crash. Without this patch it crashes right after a few SYN packets.
@TheOriginalMrWolf Thanks for the great work! That helped me a lot for creating a tough server. :+1:
And of course thanks for the great support of the AsyncWebServer as well as the AsyncTCP! :+1:
We might need to raise this question to the arduino repo as well. That could maybe improve things on all clients, but will need testing.
SYN flooding software ... attacked the esp with over 100k requests with 20 requests per second without any crash. Without this patch it crashes directly after some SYN packets
Awesome! Truly "industrial strength" :)
The more I use and learn about this fabulous platform and set of libraries, the more impressed I am!
I have a similar problem if I refresh quickly but the stacktrace differs a bit. Is it the same error or something else?
Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 25 results
0x402102d0: std::function ::function(std::function const&) at /home/rasmus/.arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/xtensa-lx106-elf/include/c++/4.8.2/functional line 2439
0x4020bef8: AsyncWebServerRequest::arg(String const&) const at /home/rasmus/Arduino/libraries/ESPAsyncWebServer/src/WebRequest.cpp line 740
0x4020eace: String::concat(char const*, unsigned int) at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/WString.cpp line 519
0x4020bf1a: AsyncWebServerRequest::arg(String const&) const at /home/rasmus/Arduino/libraries/ESPAsyncWebServer/src/WebRequest.cpp line 740
0x40206e3f: _M_manager at /home/rasmus/Arduino/libraries/FastLED/controller.h line 170
0x401006ec: cont_run at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/cont.S line 79
0x40230000: wpa_auth_sm_event at ?? line ?
0x402326b1: eapol_txcb at ?? line ?
0x40231aad: wpa_auth_uses_mfp at ?? line ?
0x4020f135: ~DirImpl at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/FSImpl.h line 56
: (inlined by) fs::DirImpl::~DirImpl() at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/FSImpl.h line 56
0x401057ea: __divsf3 at /home/wjg/Repo/esp-open-sdk/crosstool-NG/.build/src/gcc-4.8.2/libgcc/config/xtensa/ieee754-sf.S line 1006
0x402124f7: _strtod_r at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/strtod.c line 801
0x402015bc: __pinMode at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring_digital.c line 51
0x402268f4: ieee80211_ht_updateparams at ?? line ?
0x4022691a: ieee80211_ht_updateparams at ?? line ?
0x4020f0cf: user_init at /home/rasmus/bin/arduino-1.8.3/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 57
Any chance to get this into arduino core?
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.
Sorry to reopen, but has this been merged? I think this fix would also help when the main HTML pulls in multiple scripts/CSS files. I have had issues that could only be solved by merging all JavaScript files into one, and the same with CSS. Others have solved these issues by lazy loading JS/CSS. I believe this fix would avoid all those impractical workarounds.
Hi,
First, thank you so much for a wonderful set of libraries!
Unfortunately, having a bit of a problem.
Have made a web server (using master branch of all your libraries downloaded today 11-Apr-17, plus ditto Arduino core) which responds to basic REST requests.
Works perfectly for many hours/days servicing thousands of requests until the server gets too many concurrent open requests. Appears that sometimes the client (running in chrome, firefox, or IE) gets a little excited and sends multiple concurrent requests very quickly to an endpoint (happens if, for example, there's a brief slowdown in the WiFi network, or the browser gets a bit busy for a while, & the requests stack up).
Tried to narrow things down with logging & wireshark, and it seems that if I get something like more than 5 concurrent requests made within approx 1 ms (ie they're all still open and not serviced by the web server within that time) then the ESP crashes with a Fatal exception 29(StoreProhibitedCause): epc1=0x40221dd7, epc2=0x00000000, epc3=0x40000f68, excvaddr=0x00000000, depc=0x00000000.
Sorta looks like a baby syn flood attack...
I've made some test code which is very efficient & fast (& useless because all it does is send back a fixed string), but this still happens.
Do you see this as a bug, or have any suggestions as to how to code around this (eg - just drop syn packets until we have less than 5 concurrent connections???)?
Is there anything I can do to capture more information to help you?
Thanks again!!!!