Open lumapu opened 2 months ago
This is probably due to a memory shortage and not a bug in the server. I advise monitoring your heap size and fragmentation, or better, switching to ESP32.
@lumapu : I know the project (I use OpenDTU myself). The ESP8266 stack traces are quite ugly compared to the ones PlatformIO produces for ESP32.
Questions :
1) Did you measure the heap usage (and free heap) in some of the functions listed above before they allocate? Like @vortigont said, the second trace feels like a failure to allocate.
2) Do you have this bug only with this fork, or also with the original one and the one from younodebox (see readme)?
@lumapu correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-) Ref: https://github.com/lumapu/ahoy/blob/main/src/platformio.ini
> Did you measure the heap usage (and free heap) in some of the functions listed above before they allocate? Like @vortigont said, the second trace feels like a failure to allocate.

No, not directly at that point, but I have a function to read it during operation via the API.
For ESP8266 there is the field `max_free_block`, which reads 9136 bytes for your fork and 9672 bytes for the esphome fork, both after a few clicks in the WebUI. The free heap is in the same region: 9600 and 9800.
> Do you have this bug only with this fork, or also with the original one and the one from younodebox (see readme)?

I checked the esphome fork again and could not reproduce a crash. Seconds later, a freshly compiled version with your fork crashed really fast.
> Correct me if I am wrong, but I don't see any usage of this fork in your project's PlatformIO file, nor in your dev branch... Are you opening this issue in the right location? ;-)

Good point. I need to do some basic tests before delivering new software to the folks. Nothing was published using your fork; I only have a local feature branch. Yesterday I found your fork and wanted to test it immediately. It works like a charm on ESP32, but on ESP8266 I see some problems.
I really appreciate that you want to maintain the AsyncWebserver, and I read your discussion with @egnor completely. Some months ago I was searching for a better maintained fork, coming from the younodebox one, and found esphome. Since yesterday I know yours and hope that I can use it in the near future.
@lumapu thank you for these details!
Can you please confirm: you are using `esphome/ESPAsyncTCP-esphome @ 2.0.0` in both your tests, and you are just swapping `esphome/ESPAsyncWebServer` with `mathieucarbou/ESPAsyncWebServer`, right?
The Async lib behind stays the same, and only `ESPAsyncWebServer` changes?
Also, if I look at the traces, you are using SSE, not WebSocket, right ?
I suspect that the difference in heap usage is due to the recent change from @vortigont: the project included a custom-made implementation of a forward linked list, which was replaced with std::list which is bi-directional and allows for constant time additions and removal. So the little heap usage increase is expected.
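To make the expected heap increase concrete, here is a minimal host-side sketch that counts the bytes each container requests from its allocator. It is an illustration only: `std::forward_list` stands in for the project's former custom singly linked list (which is not shown in this thread), and `CountingAllocator`, `bytesFor`, `SingleList`, and `DoubleList` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <list>
#include <memory>

// Running total of bytes requested through CountingAllocator (illustrative global).
static std::size_t g_bytes_allocated = 0;

// Minimal allocator that forwards to std::allocator and counts the bytes requested.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    g_bytes_allocated += n * sizeof(T);
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Stand-in for a singly linked list, and the std::list mentioned above.
using SingleList = std::forward_list<int, CountingAllocator<int>>;
using DoubleList = std::list<int, CountingAllocator<int>>;

// Bytes requested from the allocator to store `count` ints in Container.
template <typename Container>
std::size_t bytesFor(std::size_t count) {
  g_bytes_allocated = 0;
  Container c;
  for (std::size_t i = 0; i < count; ++i) c.push_front(static_cast<int>(i));
  return g_bytes_allocated;
}

// Usage: the doubly linked list requests more memory for the same number of elements.
inline void compareListOverhead() {
  const std::size_t single = bytesFor<SingleList>(100);
  const std::size_t dbl = bytesFor<DoubleList>(100);
  assert(dbl > single);
}
```

The difference typically comes from the extra "previous" pointer stored in every node, so it scales with the number of list entries rather than with their payload size.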
Neither of us uses ESP8266 on a daily basis, so it would help a lot if you could create a minimal reproducible test case in an .ino file that we could add to the project.
If the Async lib stays the same, but only the ESPAsyncWebServer fork is swapped, it would be interesting to find the issue indeed.
The only big changes from ESPHome fork regarding SSE are in commits bb4eb89c8e028005ef84f875417d32ca095147e7 and 48968b5be5ffc7dd0b763752e8e7255fbc6c2871 for SSE (@vortigont fyi) - not considering the more common api (request / response / handlers)
9k of heap is definitely too low to work reliably even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is so memory stressed, then I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.
@lumapu : you would need to measure the free heap just before the allocation requests that are failing (see the lines in the stack traces).
@vortigont : what I do not get is why it works with the ESPhome fork. I agree with you that the free memory is too low and this is asking for problems, but the difference between the 2 forks in terms of memory usage is low.
I was wondering if one of the 2 commits could have introduced an unforeseen side effect. Honestly, I do not see any right now, which is why I was asking for your second opinion.
> but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

I agree, and I had to make the same hard choice on my projects. The ESP8266 is 10 years old now, and you can easily swap it for an ESP32 for less than 2 euros, which has a lot more memory, power, cores, etc.
> what I do not get is why it works with the ESPhome fork

That's what I mean: there might indeed be something very specific that could be investigated and probably even fixed or optimized, but to do this on the 8266... nah, I have more things to invest time and effort into :)
As I see from the traces, it fails around `malloc` or `new` and `vfprintf`, so the most probable cause is indeed memory constraints, either for heap or for stack. I do not have a working SSE example to test on; I have never actually used it and mostly made the changes heuristically. I can try to dig into this a bit, but only if some minimal reproducible example code is provided.
> I can try to dig into this a bit, but only if some minimal reproducible example code is provided.

I agree: without more effort from @lumapu to pinpoint the issue and provide a minimal reproducible use case demonstrating an issue in the library, we cannot do anything but suspect a memory constraint, as shown in the stack trace.
@lumapu : you should monitor your free heap at key points where memory is allocated (before these `malloc` / `new` / `vfprintf` calls). `CONFIG_ASYNC_TCP_STACK_SIZE` is for AsyncTCP, which you are not using since you are on ESP8266. There is no task and stack size to configure.
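As a rough sketch of what such a check could look like: the snippet below reads the free heap and the largest free block through the ESP8266 Arduino core's `ESP` object and compares an intended allocation size against the largest block. `HeapSnapshot`, `readHeap`, `canAllocate`, and the 1024-byte default margin are assumptions made for this illustration, not part of the library or of AhoyDTU. On a non-ESP8266 target, `readHeap` returns placeholder values taken from the numbers reported earlier in this thread so that the code also compiles on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(ESP8266)
#include <Arduino.h>
#endif

// Snapshot of the two heap figures discussed in this thread.
struct HeapSnapshot {
  uint32_t freeHeap;      // total free heap in bytes
  uint32_t maxFreeBlock;  // largest contiguous free block in bytes
};

// Reads the current heap state. On ESP8266 this uses the Arduino core's ESP object;
// elsewhere it returns fixed placeholder values so the sketch builds on a host.
inline HeapSnapshot readHeap() {
#if defined(ESP8266)
  return {ESP.getFreeHeap(), ESP.getMaxFreeBlockSize()};
#else
  return {9600u, 9136u};  // placeholders: figures reported earlier in the thread
#endif
}

// Returns true if `bytes` plus a safety margin fits into the largest free block.
// The margin is a hypothetical tuning value, not a recommendation from the library.
inline bool canAllocate(const HeapSnapshot& heap, std::size_t bytes,
                        std::size_t margin = 1024) {
  assert(bytes + margin >= bytes);  // guard against size_t overflow
  return heap.maxFreeBlock >= bytes + margin;
}
```

Calling `canAllocate(readHeap(), size)` right before a large allocation, and logging both fields when it returns false, would show whether the crash correlates with fragmentation (a small largest block) rather than with the total free heap.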
@lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails?
I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h
I am willing to help more, but the lack of information is not helping ;-)
Specifically, what I am looking for is whether a change in method signature regarding PROGMEM usage could mean that the content is no longer read from flash but loaded into RAM.
So I need to know exactly what you are using from the ESPAsync API.
As I understand right now, your HTML pages are generated with a Python script and their type is `const uint8_t {}[] PROGMEM`, right?
And you are using `beginResponse_P` to serve them?
So the method which is called is:
```
AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);
```
which is implemented in the ESPHome fork and the original repo as:
```
AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}
```
In our repo, this method is deprecated and redirected:
```
[[deprecated("Replaced by beginResponse(...)")]]
AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) {
  return beginResponse(code, contentType, content, len, callback);
}
```
and goes to:
```
AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) {
  return new AsyncProgmemResponse(code, contentType, content, len, callback);
}
```
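The concern behind this walk-through, that a different overload could be selected and silently copy flash content into RAM, can be illustrated with plain C++ overloads. This is a host-side sketch under stated assumptions: `Response` and `makeResponse` are hypothetical names, and `std::string` stands in for the Arduino `String` class. It does not reproduce the library's actual overload set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical response type that records how its content is held.
struct Response {
  bool copiedToRam;
  std::size_t length;
};

// Overload 1: pointer plus length, so the bytes can stay where they are (e.g. in flash).
inline Response makeResponse(const uint8_t* content, std::size_t len) {
  assert(content != nullptr);
  return {false, len};
}

// Overload 2: a string object; building it copies the caller's data into RAM.
inline Response makeResponse(const std::string& content) {
  return {true, content.size()};
}
```

Which overload a call site picks depends only on the argument types, so a call that passes `const uint8_t*` together with a length keeps using the pointer-based path after a rename, while a call that passes character data alone would go through the copying path in this sketch.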
Can you please take a deeper look at the method signatures you use, as in this example?
Thanks!
> Can you please confirm: you are using `esphome/ESPAsyncTCP-esphome @ 2.0.0` in both your tests, and you are just swapping `esphome/ESPAsyncWebServer` with `mathieucarbou/ESPAsyncWebServer`, right?

From my understanding this comes with the Webserver; in my `platformio.ini` there is no extra entry for this. The only other dependency I can see is `https://github.com/me-no-dev/ESPAsyncUDP`, which is used for NTP.
> The Async lib behind stays the same, and only `ESPAsyncWebServer` changes?

I only change line 29 in my `platformio.ini`, which points to the AsyncWebserver repository.
> Also, if I look at the traces, you are using SSE, not WebSocket, right ?

I'm not completely sure what you mean, so let me describe how it's done in Ahoy: almost all pages are static HTML that loads its data dynamically using AJAX. Only the webconsole uses a websocket.
> Neither of us uses ESP8266 on a daily basis, so it would help a lot if you could create a minimal reproducible test case in an .ino file that we could add to the project.

I can try to do so, but give me some time. I don't want to spend too much time on ESP8266 (as you also mentioned 😉)
> 9k of heap is definitely too low to work reliably even with a single connection. Running on the edge, it is just a matter of time until you hit an out-of-memory issue. If your project is so memory stressed, then I would not target the 8266 at all. Sorry, but this chip is too old to invest considerable time in optimizing the code for such limited conditions. This is just my opinion.

Full ack. The ESP8266 is the chip I started with, and somehow it is possible to run the most recent Ahoy software on it, but sadly not with this fork. It's not high priority for me, but it would be cool if it were supported. I know that memory is too low on ESP8266, but this is by design; the chip has no more 😉. Web applications always become really big once they need to look nice.
> @lumapu : you would need to measure the free heap just before the allocation requests that are failing (see the lines in the stack traces).

Correct, it's measured and stored until it's transferred to the WebUI by the JSON API.
> @lumapu : could you please walk me through your project and tell me exactly which API you are calling, and with which kind of data, when it fails?

That's not that easy. I randomly click on different menu items in the WebUI and from time to time it crashes. I can try to record a screen video to describe it better.
> I guess it all starts here, but please be more specific. https://github.com/lumapu/ahoy/blob/main/src/web/web.h
> I am willing to help more, but the lack of information is not helping ;-)

I'm sorry for that, I will help as much as I can. You guys are so fast, I really appreciate it. I was talking about the development branch, which is more than 200 commits apart from main: https://github.com/lumapu/ahoy/tree/development03
> Specifically, what I am looking for is whether a change in method signature regarding PROGMEM usage could mean that the content is no longer read from flash but loaded into RAM.

Maybe this line: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L378
> So I need to know exactly what you are using from the ESPAsync API.
> As I understand right now, your HTML pages are generated with a Python script and their type is `const uint8_t {}[] PROGMEM`, right?

Yes, that's correct. The Python script does some preprocessing and translation. Some generic content, like the menu and footer, is also included.
> And you are using `beginResponse_P` to serve them?

Yes, I think so: https://github.com/lumapu/ahoy/blob/83b386deda9a25ed5279b1efb720b52d33859aef/src/web/web.h#L248
> So the method which is called is:
> `AsyncWebServerResponse *beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback=nullptr);`
> which is implemented in the ESPHome fork and the original repo as:
> `AsyncWebServerResponse * AsyncWebServerRequest::beginResponse_P(int code, const String& contentType, const uint8_t * content, size_t len, AwsTemplateProcessor callback){ return new AsyncProgmemResponse(code, contentType, content, len, callback); }`
> In our repo, this method is deprecated and redirected:
> `[[deprecated("Replaced by beginResponse(...)")]] AsyncWebServerResponse* beginResponse_P(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback = nullptr) { return beginResponse(code, contentType, content, len, callback); }`
> and goes to:
> `AsyncWebServerResponse* AsyncWebServerRequest::beginResponse(int code, const String& contentType, const uint8_t* content, size_t len, AwsTemplateProcessor callback) { return new AsyncProgmemResponse(code, contentType, content, len, callback); }`

Yes, I was notified by the deprecation warning and renamed the `beginResponse_P` calls to `beginResponse`. Maybe I missed something around this change. Do I need to change anything other than the function name?
Thank you for all your efforts, it feels really professional here
> Do I need to change anything other than the function name?

No, just changing the name is enough. The signature and the implementation behind it are the same, as explained above.
I started another (private) project that also uses this AsyncWebserver. This project does not include websockets for now. Even when I request pages at a high frequency, I have seen no crash so far. I will keep digging to get better information.
The behavior feels the same as described in a newer issue:
@lumapu : the ws implementation in this fork relies on the `std::shared_ptr<std::vector<uint8_t>>` mechanism from the youbox-node fork, which is not in the original repo or the esphome fork... Maybe a lead?
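For readers unfamiliar with that mechanism, the following host-side sketch shows the general idea of sharing one reference-counted payload between several per-client queues. It is only an illustration of the `std::shared_ptr<std::vector<uint8_t>>` pattern; `SharedBuffer`, `Client`, and `broadcast` are hypothetical names, and the fork's real websocket code is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// One heap-allocated payload that several per-client queues can reference.
using SharedBuffer = std::shared_ptr<std::vector<uint8_t>>;

// Hypothetical stand-in for a connected client with a queue of pending messages.
struct Client {
  std::deque<SharedBuffer> pending;
};

// Copies the text into a single buffer and enqueues that same buffer for every client.
inline SharedBuffer broadcast(std::vector<Client>& clients, const std::string& text) {
  assert(!text.empty());
  auto buffer = std::make_shared<std::vector<uint8_t>>(text.begin(), text.end());
  for (Client& c : clients) c.pending.push_back(buffer);
  return buffer;
}
```

The payload is allocated once regardless of the number of clients, but it stays on the heap until the last queue releases its reference, so a slow client can keep a message alive longer than expected. On a device with roughly 9 KB of free heap, that lifetime behavior may be worth checking.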
@mathieucarbou I have not used the youbox-node fork for a long time. Thanks for the hint. I think I can easily switch the Webserver library to youbox-node for a test and see if the issue is still there.
As a subscriber of the esphome fork I heard about the following; maybe it is related to my problem, or at least an improvement:
You are using SSE ?
> @mathieucarbou I have not used the youbox-node fork for a long time. Thanks for the hint. I think I can easily switch the Webserver library to youbox-node for a test and see if the issue is still there.
> As a subscriber of the esphome fork I heard about the following; maybe it is related to my problem, or at least an improvement:

@lumapu I have included this patch in this version => v3.2.3
Hi @lumapu ,
In the latest version, I fixed an issue in the method overload for ESP8266 (regarding the `PGM` usage).
You were using the methods with `const uint8_t* content`, not `char*`, so I guess this fix won't help much, but I wanted to drop a note just in case ;-)
Hello,
I've just fixed a long-standing bug regarding string usage on ESP8266:
https://github.com/mathieucarbou/ESPAsyncWebServer/releases/tag/v3.3.17
If possible, please let me know if it solves the issue....
Please make sure to go through the recommendations before opening a bug report:
https://github.com/mathieucarbou/ESPAsyncWebServer?tab=readme-ov-file#important-recommendations
done, set `-D CONFIG_ASYNC_TCP_STACK_SIZE=4096` without any change
Description
I'm the maintainer and developer of AhoyDTU https://github.com/lumapu/ahoy. This project has some configuration pages, which are mostly communicating using AJAX. As some of the users mention that the ESP8266 is really unstable with other forks of the AsyncWebserver, I wanted to try this fork.
To reproduce the issue, simply click in the menu 2-3 times in quick succession; the ESP then crashes.
Board
ESP8266 Wroom
Ethernet adapter used ?
no
Stack trace
I see two different behaviors:
Trace 1
```
0x4022cc39 in std::_Function_handler
```
Trace 2
```
0x4023ff20 in operator new(unsigned int) at ??:?
0x4022f6d2 in AsyncWebHeader& std::__cxx11::list
```
Additional notes
I'd like to switch to this AsyncWebserver in future. As my project is multiplatform I already tested with success on ESP32. For now I use the esphome fork, but there the ESP8266 also feels unstable (that is reported by many users).