letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

Web pages not loading right on the last couple of versions, maybe a cpu or memory issue? #908

Closed mattlward closed 6 years ago

mattlward commented 6 years ago

NOTE: This is not a support forum! For questions and support go here: https://www.letscontrolit.com/forum/viewforum.php?f=1

Steps to reproduce

Visit the Hardware or Advanced pages

Does the problem persist after powering off and on? (Just resetting isn't enough sometimes.)

Yes

Expected behavior

Proper menu pages should show.

Actual behavior

Hardware page: image

Advanced page: image

Debug 0 seems to fix the issue?

System configuration

Hardware: Wemos D1, SHT30 and 1306 OLED

Load: 8% (LC=16072)
Free Mem: 12720 (2712 - sendWebPageChunkedData)

Software or git version: v2.0-20180220

TD-er commented 6 years ago

Is this a build made by yourself? This looks like Arduino ESP8266 core library version 2.4.0 is being used.

mattlward commented 6 years ago

No, downloaded from the nightly build link.

https://github.com/letscontrolit/ESPEasy/releases

I have not tried to figure out how to build yet.

mattlward commented 6 years ago

How do you tell what core? This issue has been around for the last couple of nightly builds.

TD-er commented 6 years ago

Those hex characters in the web interface indicate web chunk errors, which you also see when running the 2.0 branch with the newer core library. @psy0rz should look at the build script, since it probably uses the wrong platformio.ini file.

uzi18 commented 6 years ago

On my own build with arduino-builder and sdk 2.3.0 I'm unable to open the /advanced subpage, and sometimes /hardware as well.

Grovkillen commented 6 years ago

Yes, I've noticed this with the PUYA file.

uzi18 commented 6 years ago

Did you test my patch, which applies the fix only on PUYA chips? It checks the chip manufacturer ID and, if it is PUYA, uses Igor's workaround when storing data in flash.

mattlward commented 6 years ago

If this helps:

ESP Chip ID: 6580584
ESP Chip Freq: 80 MHz
Flash Chip ID: Vendor: 0xEF Device: 0x4016
Flash Chip Real Size: 4096 kB
Flash IDE Size: 4096 kB
Flash IDE speed: 40 MHz
Flash IDE mode: DIO
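For reference, the vendor check mentioned above (apply the workaround only on PUYA flash) can be sketched on the host side. This is a rough illustration, not the patch itself. It assumes the raw 24-bit flash ID layout where the low byte is the vendor, and it assumes 0x85 is the PUYA vendor byte. The names `split_flash_id` and `needs_puya_workaround` are made up for this example.

```python
from typing import Tuple

# Assumption: vendor byte values; 0x85 for PUYA is not stated in this thread.
VENDOR_PUYA = 0x85
VENDOR_WINBOND = 0xEF  # matches the "Vendor: 0xEF" reported above


def split_flash_id(raw_id: int) -> Tuple[int, int]:
    """Split a raw 24-bit flash ID into (vendor, device).

    Assumed layout: low byte = vendor, the next two bytes = device,
    byte-swapped so that 0x1640EF reads as device 0x4016.
    """
    vendor = raw_id & 0xFF
    device = (((raw_id >> 8) & 0xFF) << 8) | ((raw_id >> 16) & 0xFF)
    return vendor, device


def needs_puya_workaround(raw_id: int) -> bool:
    """Return True only when the vendor byte identifies a PUYA chip."""
    vendor, _ = split_flash_id(raw_id)
    return vendor == VENDOR_PUYA


vendor, device = split_flash_id(0x1640EF)
print(hex(vendor), hex(device), needs_puya_workaround(0x1640EF))
```

With the values reported here (vendor 0xEF), the check would not select the PUYA workaround.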

s0170071 commented 6 years ago

If I remember correctly, I had those chunking characters once while testing (before the 2k-webserver), and I am pretty sure you were out of memory. If the lowest memory shown is 2712 bytes, that figure does not include the memory required by the core to send out wifi packets. The core usually requires around 3k of additional RAM. If you like, you can try PR #850 and see if that works for you.
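To make the arithmetic explicit, here is a small sketch. The 3k reserve is only the rough estimate given above, not a measured value, and `headroom` is a hypothetical helper name.

```python
# Assumption: "around 3k" taken as 3 * 1024 bytes; an estimate, not a measurement.
CORE_RESERVE_BYTES = 3 * 1024


def headroom(lowest_free: int, reserve: int = CORE_RESERVE_BYTES) -> int:
    """Bytes left once the core's estimated reserve is subtracted.

    A negative result means the core would not have had enough RAM
    at the low-water mark.
    """
    return lowest_free - reserve


print(headroom(2712))
```

A negative result for the reported low-water mark is consistent with the out-of-memory explanation.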

mattlward commented 6 years ago

Is that directed at me? Read the thread, what do you want tried?

TD-er commented 6 years ago

@mattlward If you look at your screenshot, you'll see the free memory and next to it the lowest amount of free memory when a certain function is executed:

12720 (2712 - sendWebPageChunkedData)

That's about 2.7k free when sending the web page.

That's what @s0170071 is talking about. He wrote a quite elaborate (and very nice) patch to reduce the memory usage for the webserver part. ("2k-webserver") That can be a solution. But we should also look into why you have so little free memory in the first place.
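If you want to track that figure outside the web interface, the string can be split into its three parts. This is a minimal sketch that assumes the format is always `current (lowest - functionName)` as in the screenshot. `parse_free_mem` and `FreeMem` are names invented for this example.

```python
import re
from typing import NamedTuple


class FreeMem(NamedTuple):
    current: int  # free heap right now, in bytes
    lowest: int   # lowest free heap seen so far, in bytes
    where: str    # function in which the low-water mark was recorded


# Matches e.g. "12720 (2712 - sendWebPageChunkedData)"
_PATTERN = re.compile(r"(\d+)\s*\(\s*(\d+)\s*-\s*(\w+)\s*\)")


def parse_free_mem(text: str) -> FreeMem:
    """Extract current/lowest free memory and the function name."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognised Free Mem value: {text!r}")
    return FreeMem(int(match.group(1)), int(match.group(2)), match.group(3))


print(parse_free_mem("12720 (2712 - sendWebPageChunkedData)"))
```

The `lowest` field is the one worth watching, since it shows how close the unit came to running out.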

mattlward commented 6 years ago

The unit in question has an sht30, framed oled, sniff and wemos D1. I have 3 sets of rules, 1 that passes info out to mqtt at boot, 1 that processes the input from the sniff process to allow for remote reboots and ip address checks and one that controls the output level of the display based on time.

The memory usage floats all over the place... For example, right after remote login:

Load: 9% (LC=16019)
Free Mem: 14648 (2240 - sendWebPageChunkedData)

Is there a way to get that data into mqtt? If so, I would be happy to graph it in HA.

TD-er commented 6 years ago

You can add a System Info device which reports on memory usage. That can be sent to the controller of choice.

mattlward commented 6 years ago

I will do that... I set the sniff and display to not enabled and the memory pool did not increase. Not sure if that unloads them or not.

mattlward commented 6 years ago

I will have the data collecting today.

mattlward commented 6 years ago

It is collecting data at 1 minute intervals, both free memory and load.

TD-er commented 6 years ago

I've created a build including the 2k-webserver pull request. It is based on the latest Mega branch. You can also try these files: https://www.dropbox.com/s/sbkib3mrba5qdo2/2k_webserver_build_mega_20180220.rar?dl=0

mattlward commented 6 years ago

I will let it collect data overnight and load that in the morning...

mattlward commented 6 years ago

So, I loaded a few of them on a test unit with only a DHT, no other stuff. It does fix the MQTT case issue when the rules publish! I also now have all 12 tasks loaded on 1 screen.

This runs but spits out an error in the info panel:

Build: 20100 - Mega (core 2_4_0)
Version: dev_ESP8266_4096_core2_4_0.bin
GIT version:
Plugins: 72 [Normal] [Testing] [Development]
Build Md5: 4d44355f4d44355f4d44355f4d44355f
Md5 check: fail !
Build time: Feb 20 2018 22:34:55
Load: 5% (LC=19196)
Free Mem: 14520 (9704 - duringHeaderTX)

This load gave me the following: dev_ESP8266_4096.bin

Load: 3% (LC=15834)
Free Mem: 15176 (14608 - sendContentBlocking)
Build: 20100 - Mega (core 2_3_0)
GIT version:
Plugins: 72 [Normal] [Testing] [Development]
Build Md5: 4d44355f4d44355f4d44355f4d44355f
Md5 check: fail !
Build time: Feb 20 2018 22:33:25

No big increase in freemem on either and the second was much slower to draw web pages.

s0170071 commented 6 years ago

@mattlward Don't worry about the MD5 check; @TD-er probably did not include the checksum in the binary. Core 2.4.0 is faster, that's a fact :-) Freemem is not what is giving you trouble; it's the figure in the parentheses. It indicates the lowest free memory and the function in which it occurred. That figure increased from 2712 to 14608. That's superb!

@TD-er Build Md5 | 4d44355f4d44355f4d44355f4d44355f is the dummy string. Should I show the run-time calculated MD5 instead? Or even show a message that this is the dummy?
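As a side note, the dummy value is easy to recognise: decoded from hex, it is the ASCII text `MD5_` repeated four times. Here is a minimal sketch of how a display routine could tell the placeholder apart from a real checksum. `is_dummy_md5` and `describe_md5` are hypothetical names, and the message text is only a suggestion.

```python
DUMMY_MD5_HEX = "4d44355f4d44355f4d44355f4d44355f"


def is_dummy_md5(md5_hex: str) -> bool:
    """Return True if the hex string decodes to the 'MD5_' placeholder."""
    try:
        raw = bytes.fromhex(md5_hex.strip())
    except ValueError:
        return False  # not valid hex, so not the placeholder
    return raw == b"MD5_" * 4


def describe_md5(md5_hex: str) -> str:
    """Text to show in the info panel (wording is an assumption)."""
    if is_dummy_md5(md5_hex):
        return "not set (checksum was not embedded in this binary)"
    return md5_hex


print(bytes.fromhex(DUMMY_MD5_HEX))
print(describe_md5(DUMMY_MD5_HEX))
```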

TD-er commented 6 years ago

Just show something like "not set", but a little more elaborate;)

mattlward commented 6 years ago

So, on the original unit it appears that the 2.4.0 core load uses more memory but the free has not been below 8800. Just got it loaded and will watch it for awhile. Pages do seem to load properly.

In the future, in order to get the latest fixes I have been staying up mostly with the nightly builds... Should I be doing that?

mattlward commented 6 years ago

This is freemem and load. Notice that the freemem dropped a bunch after loading the new version.

image

s0170071 commented 6 years ago

@mattlward How often does the free memory chart get updated? It's rather unusual for memory to drop so linearly. It looks a bit like there were no updates between about 0:00 and 7:00. At what point in time did you load which new version? That sharp drop at the end may have been you accessing the web page...

mattlward commented 6 years ago

That is what I am suspecting: the old version of the code may have crashed and was not outputting data. It is on a 30 second cycle. But there were small changes during the night. It may be that one large drop has skewed the scale so much that the tiny ripples do not show. The variances were 5 to 10 bytes.
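One way to confirm whether the unit stopped reporting is to look for gaps in the sample timestamps rather than at the chart. This is a rough sketch, assuming the samples are available as a sorted list of timestamps in seconds. The function name and the tolerance factor are made up for this example.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    interval_s: float = 30.0,
    tolerance: float = 2.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are further
    apart than tolerance * interval_s, i.e. several reports are missing.
    """
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > interval_s * tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical data: regular 30 s samples with one long silence.
print(find_gaps([0, 30, 60, 600, 630]))
```

A straight line on the chart between two distant points would then show up as a single gap instead of a slow leak.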

It has had one more large memory hit, I am working it hard in the user interface to try and affect it.

Free Mem: | 12696 (8728 - duringDataTX)

uzi18 commented 6 years ago

I also have reports that if an admin password is set, the login page is malformed, so it is impossible to enter the password. But you can log in by putting the address in the browser: /login?password=yourpass. Maybe related?
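For anyone scripting that workaround, the URL can be built like this. It is a minimal sketch: the host address is a placeholder, `build_login_url` is a hypothetical helper, and the only thing taken from this thread is the `/login?password=` path. Keep in mind that the password ends up in plain text in the URL.

```python
from urllib.parse import urlencode


def build_login_url(host: str, password: str) -> str:
    """Build the login URL with the password as a query parameter.

    urlencode escapes characters such as '&' that would otherwise
    break the query string.
    """
    query = urlencode({"password": password})
    return f"http://{host}/login?{query}"


# 192.168.1.50 is a placeholder address, not one from this thread.
print(build_login_url("192.168.1.50", "yourpass"))
```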

mattlward commented 6 years ago

I use passwords on most of mine, never had a corruption on the entry page.


mattlward commented 6 years ago

@TD-er, here is what my freemem and sysload have done after going to the 2.4.0 build you linked... The fluctuations are the start of running 2.4.0 and the poll interval is 30 seconds.

image

mattlward commented 6 years ago

One more update... I found the loss of memory very odd for hours with nothing going on... 0:00 - 04:00 and the CPU spike at around 05:45... Running 2.4.0 core Mega

12696 (7936 - duringDataTX)

image

s0170071 commented 6 years ago

Could there have been incoming network traffic? Router reboot or DSL reconnect?

mattlward commented 6 years ago

Not at that time, in that building. I have checked device logs and see no odd logging on the router. It is a 1 gig, fiber connected Cisco 4500X. We also did not have any maintenance on our large wireless network this AM. More or less 15k access points in service, all Aruba front and back ended.

s0170071 commented 6 years ago

Is the Esp exposed to the internet?

mattlward commented 6 years ago

It is and it is running a password.

mattlward commented 6 years ago

Indirectly exposed... behind very large firewalls and NAT translation.

s0170071 commented 6 years ago

Well, that password doesn't protect against incoming traffic. A DoS attack on the ESP can be run from a toy computer. All incoming packets are buffered in that tiny RAM, and the buffer is serviced only about 100 times per second (not sure). So give it 1000 packets per second and it's down before the ESPEasy firmware even gets to the packets.
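A back-of-the-envelope version of that argument, as a sketch only. The 100 services per second is the unverified figure above. The 1500-byte packet size (a typical Ethernet MTU) and the idea that every packet sits in the heap at full size are assumptions made purely for illustration.

```python
def backlog_bytes(arrival_pps: float, service_hz: float, packet_bytes: int) -> float:
    """Bytes that pile up between two buffer services.

    arrival_pps  : incoming packets per second
    service_hz   : how often the buffer is serviced per second (unverified)
    packet_bytes : assumed size of each buffered packet
    """
    packets_per_service = arrival_pps / service_hz
    return packets_per_service * packet_bytes


pending = backlog_bytes(arrival_pps=1000, service_hz=100, packet_bytes=1500)
free_mem = 12720  # free memory figure reported earlier in this thread
print(pending, pending > free_mem)
```

Under these assumptions the backlog between two services is already larger than the free memory reported earlier in this thread.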

mattlward commented 6 years ago

I realize that... Our IDP/IDS and firewall shield us from outside attacks on our wireless devices. The devices are not directly reachable from the internet, they are however open to a degree to our intranet. Example, to reach this device from home I must VPN into our campus and jump from a secure system to the device. So, by exposed I mean that it has outbound internet and accepts responses on the NAT'ed port.

I understand that it is not ideal, but it is what it is. I hope the expectation is not that these devices will always be on a private network. Although it would be great if they supported some form of WPA for at least encryption in the air and https for establishing connections. They are weak enough and an odd enough operating system that I would not expect them to be general targets for bot deploys either.

s0170071 commented 6 years ago

Alright. I think that's ok. Did you consider to have your data logged by thingspeak? It's free and there are plenty of apps to display it on a mobile phone.

mattlward commented 6 years ago

Just logging directly to HA... I also just enabled syslog on this device. I like to stay away from clouds.

mattlward commented 6 years ago

Interesting... syslog is not logging in this version... Will try at home with other versions. I do not want to hang the unit I am testing on.

s0170071 commented 6 years ago

@mattlward: do the latest builds fix this issue ?

mattlward commented 6 years ago

I do not know. I loaded it on the unit I have been testing with and cannot log in. This is all I get on the screen after the reboot. I will have to go get the unit and troubleshoot it.

Loaded Release mega-20180224

image

TD-er commented 6 years ago

That's a known issue, see https://github.com/letscontrolit/ESPEasy/issues/938. It should have been fixed in the latest sources. However, no build was produced last night due to build errors. You could build one yourself?

mattlward commented 6 years ago

Is it possible to craft an HTTP put with password and new file for a downgrade?

mattlward commented 6 years ago

I am not set up at work to build... BTW, this did not work /login?password=something

mattlward commented 6 years ago

Got in, used IE compatibility mode and went way back and got the password prompt.

TD-er commented 6 years ago

But in that build the new webserver is not yet included. That was merged only yesterday.

mattlward commented 6 years ago

Syslog did not work in this version either. Backing it down to get normal access.

Back to Release mega-20180223

mattlward commented 6 years ago

Same issue in 223, but login?password= still works fine.

TD-er commented 6 years ago

This one can be closed, right?

mattlward commented 6 years ago

I say yes.
