letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

NaN handling in ESPeasy #2340

Open Sasch600xt opened 5 years ago

Sasch600xt commented 5 years ago

Hello friends :)

I have too many random reboots per day.

I tried the latest 5 firmware versions.

An older firmware without MCP23017 support has been running for 90 days without a reboot, but I need the newer firmware for monitoring mcpgpio.

Thank you for your help :)

I use OpenHAB MQTT as controller

Here is my Hardware: img_20190220_112804_3

And here is a log of Uptime minutes: uptime

As you can see, I use two MCP23017s.

Here is the config file: config.zip

And here are the rules:

on System#Boot do
monitor,mcp,17
monitor,mcp,18
monitor,mcp,19
monitor,mcp,20
monitor,mcp,21
monitor,mcp,22
monitor,mcp,23
monitor,mcp,24
monitor,mcp,25
monitor,mcp,26
monitor,mcp,27
monitor,mcp,28
monitor,mcp,29
monitor,mcp,30
monitor,mcp,31
monitor,mcp,32
mcpgpio,1,0
mcpgpio,2,0
mcpgpio,3,0
mcpgpio,4,0
mcpgpio,5,0
mcpgpio,6,0
mcpgpio,7,0
mcpgpio,8,0
timerSet,2,20
timerSet,3,10
endon

on mcp#17 do
Publish,%sysname%/MCP-GPIO/Input-17,[plugin#mcpgpio#pinstate#17]
endon

on mcp#18 do
Publish,%sysname%/MCP-GPIO/Input-18,[plugin#mcpgpio#pinstate#18]
endon

on mcp#19 do
Publish,%sysname%/MCP-GPIO/Input-19,[plugin#mcpgpio#pinstate#19]
endon

on mcp#20 do
Publish,%sysname%/MCP-GPIO/Input-20,[plugin#mcpgpio#pinstate#20]
endon

on mcp#21 do
Publish,%sysname%/MCP-GPIO/Input-21,[plugin#mcpgpio#pinstate#21]
endon

on mcp#22 do
Publish,%sysname%/MCP-GPIO/Input-22,[plugin#mcpgpio#pinstate#22]
endon

on mcp#23 do
Publish,%sysname%/MCP-GPIO/Input-23,[plugin#mcpgpio#pinstate#23]
endon

on mcp#24 do
Publish,%sysname%/MCP-GPIO/Input-24,[plugin#mcpgpio#pinstate#24]
endon

On Rules#Timer=2 do
Publish,%sysname%/status/Build_date, %sysbuild_date%
Publish,%sysname%/status/Ip_Adresse, %ip%
timerSet,2,3600
endon

on mcp#25 do
Publish,%sysname%/MCP-GPIO/Input-25,[plugin#mcpgpio#pinstate#25]
endon

on mcp#26 do
Publish,%sysname%/MCP-GPIO/Input-26,[plugin#mcpgpio#pinstate#26]
endon

on mcp#27 do
Publish,%sysname%/MCP-GPIO/Input-27,[plugin#mcpgpio#pinstate#27]
endon

on mcp#28 do
Publish,%sysname%/MCP-GPIO/Input-28,[plugin#mcpgpio#pinstate#28]
endon

on mcp#29 do
Publish,%sysname%/MCP-GPIO/Input-29,[plugin#mcpgpio#pinstate#29]
endon

on mcp#30 do
Publish,%sysname%/MCP-GPIO/Input-30,[plugin#mcpgpio#pinstate#30]
endon

on mcp#31 do
Publish,%sysname%/MCP-GPIO/Input-31,[plugin#mcpgpio#pinstate#31]
endon

on mcp#32 do
Publish,%sysname%/MCP-GPIO/Input-32,[plugin#mcpgpio#pinstate#32]
endon

On Rules#Timer=3 do
Publish,%sysname%/MCP-GPIO/Input-17,[plugin#mcpgpio#pinstate#17]
Publish,%sysname%/MCP-GPIO/Input-18,[plugin#mcpgpio#pinstate#18]
Publish,%sysname%/MCP-GPIO/Input-19,[plugin#mcpgpio#pinstate#19]
Publish,%sysname%/MCP-GPIO/Input-20,[plugin#mcpgpio#pinstate#20]
Publish,%sysname%/MCP-GPIO/Input-21,[plugin#mcpgpio#pinstate#21]
Publish,%sysname%/MCP-GPIO/Input-22,[plugin#mcpgpio#pinstate#22]
Publish,%sysname%/MCP-GPIO/Input-23,[plugin#mcpgpio#pinstate#23]
Publish,%sysname%/MCP-GPIO/Input-24,[plugin#mcpgpio#pinstate#24]
Publish,%sysname%/MCP-GPIO/Input-25,[plugin#mcpgpio#pinstate#25]
Publish,%sysname%/MCP-GPIO/Input-26,[plugin#mcpgpio#pinstate#26]
Publish,%sysname%/MCP-GPIO/Input-27,[plugin#mcpgpio#pinstate#27]
Publish,%sysname%/MCP-GPIO/Input-28,[plugin#mcpgpio#pinstate#28]
Publish,%sysname%/MCP-GPIO/Input-29,[plugin#mcpgpio#pinstate#29]
Publish,%sysname%/MCP-GPIO/Input-30,[plugin#mcpgpio#pinstate#30]
Publish,%sysname%/MCP-GPIO/Input-31,[plugin#mcpgpio#pinstate#31]
Publish,%sysname%/MCP-GPIO/Input-32,[plugin#mcpgpio#pinstate#32]
endon

Have a great day Sascha

giig1967g commented 5 years ago

Hi @Sasch600xt, I suspect that your rules use too much stack. What is the free stack on the main page?

Sasch600xt commented 5 years ago

It says: Free Stack: 3600 (1580 - sendContentBlocking)

So am I running out of memory here?

giig1967g commented 5 years ago

Doesn't seem so. Maybe it runs out of memory during timer 3, or if you receive many MCP updates at the same time.

Sasch600xt commented 5 years ago

Darn, I see my mistake... Timer 3 should actually run only once, delayed after system start. I will change that and we will see how it works.

giig1967g commented 5 years ago

In your rules, timer 3 runs only once. From your graph it seems that the reboots happen just after a system start, so my guess was that the huge publish list could be one of the problems.

Sasch600xt commented 5 years ago

No, it reboots pretty randomly. It is not connected with the actions I send to the ESP.

And it is not after a system start. The system start always works fine. Then after some random time it reboots once and the circle of life starts over again :)

I have been searching for the problem for 10 days and I can't find it :(

TD-er commented 5 years ago

Can you backup your rules/settings (or recreate the settings, but keep the rules) and perform a full erase of the flash + reflash the firmware? Just to be sure no erratic behavior is caused by some faulty setting somewhere.

Sasch600xt commented 5 years ago

Oh, that's what I always do when I run into problems: I start from scratch. So I did that already (a few times).

giig1967g commented 5 years ago

Does it happen often that many MCP inputs are triggered at the same time? I suspect that if too many triggers fire at the same time, the unit could freeze.

Sasch600xt commented 5 years ago

Not at all... At the moment only 2 MCP inputs are active. They work as feedback for 2 relays, maybe 10 times a day. And I double-checked: when the unit reboots there is no MCP action at that moment. So it is very confusing for me. But thank you so much for helping me. It is always good to have more than one brain thinking about it :)

Sasch600xt commented 5 years ago

And a big problem is that I lose the temperature sensors from time to time... I changed the ESP, sensor, GPIO and cable, so I am not sure what else I can do.

temp

Sasch600xt commented 5 years ago

I used D1 and D2 for SDA/SCL. Can this be the problem? Are there specific ports I should use for SDA/SCL?

TD-er commented 5 years ago

The default ports used by most boards are SDA => D2, SCL => D1. So that's also the default config in ESPeasy.

That DHTxxx plugin is one I did some tweaking on a while ago, mainly for the version used by Sonoff in their TH10/16 sensors. That was mainly a timing issue, fixed then for that single sensor, so maybe the other sensors in that plugin also need some tweaking? The DHTxxx is also a sensor mentioned in a lot of examples, since you need to disable interrupts to read it properly. But I think the plugin should allow a re-read when it encounters a NaN value. Those can happen every now and then, but the plugin should recover from them (and they shouldn't happen too often).
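The re-read idea can be sketched in plain C++ as follows. This is only an illustration under assumptions: `readWithRetry` and the `readSensor` callback are hypothetical names, not part of the ESPEasy DHT plugin, and the pause a real sensor needs between reads is left out (a DHT22 generally wants roughly two seconds between samples).

```cpp
#include <cmath>
#include <functional>

// Hypothetical helper, not ESPEasy code.
// Calls readSensor up to maxAttempts times and returns the first
// non-NaN result; returns NaN if every attempt failed.
float readWithRetry(const std::function<float()> &readSensor, int maxAttempts) {
  float value = std::nanf("");
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    value = readSensor();
    if (!std::isnan(value)) {
      break;  // got a usable reading, stop retrying
    }
  }
  return value;
}
```

A bounded number of attempts keeps a permanently dead sensor from blocking the caller, which then still sees NaN and can react to it.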

Sasch600xt commented 5 years ago

At the moment, when it shows "NaN" it never recovers until I toggle the power supply. I use a DHT22.

I see right now that I have swapped the ports. So in my case SDA = D1 and SCL = D2. Can this cause trouble?

TD-er commented 5 years ago

"Can this cause trouble?"

I don't think so. On various sites both GPIO-4 and -5 are mentioned as the most favorable pins for GPIO activity. As far as I know, they have no hardware-supported I2C.

When it shows "NaN", you may try to just save the plugin settings and see if it will recover. If it does, then this state can be 'fixed' in software to just call the same code as used in the init function.

Sasch600xt commented 5 years ago

I tried that already: saved the plugin again, deleted the plugin and activated it again. Nothing helps until I power cycle the DHT22 sensor. At the moment I do so by using a relay to remotely disconnect VCC from the sensor, and 10 seconds later I restore VCC again via the relay. Not nice, but at least it works. The downside is that I need all 8 relays from the relay board for other applications, and now I have only 7 left.

Sasch600xt commented 5 years ago

Just an idea:

I could power the DHT22 from a GPIO because it only draws 2.5 mA. So what could a rule look like? Something like:

if dht22 shows "NaN" or "nan"
gpio 14,0
wait 10 seconds
gpio 14,1

If this worked, it would be great for the moment.

TD-er commented 5 years ago

For a first attempt you can try to just do it periodically to see what happens. So just use the timer. Not sure if we can detect NaN in a rule.

On System#Boot do       // This happens when the ESP8266 boots
  timerSet,1,30         // Set and start timer 1 at 30 seconds
  gpio,14,1             // Power the sensor
endon

On Rules#Timer=1 do     // When timer 1 is up:
  gpio,14,0             // Cut power to the sensor
  timerSet,2,15         // Go to timer 2 in 15 sec.
endon

On Rules#Timer=2 do     // When timer 2 is up:
  gpio,14,1             // Restore power to the sensor
  timerSet,1,900        // Set timer 1 for 15 minutes.
endon

Sasch600xt commented 5 years ago

Well, how can we find out whether we can use "NaN" in rules?

Grovkillen commented 5 years ago

I guess isnan would be a good idea.

TD-er commented 5 years ago

Checking for NaN in C++ is very interesting by the way ;)

bool isNan(float value) {
  return value != value;
}

I guess it would make a great addition to the rules, to have such a check or event to match.
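For comparison, here is a small sketch of the self-comparison trick next to the standard library check. The function names are made up for illustration and are not ESPEasy identifiers.

```cpp
#include <cmath>
#include <limits>

// Same trick as above: NaN is the only float value that is not equal to itself.
// Caveat: flags such as -ffast-math let the compiler assume NaN never occurs,
// which can break NaN checks like this one.
bool isNanSelfCompare(float value) {
  return value != value;
}

// Standard library equivalent from <cmath>.
bool isNanStd(float value) {
  return std::isnan(value);
}

// Helper that produces a quiet NaN, handy for testing.
float makeNan() {
  return std::numeric_limits<float>::quiet_NaN();
}
```

With default compiler settings both functions agree for every input; `std::isnan` mainly states the intent more clearly.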

Domosapiens commented 5 years ago

Yep! Because what happens if one of the values in a subtraction is NaN?
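To make the subtraction case concrete: under IEEE 754 arithmetic, a NaN operand makes the result NaN as well. The sketch below shows that, plus one possible guarded variant; `difference` and `safeDifference` are hypothetical names used only for this example.

```cpp
#include <cmath>
#include <limits>

// Difference of two readings. If either operand is NaN,
// the result is NaN too.
float difference(float a, float b) {
  return a - b;
}

// Guarded variant: returns false and leaves result untouched
// when either input is NaN, so the caller can skip the computation.
bool safeDifference(float a, float b, float &result) {
  if (std::isnan(a) || std::isnan(b)) {
    return false;
  }
  result = a - b;
  return true;
}
```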

Domosapiens commented 5 years ago

Great addition to rules ... also to count the NaNs, to get an idea of the occurrence rate. A quality check for sensors, cables and SW!
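Counting NaNs to estimate an occurrence rate could look roughly like this. `NanStats` is a hypothetical type for illustration, not an existing ESPEasy structure.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical per-sensor statistics, not an existing ESPEasy type.
struct NanStats {
  uint32_t totalReads = 0;
  uint32_t nanReads = 0;

  // Record one reading and count it separately if it is NaN.
  void record(float value) {
    ++totalReads;
    if (std::isnan(value)) {
      ++nanReads;
    }
  }

  // Fraction of reads that were NaN; 0.0 when nothing was recorded yet.
  float nanRate() const {
    if (totalReads == 0) {
      return 0.0f;
    }
    return static_cast<float>(nanReads) / static_cast<float>(totalReads);
  }
};
```

A rate that climbs over time would point at wiring or sensor trouble rather than an occasional timing glitch.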

Grovkillen commented 5 years ago

We could trigger an event Task#Value#IsNaN and the user can decide for themselves what to do with that.
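As a rough sketch of that proposal, the event name could be built from the task and value names whenever a reading is NaN. `nanEventFor` is a hypothetical helper; it uses `std::string` to stay self-contained, whereas firmware code would more likely use the Arduino `String` class, and the event is only a suggestion in this thread.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: returns the event name to fire for a NaN reading,
// or an empty string when the value is a normal number.
std::string nanEventFor(const std::string &task,
                        const std::string &valueName,
                        float value) {
  if (!std::isnan(value)) {
    return std::string();
  }
  return task + "#" + valueName + "#IsNaN";
}
```

For a task named `DHT` with a value named `Temperature`, a NaN reading would yield `DHT#Temperature#IsNaN`, which a rule could then match.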

TD-er commented 5 years ago

Some plugins already have such statistics (shown in the setup page of that plugin). But I agree that a more generic approach may be useful. We can do it at several levels:

Grovkillen commented 5 years ago

Maybe a generic plugin event appender for values? So each plugin can append an event string like this:

Task#Value#AppendixEvent

jimmys01 commented 5 years ago

In my rules I detect that the HTU is not working by using if [Enviroment#Temperature]=0.00

Grovkillen commented 5 years ago

That works, but 0.0 is not necessarily a faulty value for all plugins. As an example, the DS18B20 uses 85.0 to report an error. So a generic event would be best. But let's think about it.
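The point that each sensor has its own error value can be sketched as a per-sensor check. Everything here is hypothetical (`SensorKind`, `looksLikeErrorValue`), and the sentinel values are simply the ones mentioned in this thread. Note that 0.0 and 85.0 can also be legitimate readings, which is exactly why a dedicated event is more robust.

```cpp
#include <cmath>

// Hypothetical sensor kinds, for illustration only.
enum class SensorKind { Dht22, Htu, Ds18b20 };

// Returns true when the value looks like an error for the given sensor.
// NaN is always treated as an error; the other sentinels come from this thread.
bool looksLikeErrorValue(SensorKind kind, float value) {
  if (std::isnan(value)) {
    return true;
  }
  switch (kind) {
    case SensorKind::Htu:     return value == 0.0f;   // check used in jimmys01's rule
    case SensorKind::Ds18b20: return value == 85.0f;  // error value mentioned above
    case SensorKind::Dht22:   return false;           // reports NaN on failure
  }
  return false;
}
```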

Sasch600xt commented 5 years ago

sounds great !

TD-er commented 5 years ago

This NaN detection is also an issue for controllers. Some controllers (not sure if all do it) do not send data when the value (or one of the values) is NaN. I can imagine it would be very useful if this could be enabled/disabled per controller. Also, NaN may lead to undefined behavior when used for computations, for example in a formula or in computations in rules (even comparing may be tricky).
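Both concerns can be illustrated with a short sketch. `shouldSend` models a per-controller "skip on NaN" switch and `isAbove` shows why comparisons are tricky: every ordered comparison involving NaN evaluates to false. The names and the `skipOnNan` flag are assumptions for this example, not existing ESPEasy settings.

```cpp
#include <cmath>
#include <cstddef>

// True only if every value in the array is a real number.
bool allValuesValid(const float *values, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (std::isnan(values[i])) {
      return false;
    }
  }
  return true;
}

// Hypothetical per-controller decision: when skipOnNan is set,
// data is sent only if none of the values is NaN.
bool shouldSend(const float *values, std::size_t count, bool skipOnNan) {
  if (!skipOnNan) {
    return true;
  }
  return allValuesValid(values, count);
}

// Rule-like threshold check. For a NaN value this is false,
// and so is the opposite test (value <= threshold).
bool isAbove(float value, float threshold) {
  return value > threshold;
}
```

Because both `value > threshold` and `value <= threshold` are false for NaN, an if/else in a rule would always take the else branch, which may not be what the rule author intended.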

I will change the topic title, since it is already not about the reboots anymore ;) (and we have enough reboot issues ...)

Sasch600xt commented 5 years ago

i agree