letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

mega-20190116 causes missing mhz19 co2values #2254

Closed pwassink closed 5 years ago

pwassink commented 5 years ago

version mega-20190116

After updating to mega-20190116, I have several ESP nodes of different types that stop sending the CO2 values from the MHZ19 CO2 sensor. The WiFi connection is still there, and other sensors on the same ESP still work fine. The data destination is Domoticz; the data does not arrive there, so it must be a local (ESP) problem. I see the same symptoms on 4 different nodes here.

The logfile shows this:
MHZ19: Error, timeout while trying to read
MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0

Anyone with the same behaviour?

TD-er commented 5 years ago

Looks to be related to the change in serial I made recently.

What gpio pins do you use?

pwassink commented 5 years ago

It might be, let's see. I guess you have found a point, Gijs :-)

esp01 = Gpio0 Gpio2 (no problems seen yet)
esp02 = Gpio14 Gpio12
esp08 = Gpio14 Gpio12
esp18 = Gpio14 Gpio12

Peter

TD-er commented 5 years ago

I have two nodes here I can also test later this afternoon

pwassink commented 5 years ago

OK, the fault is not immediate: they stop sending after a while and then stay stuck. A reboot or software downgrade is not enough to get them sending data again; they need a power cycle with some time between off and on.

TD-er commented 5 years ago

How long is "after a while"? I have just set up a node using one of these sensors, with a BMP280 also connected. It is using GPIO-14 & -12 for the CO2 sensor.

pwassink commented 5 years ago

A couple of hours. Yesterday everything was working; this morning I had 3 out of 4 with the same status.

TD-er commented 5 years ago

And they all booted around the same time, before they all stopped working?

pwassink commented 5 years ago

Yes, all were restarted/updated within an hour, I think. I was updating them yesterday evening, and during the night they stopped working.

ristomatti commented 5 years ago

I was just about to update my nodes but luckily noticed this. @TD-er which would be the latest release before the changes to Serial you're guessing might be the culprit? Or would it be more useful if I flashed the newest version to see if I'll experience the same issue?

TD-er commented 5 years ago

If a few missed updates are not a big deal, I would suggest the latest version. To stay on the version before the serial port wrapper, you could use 20181231. (I think I committed the changes this year.)

pwassink commented 5 years ago

I downgraded the 3 "problem" ESPs to 20181231 yesterday and can confirm they have been sending data for 24h now, with no hardware changes. So those three ESPs are still using gpio12 and gpio14 in my case.

Any luck with recreating it Gijs ?

TD-er commented 5 years ago

Nope, my node is still sending values and it is running the build from January 16th. (core 2.5.0)

pwassink commented 5 years ago

i used the ESP_Easy_mega-20190116_normal_core_241_ESP8266_4096.bin on those nodes, that has the 2.4.1 core i believe ?

TD-er commented 5 years ago

@pwassink That's correct. You can also see the core build in the sysinfo page.

pwassink commented 5 years ago

I installed ESP_Easy_mega-20190116_normal_core_241_ESP8266_4096.bin on esp02 again; let's see if it happens again.

pwassink commented 5 years ago

Hmm, Maybe the day of the week or the position of the moon has a certain influence, I did not see the fault reoccurring here within the past days.

TD-er commented 5 years ago

It seems to be working fine here too. It was just about full moon when you reported it, so I guess that should have been it ;)

pwassink commented 5 years ago

Update,

I left esp02 running ESP_Easy_mega-20190116_normal_core_241_ESP8266_4096.bin, and it is still running as it should.

Yesterday I updated the rest of my nodes to ESP_Easy_mega-20190121_normal_core_241_ESP8266_4096.bin

And esp-18 went bozo again at 06:55 local time. Everything is functioning as it should except the MHZ19 CO2 sensor: the devices tab displays the "last-known-good value" from approx. 07:00, and the logfile again has the same entries:
MHZ19: Error, timeout while trying to read
MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0

Esp-18 is using Gpio 14 / 12. It is a NodeMCU with CH340, version 2 of 3, from Ali. A remote reboot via the web console did not solve the issue; it again displays the entries mentioned above several times.

After the power cycle:
78640: MHZ19: Error, timeout while trying to read
78641: MHZ19: Read error: checksum = 202 / 0 bytes read => 255/168/12/161/226/255/0/0/0/
78643: MHZ19: Shifted 1 bytes to attempt to fix buffer alignment

After some more time:
255652: MHZ19: PPM value: 1584 Temp/S/U values: 25/0/0.00
255656: EVENT: Rik-CO2#PPM=1584.00
255656: EVENT: Rik-CO2#PPM=1584.00 Processing time:1 milliSeconds
255659: EVENT: Rik-CO2#Temperature=25.00
255659: EVENT: Rik-CO2#Temperature=25.00 Processing time:1 milliSeconds
255662: EVENT: Rik-CO2#U=0.00
255662: EVENT: Rik-CO2#U=0.00 Processing time:1 milliSeconds
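For anyone decoding the "Read error" line above: the MH-Z19 datasheet describes a 9-byte reply whose last byte is a checksum over bytes 1 to 7 (sum them, keep the low byte, subtract from 0xFF, add 1). Below is a minimal Python sketch, assuming the log prints the computed checksum first, then the received one, then the raw bytes in decimal. The function names are illustrative and are not taken from the ESPEasy plugin.

```python
def mhz19_checksum(frame):
    """Checksum per the MH-Z19 datasheet: 0xFF - (sum of bytes 1..7) + 1, kept to 8 bits."""
    return (0xFF - (sum(frame[1:8]) & 0xFF) + 1) & 0xFF


def parse_reply(frame):
    """Return the CO2 value in ppm, or None if the frame fails validation."""
    # A valid reply to the "read CO2" command starts with 0xFF 0x86.
    if len(frame) != 9 or frame[0] != 0xFF or frame[1] != 0x86:
        return None
    if mhz19_checksum(frame) != frame[8]:
        return None
    # Bytes 2 and 3 carry the concentration as high byte / low byte.
    return frame[2] * 256 + frame[3]


# Frame copied from the log line after the power cycle (decimal bytes as logged).
logged = [255, 168, 12, 161, 226, 255, 0, 0, 0]
computed = mhz19_checksum(logged)  # compare with "checksum = 202" in the log
received = logged[8]               # compare with the "/ 0" in the log
print(computed, received, parse_reply(logged))
```

The second byte of the logged frame is not 0x86, which fits the plugin's follow-up message about shifting the buffer to fix alignment.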

It seems to be working again now, but it needed a power cycle with about 2 minutes without power.

TD-er commented 5 years ago

Does it need to do something at that moment? Could it be something that leads to a hiccup of the WiFi on the ESP?

pwassink commented 5 years ago

Nothing at all,

There are no scheduled resets, reboots, backups, wifi resets, DHCP cleanups or any other external cause at that time.

And the wifi keeps working fine; it is just the readout of the mhz19 that stops. All other (I2C-based) sensors on the same ESP still work.

This morning at 03:00 it happened again with Esp-18 running the 0121 mega version. Same errors in the ESP logfile as before; other sensors work fine.

And at 19:05 esp02, running ESP_Easy_mega-20190116_normal_core_241_ESP8266_4096.bin, also crashed: same situation as above, no data anymore and the same entries in the log.

pwassink commented 5 years ago

Again 2 hiccups today: esp02 and esp18 both went wrong again, with the same faults in the log.
Software: 2 versions involved, 0116 and 0121, both 2.4.1 core and 4 MB version.
Hardware: esp02 is a nodemcu-d1-mini, esp18 is a nodemcu-v2; both use gpio12/gpio14.

TD-er commented 5 years ago

Can you try this build on one of your nodes?

ESP_Easy_mega-20181109_normal_ESP8266_4096.bin

It is running on one of my nodes with an MH-Z19 and now has an uptime of 54 days 23 hours 21 minutes. That's since a power outage... This one is running fine with regular updates of the CO2 value.

pwassink commented 5 years ago

i will,

but I have already mentioned that up to version mega 20181231 (core 2.4.1, normal, 4 MB) none of the reported stability issues have been seen, so why this specific version?

I will download that specific version and install it on the esp02 and esp18 CO2-measuring nodes, but in my humble opinion the problems appeared in the mega-2019* range, not earlier, Gijs ..

TD-er commented 5 years ago

OK, then it is of no use indeed.

That specific version is running on a node I have which appears to be (sadly, unusually) stable. And you're right, if it was running fine until 20181231, then it is not really worth trying the older ones.

pwassink commented 5 years ago

Nevertheless:
Esp02 and esp18 are now running ESP_Easy_mega-20181109_normal_ESP8266_4096.bin
Esp01 and Esp08 are running ESP_Easy_mega-20190121_normal_core_241_ESP8266_4096.bin

And we will see.

Remark: I just noticed that the 1109 version has build number 20102, so that is way back and different?

pwassink commented 5 years ago

Gijs,

I might have found another clue on this issue.

A couple of minutes ago my Esp08 (a NodeMCU CH340 V2, 4 MB, running ESP_Easy_mega-20190121_normal_core_241_ESP8266_4096.bin) also stopped sending CO2 data.

Log snippet:
370508579: UDP : 60:01:94:0F:AF:9D,192.168.3.46,17
370508785: UDP : 24:0A:C4:82:F2:B8,192.168.3.51,21
370509501: UDP : 60:01:94:0C:2E:FD,192.168.3.48,19
370514624: UDP : 5C:CF:7F:82:FD:47,192.168.3.31,2
370515674: MHZ19: Read error: checksum = 120 / 248 bytes read => 255/134/5/183/71/128/15/112/248/
370515677: MHZ19: Shifted 1 bytes to attempt to fix buffer alignment
370517702: EVENT: Clock#Time=Wed,12:19
370517755: EVENT: Clock#Time=Wed,12:19 Processing time:53 milliSeconds
370520257: UDP : 60:01:94:0B:94:5C,192.168.3.30,1
370520460: UDP : 60:01:94:02:0E:E8,192.168.3.36,7
370523940: UDP : 18:FE:34:E2:18:84,192.168.3.35,6
370529880: UDP : BC:DD:C2:EA:3D:BC,192.168.3.54,24

This is the first time i've seen this fault, maybe ..

TD-er commented 5 years ago

A checksum error should be handled more gracefully. Maybe we should add some threshold on when to start shifting to search for a new good start. Or a better algorithm to get in sync again. As far as I know, there has been no change in the code for this in over a year. There has been only a change in the way the serial port connection is being created, so I really don't know why it leads to these issues.
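One possible shape for the threshold and resync idea, sketched in Python rather than the plugin's C++: instead of shifting one byte per failed read, scan the receive buffer for the 0xFF 0x86 header and accept only a 9-byte window whose checksum validates, and only declare the link out of sync after several consecutive failures. All names here (`find_valid_frame`, `ReadStats`, the threshold of 3) are hypothetical and are not what the plugin does today.

```python
def mhz19_checksum(frame):
    """Datasheet checksum: 0xFF - (sum of bytes 1..7) + 1, kept to 8 bits."""
    return (0xFF - (sum(frame[1:8]) & 0xFF) + 1) & 0xFF


def find_valid_frame(buffer):
    """Find the first 9-byte window starting with 0xFF 0x86 that has a valid checksum.

    Returns (offset, frame) or None. The caller can discard the bytes before offset.
    """
    for offset in range(len(buffer) - 8):
        window = buffer[offset:offset + 9]
        if window[0] == 0xFF and window[1] == 0x86 and mhz19_checksum(window) == window[8]:
            return offset, window
    return None


class ReadStats:
    """Pass/fail counters plus a threshold before the link counts as out of sync."""

    def __init__(self, max_consecutive_failures=3):
        self.passed = 0
        self.failed = 0
        self.consecutive_failures = 0
        self.max_consecutive_failures = max_consecutive_failures

    def record(self, ok):
        if ok:
            self.passed += 1
            self.consecutive_failures = 0
        else:
            self.failed += 1
            self.consecutive_failures += 1

    def needs_resync(self):
        return self.consecutive_failures >= self.max_consecutive_failures


# A valid frame (1584 ppm) preceded by two stray bytes, as after a misaligned read.
stream = [0x00, 0xFF, 0xFF, 0x86, 0x06, 0x30, 0, 0, 0, 0, 0x44]
stats = ReadStats()
result = find_valid_frame(stream)
stats.record(result is not None)
print(result, stats.passed, stats.failed)
```

With this approach a single corrupted frame only bumps the fail counter; the buffer is realigned on the next frame that passes both the header and the checksum test.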

pwassink commented 5 years ago

I have the idea that the mhz19 sensor itself enters a "blocked" state after a couple of hours of miscommunication, or as the result of not being able to get data out. Motivation: a reboot or even a full software update does not clear that state; the mhz19 has to be without power for a minute or so to come back to life. I found this behaviour when it is stuck in the state that looks like:
MHZ19: Error, timeout while trying to read
MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0

The fault esp08 had this morning was still solvable with a remote reboot from the web console, so it was not the same as the complete lockup after several hours that I've found and reported a couple of times.

But I found this afterwards: at approx. 18:00 local time it went into the blocked state. Reset, reboot through the console, power cycle: nothing worked this time. The device needed a re-flash with the mega 20181231 4 MB version, and then it came to life again. Esp08 is using the Gpio12/14 combination and is a NodeMCU v2 CH340 model too.

TD-er commented 5 years ago

It sounds quite strange, since the plugin does send a command to the MH-Z19 to gather new sensor data, so I don't get why even a soft reboot is not working. I will also add some stats to see how often a checksum error occurs (on the receiving end). If that happens a lot, it may also happen when sending data to the sensor, and maybe the sensor is then put into some unknown command mode. Maybe some pull-up resistor configuration has changed when I changed the way the serial pins are configured?
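For context on what "sending data to the sensor" involves: per the MH-Z19 datasheet, a command is also a 9-byte frame (start byte 0xFF, sensor number 0x01, command byte, five data bytes, checksum), and 0x86 is the "read CO2 concentration" command. The Python sketch below only illustrates that frame layout; `build_command` is a made-up helper name, and whether a corrupted command can really push the sensor into another mode remains the open question in this thread.

```python
def build_command(command, payload=(0, 0, 0, 0, 0)):
    """Assemble a 9-byte MH-Z19 command frame with its trailing checksum."""
    if len(payload) != 5:
        raise ValueError("payload must be exactly 5 bytes")
    body = [0xFF, 0x01, command, *payload]
    # Checksum covers bytes 1..7 (everything after the start byte).
    checksum = (0xFF - (sum(body[1:]) & 0xFF) + 1) & 0xFF
    return bytes(body + [checksum])


read_co2 = build_command(0x86)
print(read_co2.hex(" "))

# Flipping a single bit in the command byte gives a frame the checksum no longer covers.
corrupted = bytearray(read_co2)
corrupted[2] ^= 0x01
still_valid = ((0xFF - (sum(corrupted[1:8]) & 0xFF) + 1) & 0xFF) == corrupted[8]
print(still_valid)
```

A single flipped bit fails the checksum, so a sensor that validates incoming frames should ignore it; an accidental mode change would need an error pattern that still happens to pass the checksum.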

pwassink commented 5 years ago

It might,

I will look into it a bit further tomorrow or during the coming weekend; perhaps I will be able to find something. Is that sensor so little used that no one else suffers identical behaviour?

TD-er commented 5 years ago

Or not used by people running the latest builds

blb4github commented 5 years ago

I have 2 MH-Z19b, both running on 20190116 test build without issues

pwassink commented 5 years ago

Ok, just curious exactly which core-version of the test build and which Gpio's are you using ?

blb4github commented 5 years ago

I see one is running ESP_Easy_mega-20190116_test_core_250_beta_ESP8266_4096.bin, the other ESP_Easy_mega-20190121_test_core_250_beta_ESP8266_4096.bin. On both, the sensor is connected to GPIO 13 & 12.

pwassink commented 5 years ago

So that's another core, and you are using another GPIO pair: 13/12 instead of the GPIO 14/12 I use here.

Today I updated 4 out of 4 to the ESP_Easy_mega-20190202_normal_core_241_ESP8266_4096.bin version.

Let's see.

TD-er commented 5 years ago

Now that I think of it, there may be another change in the used SWserial library since start of this year. We used to have our own version of the SWserial library, which I patched to use less iRAM memory. But now we are using the version included in the core library and for the core 2.5.0 this may be a newer version than before. Also there may be some change in the use of interrupts on low-speed connections. (9600 baud and lower) I must look into that too.

It still doesn't explain why the sensor apparently can get so messed-up it needs a power-down to act normal again.

pwassink commented 5 years ago

Within 3 hours after updating to ESP_Easy_mega-20190202_normal_core_241_ESP8266_4096.bin my esp18 was locked up again. A web console reboot did not work; a power-down was required again. The other one is still working; I will add it here if its MHZ19 freezes too.

Yes, the other one, esp18, stopped sending reasonable CO2 data at 19:05 this evening, so the fault persists in yesterday's mega 0202 with core 2.4.1 at least. That one is back and downgraded to the 20181231 version now too.

Around midnight the third ESP, esp02, froze up with the CO2 readings; it is also downgraded to 20181231 now.

The only one still running ESP_Easy_mega-20190202_normal_core_241_ESP8266_4096.bin is the one that is using Gpio-0/Gpio-2; the other three, which did freeze up, are using Gpio14/gpio12. I don't think this is a coincidence anymore.

I will change esp02's wiring to Gpio-0/Gpio-2 and upgrade it to the 0202 version (core 2.4.1) again, and we might see if it stays alive after this GPIO change. Done.

TD-er commented 5 years ago

I just added a commit to do some improvement on the reading. See https://github.com/TD-er/ESPEasy/commit/3d507bdbb9dc6dd4aee2ce0c7ce6c809ea7dfb7c

Can you build a test version for it, or do you want me to build one?

pwassink commented 5 years ago

If you could provide me with a bin, I will be happy to put it on both ESPs still running on gpio12/14; then it is certain the file is the same, with no local influences etc.

TD-er commented 5 years ago

OK, took some time, but here is the test build

pwassink commented 5 years ago

Got it,

Esp08 and esp18 are running on this testversion with the 2.4.1 core now, they both use gpio12/14

Lets see !

TD-er commented 5 years ago

You should also see some indicator showing the number of lines processed and the number of CRC failures.

pwassink commented 5 years ago

Yes , i have them in the console too now!

I will leave those 2 and check at intervals whether those counters increase. Keep you posted ..

Both sensors in use at the test nodes are B versions, so that detection worked too:

esp08: Checksum (pass/fail): 11/0, Detected: MH-Z19B
esp18: Checksum (pass/fail): 8/0, Detected: MH-Z19B

TD-er commented 5 years ago

Do you also have a mix of MH-Z19 A/B or only the newer B versions? It should not matter, just curious.

pwassink commented 5 years ago

> Do you also have a mix of MH-Z19 A/B or only the newer B versions? It should not matter, just curious.

B versions, just added that info , detection worked too :-)

pwassink commented 5 years ago

Small update: all are still running fine. The ones with your special version, esp08 and esp18, show increasing counters but still no faults or checksum errors. Current values are:

Esp08: Checksum (pass/fail): | 1795/0
Esp18: Checksum (pass/fail): | 724/0

And the other one that failed quite consistently, esp02, is also still running, now with the changed GPIOs: this d1-mini now has the mhz19 on Gpio-0/Gpio-2 and runs mega 20190202 with core 2.4.1.

pwassink commented 5 years ago

Bummer, I accidentally deleted the post here; trying to recreate it from memory:

Yesterday evening three of my ESPs (02, 08 and 18) went red in Domoticz: no CO2 sensor data. Two of them have the special version with core 2.4.1 and gpio 12/14. A web console reboot did not solve it and only produced:
MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0
But the esp18 CO2 measurements came back to life after approx. one hour; at 0:56 it started sending proper values again.

Esp08 received a reboot too, and so did esp02 (that one has gpio0/2 now, same as esp01). Neither recovered, and this morning both got a 2-minute power-down; after that both came back to life OK.

I have now changed the software version on esp02 to ESP_Easy_mega-20190202-58-PR_2235_normal_core_250_beta_ESP8266_4 so we can test that core version too. It became clear that the GPIO change did not make a difference when running the old (current 2.4.1) version. esp02 did show a single checksum failure today: Checksum (pass/fail): | 79/1

I also activated syslog, for esp08 only to begin with, with the settings as below; if you want a specific setting please ask. (screenshot: syslogsettings_20190207)

TD-er commented 5 years ago

Maybe you can also itemize your configs in the post (can also be next post, no need to edit this one) showing the following:

Itemized information is (for me) easier to follow compared to descriptive text. I also reply from my phone sometimes.

pwassink commented 5 years ago

esp08/18: ESP_Easy_mega-20190202-58-PR_2235_normal_ESP8266_4096, core 2.4.1, gpio12/14
esp01/02: ESP_Easy_mega-20190202-58-PR_2235_normal_core_250_beta_ESP8266_4096, gpio 0/2

-- "frozen" means working fine except for the CO2 readings

22:57 esp08 frozen again, counters Checksum (pass/fail): | 1077/2, put syslog aside when found.
05:05 esp18 frozen again, counters Checksum (pass/fail): | 1076/2
06:25 esp08 frozen again, no counters, the ESP had crashed completely this time, got syslog
12:57 esp18 frozen again, counters Checksum (pass/fail): | 116/2
20:43 esp18 frozen again, counters Checksum (pass/fail): | 316/2. Tried the mhzreset command: no change whatsoever. The "mhz19 unknown response 0 0 0 0 0 0 0 0" message keeps appearing over and over again. Tried several times, no solution; power cycled the node.

pwassink commented 5 years ago

Gijs,

I did some analysis on the last syslog of esp08. During the hours that it was running I saw 78 occurrences of:
2019-02-08T05:27:07.581181+01:00 hub08 EspEasy - - - EspEasy: MHZ19: Unknown response: ff 0 0 0 0 0 0 0 0

Automated search using:
# cat messages-crash-esp08-0625 | grep MHZ19 | grep response | grep 'ff 0 0 0 0 0 0 0 0' | wc -l

After changing that command line to the second unknown-response variant, Linux counted this line 408 times:
2019-02-08T10:44:06.855432+01:00 hub08 EspEasy - - - EspEasy: MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0

The checksum error counters in your custom debug version did not get any higher than 2, as far as I saw during the last couple of days. The first error message (MHZ19: Unknown response: ff 0 0 0 0 0 0 0 0) came a couple of times, six or so, directly after each other before it froze up this morning.
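The same count can be done for every "Unknown response" variant in one pass, which saves editing the grep pattern per variant. A small Python sketch, assuming syslog lines shaped like the two quoted above; the function name is illustrative, and the sample lines reuse the quoted messages with the timestamps shortened.

```python
import re
from collections import Counter

# Capture the byte dump that follows "Unknown response:" (lowercase hex, space separated).
PATTERN = re.compile(r"MHZ19: Unknown response: ((?:[0-9a-f]{1,2} ?)+)")


def count_unknown_responses(lines):
    """Count how often each distinct 'Unknown response' byte pattern occurs."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


sample = [
    "hub08 EspEasy - - - EspEasy: MHZ19: Unknown response: ff 0 0 0 0 0 0 0 0\n",
    "hub08 EspEasy - - - EspEasy: MHZ19: Unknown response: ff 0 0 0 0 0 0 0 0\n",
    "hub08 EspEasy - - - EspEasy: MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0\n",
    "hub08 EspEasy - - - EspEasy: MHZ19: Error, timeout while trying to read\n",
]
for pattern, count in count_unknown_responses(sample).most_common():
    print(count, pattern)

# For a real file, pass the open file object instead of the sample list, e.g.
#   with open("messages-crash-esp08-0625") as log:
#       counts = count_unknown_responses(log)
```

Keeping the timestamps alongside the counts would also show whether the "ff 0 0 ..." variant always precedes a freeze.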