letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

mega-20190116 causes missing mhz19 co2values #2254

Closed pwassink closed 4 years ago

pwassink commented 5 years ago

version mega-20190116

After updating to mega-20190116 I have several ESP nodes of different types that stop sending the CO2 values from the MHZ19 CO2 sensor. The WiFi connection is still there and other sensors on the same ESP still work fine. The data destination is Domoticz; the data does not arrive there, so it must be a local (ESP) problem. I see the same symptoms on 4 different nodes here.

Logfile content shows this:
MHZ19: Error, timeout while trying to read
MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0

Anyone with the same behaviour?
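For context: per the Winsen datasheet, the MH-Z19 answers a read command with a 9-byte frame that starts with 0xFF and ends with a checksum (the negated sum of bytes 1 to 7, plus one). A minimal sketch of how such a frame could be classified is given here. The names `Mhz19Frame`, `FrameStatus`, `mhz19Checksum` and `classifyFrame` are made up for illustration and are not taken from the ESPEasy plugin.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not taken from the ESPEasy source.
using Mhz19Frame = std::array<std::uint8_t, 9>;

enum class FrameStatus { Valid, AllZero, BadStart, BadChecksum };

// Checksum rule from the datasheet: negate the sum of bytes 1..7, add one.
std::uint8_t mhz19Checksum(const Mhz19Frame& f) {
    std::uint8_t sum = 0;
    for (std::size_t i = 1; i <= 7; ++i) {
        sum = static_cast<std::uint8_t>(sum + f[i]);
    }
    return static_cast<std::uint8_t>(0xFF - sum + 1);
}

FrameStatus classifyFrame(const Mhz19Frame& f) {
    bool allZero = true;
    for (std::uint8_t b : f) {
        if (b != 0) allZero = false;
    }
    if (allZero) return FrameStatus::AllZero;        // "0 0 0 0 0 0 0 0 0" in the log
    if (f[0] != 0xFF) return FrameStatus::BadStart;  // a reply starts with 0xFF
    if (mhz19Checksum(f) != f[8]) return FrameStatus::BadChecksum;
    return FrameStatus::Valid;
}
```

Nine zero bytes would actually pass the checksum test on their own (the sum is zero and so is the expected checksum), which is why the sketch checks for the all-zero case and the start byte first.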

TD-er commented 5 years ago

It looks like the sensor itself is crashing, but I can't think of a reason why it is doing it now. In our source there is a command for calling reset of the MH-Z19 sensor, but that's not documented in the datasheet. So I don't know its status. Could you try to call that command on a node which is stuck like this?

Edit: that command is mhzreset

pwassink commented 5 years ago

Some change must have been made after the mega 20181231 2.4.1 4Mb version that causes these results on the MHZ19, as far as I have found so far. That version has been running for weeks without problems, even on gpio 12/14.

I will try it the next time something stops working. At 23:20 I found that esp18 was frozen; mhzreset did not change that, see the log.

The mhzreset command does not open a command window in the web console and does not report OK back or anything like that. After several attempts I found this in the syslog:

<5>1 2019-02-08T23:25:34.164674+01:00 hub18 EspEasy - - - EspEasy: MHZ19: Sent sensor reset!
<5>1 2019-02-08T23:25:36.313828+01:00 hub18 EspEasy - - - EspEasy: MHZ19: Sent sensor reset!

So ESPEasy says it has been sent, but apparently not in a flavour the MHZ19 understands or likes :-)

TD-er commented 5 years ago

That command does seem to do something for the MH-Z19 A version.

127136: MHZ19: Sent sensor reset!
137272: MHZ19: Unknown response: ff ff ff ff ff ff ff ff ff
152275: MHZ19: Unknown response: ff 31 f5 4b 42 32 30 30 d
167277: MHZ19: Bootup detected! PPM value: 5000 Temp/S/U values: 23/1/15000.00
197279: MHZ19: Bootup detected! PPM value: 400 Temp/S/U values: 23/1/15000.00
<repeat a few times>
257279: MHZ19: PPM value: 906 Temp/S/U values: 24/64/11554.00

So it looks like it cannot be used on the B-version.
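The values in the log lines above sit at fixed positions in the sensor's 9-byte reply. A rough sketch of that decoding follows, assuming the commonly described layout (bytes 2-3 = PPM, byte 4 = temperature + 40, byte 5 = S, bytes 6-7 = U). The "bootup" rule is only a guess based on the U value of 15000 seen in the log, and `Mhz19Reading` and `decodeReading` are hypothetical names, not the plugin's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical struct for illustration; field names follow the log output.
struct Mhz19Reading {
    unsigned ppm;          // CO2 concentration
    int tempC;             // assumed: raw byte minus 40
    unsigned s;            // "S" status value
    unsigned u;            // "U" value
    bool looksLikeBootup;  // assumption: U == 15000 while the sensor warms up
};

Mhz19Reading decodeReading(const std::array<std::uint8_t, 9>& f) {
    Mhz19Reading r{};
    r.ppm = (static_cast<unsigned>(f[2]) << 8) | f[3];
    r.tempC = static_cast<int>(f[4]) - 40;
    r.s = f[5];
    r.u = (static_cast<unsigned>(f[6]) << 8) | f[7];
    r.looksLikeBootup = (r.u == 15000);
    return r;
}
```

Under these assumptions, the frame `ff 86 03 8a 40 40 2d 22 1e` decodes to PPM 906, temperature 24, S 64 and U 11554, which matches the last log line above.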

pwassink commented 5 years ago

I assume something like that command is also used somewhere in the startup/init of the plugin? That might explain why a reboot/reset without a power cycle does not solve the problem.

Did not see failures until this afternoon: 15:43 esp08 frozen again, counters Checksum (pass/fail): | 2316/2, put the syslog aside.

TD-er commented 5 years ago

And just to be sure, these nodes were running fine on older firmware versions?

pwassink commented 5 years ago

Yes, up to and including ESP_Easy_mega-20181231_normal_ESP8266_4096 it was and still is OK. They would run for weeks without problems.

TD-er commented 5 years ago

I looked into the recent changes in the SWserial library, since that's the one we're now using. One of the changes made in there was to no longer use interrupts on TX transfers for 9600 baud. Can you try this test build ?

N.B. I also removed the core 2.5.0 builds, since they act strange when serving web pages.

pwassink commented 5 years ago

Esp08 and Esp18 are running ESP_Easy_mega-20190202-82-PR_2235_normal_ESP8266_4096.bin. Those two also send their data to a syslog server, so the log data will be recorded.

The other two will follow tomorrow; one of them might be an MHZ19A. And it is.

Hw config is now:
Esp01 has the MHZ19A at gpio0/gpio2
Esp02 has the MHZ19B at gpio0/gpio2
Esp08 has the MHZ19B at gpio12/gpio14
Esp18 has the MHZ19B at gpio12/gpio14

All of them are now on the same test version of ESPEasy: ESP_Easy_mega-20190202-82-PR_2235_normal_ESP8266_4096.bin

update

06:01 esp08 frozen again, counters lost (user caused), put the syslog aside.

pwassink commented 5 years ago

05:21 Esp18 frozen, Checksum (pass/fail): | 1460/0, have put syslog aside
12:53 Esp08 frozen, Checksum (pass/fail): | 1351/28, have put syslog aside

TD-er commented 5 years ago

The core 2.4.1 build was not included in that test build, so I just made one containing only that core version: a core 2.4.1 build of the same code as in ESP_Easy_mega-20190202-82-PR_2235_normal_ESP8266_4096.bin. That one uses an older version of the SoftwareSerial library. If that's working, then I will change the SW serial library we use.

pwassink commented 5 years ago

Starting upgrade to the special-version: firmware.bin for all 4 nodes now, finished at 12:30

First thing I noticed: Esp08 was frozen, and this firmware update restarted the MHZ19B. It came back alive, and the init (I mean: the sensor giving me a reasonable CO2 value) seemed much faster too.

TD-er commented 5 years ago

I am running the old SWserial library on a test node too, and indeed it is working much better. For example, the Eastron energy monitoring had about 20 - 30% of the received lines corrupted, but it is now running perfectly (no failed checksums yet).

TD-er commented 5 years ago

I just made a new test build in which I changed a lot regarding the HW/SW serial wrapper. It now works with the Eastron plugin for both SWserial and HW serial and SWserial is now using the old library we used until 20180131.

pwassink commented 5 years ago

So this is actually the same as the 0202 version but with the same serial lib as the 20181231 version. I will upgrade them directly... just a moment.

Installed ESP_Easy_mega-20190212-73-PR_2235_normal_ESP8266_4096.bin on Esp01/02/08/18 now, let's see...

TD-er commented 5 years ago

Yep and it has the HWserial/SWserial wrapper, so it should be easy to switch to HWserial as soon as you're using the correct pins for those.

I still need to add GPIO2 as extra option for HWserial0 and there is still some cleaning-up to do in the code.

pwassink commented 5 years ago

The first hours have passed and they are still working. Since the last update I can't see any "unknown response" messages (ff plus 7 x 0, or 8 times 0) in the syslog from esp08 or esp18 anymore.

Took some time to look into the syslog data of 2 nodes: Esp08 still has an unknown response with varying codes once in a while; Esp18 has not produced any faults yet.

TD-er commented 5 years ago

Good to hear. I think we had some issues where the data sent to the sensors got corrupted. Then the sensor itself may crash, I guess.

With the Eastron plugin (sending a lot more data) the number of checksum errors was significantly reduced using SWserial: from about 20 - 30% CRC errors in the messages to near 0 (1 error in 10'000 messages).

pwassink commented 5 years ago

Still running fine, all four of them

Looked at the fixed-list of release 20190215. Is that version now the same as the version I am running on the 4 CO2 measuring nodes here?

TD-er commented 5 years ago

Almost the same. At least the code for serial port and your plugin is the same.

pwassink commented 5 years ago

Then I'll leave it as it is and continue testing with the "special".

It was not clear to me what was included in the release and what was not; some of the numbers did match.

TD-er commented 5 years ago

Yep, the big PR I merged yesterday was the source of all the test builds I made over the last weeks.

pwassink commented 5 years ago

Esp08 frozen at 14:17, counters Checksum (pass/fail): | 3292/70, syslog put aside for further analysis
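To put such pass/fail counters in perspective, the share of failed reads can be computed directly from the two numbers. A small sketch, with `failPercent` being a made-up helper name:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: share of failed reads, in percent.
double failPercent(unsigned long pass, unsigned long fail) {
    const unsigned long total = pass + fail;
    if (total == 0) return 0.0;  // no reads recorded yet
    return 100.0 * static_cast<double>(fail) / static_cast<double>(total);
}
```

For the counters reported here, `failPercent(3292, 70)` comes out at roughly 2.1%.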

TD-er commented 5 years ago

And did a simple reboot of the node (or saving the settings) restart the sensor, without powering it down?

pwassink commented 5 years ago

Will try that next time; I just power-cycled it after checking the counters.

Esp08 frozen again, counters Checksum (pass/fail): | 2660/55

Saving the CO2 device parameters did resolve the freeze!

pwassink commented 5 years ago

Esp08 18:18 frozen again, counters Checksum (pass/fail): | 2660/55

Saving the CO2 device parameters did resolve the freeze!

TD-er commented 5 years ago

OK, so it is possible to perform a reset. I will add a check for N unknown responses and then perform an init again.
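The idea of "N unknown responses, then init again" could look roughly like the sketch below. It only illustrates the counting logic; `ReinitGuard`, its threshold and the way it would be wired into the plugin are all assumptions, not the code that ended up in ESPEasy.

```cpp
#include <cassert>

// Hypothetical helper; the threshold and names are illustrative only.
class ReinitGuard {
public:
    explicit ReinitGuard(unsigned threshold) : threshold_(threshold) {}

    // Record the outcome of one sensor read. Returns true when the caller
    // should run the sensor init again because `threshold` reads in a row
    // came back as unknown responses.
    bool record(bool readOk) {
        if (readOk) {
            failures_ = 0;  // any good read clears the streak
            return false;
        }
        if (++failures_ < threshold_) return false;
        failures_ = 0;      // start counting afresh after requesting an init
        return true;
    }

private:
    unsigned threshold_;
    unsigned failures_ = 0;
};
```

A caller would feed every read result into `record()` and trigger the init only when it returns true, so a single corrupted frame does not restart the sensor.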

pwassink commented 5 years ago

That might solve this problem completely.

For now, the whole serial issue we had on all of them seems to be gone on the model A and on two of the three MHZ19B sensors. Esp08, which is a model B, is the only one where I still see an "unknown response" with a hex value that changes every time in the log files. If this sensor could be reset from within the plugin when it misbehaves, that might be the solution.

pwassink commented 5 years ago

Upgrading all to Mega 201902026 4Mb now.

First thing I noticed: the CO2 sensor type MHZ19B is giving correct values almost instantly after flashing the new image, much much faster than before too.

ristomatti commented 5 years ago

This sounds really promising! I've been watching this thread, waiting for the moment I could finally upgrade. :grin: I have a node which has both a MH-Z19 and PMS7003 attached. MH-Z19 has been working >90% of the time but occasionally I've had to reset the ESP for that.

The PMS7003 seems to be failing a lot more though. Most of the time even a reset does not help, but disconnecting power for a few seconds might fix it. I've been suspecting its connector might be the culprit but haven't gotten into debugging it. This thread got me curious if it could still be firmware related... I'm going to give it a shot when I can be sure enough it won't cause issues with the MH-Z19, as its data is more important.

And yes, I noticed #2349 only touched MH-Z19 code but the build I'm running is "20100 - Mega (core 2_4_0)" which I assume is quite old.

I want to say it's rare to see such persistence from both sides of an issue. I reckon many would have given up after a few days. Hats off to you from this bystander @TD-er and @pwassink!

TD-er commented 5 years ago

You may want to wait 1 more day before upgrading. I am now working on some patches for network connectivity.

ristomatti commented 5 years ago

Thanks for the info! I was planning to wait for @pwassink's reports for a moment anyway though (lazy me).

TD-er commented 5 years ago

Just know that a lot has changed since your current build, and sadly not all positive. One of the issues we're still trying to tackle is the hardware watchdog reboots. So you may want to keep a backup of the current settings you have and write down the current version you're using. Just to be sure. I hope to improve a bit on these reboots by the fixes of today, but since these reboots have several different causes, I don't think today's fixes will handle them all.

ristomatti commented 5 years ago

Note taken. I can only imagine the difficulty of avoiding regressions in a system like ESPEasy. I'll be reporting any regressions after I do the upgrade (as separate issues of course).

I'm planning to upgrade two ESP's I have here. Based on those only I can observe if there's regression on MH-Z19, PMSx003, BMx280, TSL2561, DHT22 sensor or OLED SSD1306 display handling. TSL2561 on the other ESP tends to stop returning data randomly also (quite rare though). It is running build "20102 - Mega".

TD-er commented 5 years ago

That's not the build, but the internal file format (confusing, I know). You may find the build name/date in the sysinfo page.

ristomatti commented 5 years ago

It's on the "build" field at least. Binary filename is "ThisIsTheDummyPlaceHolderForTheBinaryFilename64ByteLongFilenames". I probably flashed it straight from PlatformIO. I might have some difficulty flashing the same version then as it was almost a year ago. I didn't really make any notes not to mention a tag. :joy: Well I guess I can just do a backup of the firmware if I don't feel adventurous enough.

TD-er commented 5 years ago

No build timestamp in the sysinfo page?

ristomatti commented 5 years ago

There's a timestamp but it won't help much as I don't remember if I had pulled in the latest code before doing that. But of course it gives some ballpark.

image

This is not the ESP that hosts the MH-Z19 and PMS7003 though.

But I guess this is going way off topic. Sorry for temporarily hijacking the thread and thanks for replying to my comments anyhow!

emko commented 4 years ago

What version was this fixed in?

After a few hours I get "MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0". A reboot won't fix it; I have to unplug it and plug it back in.

I am using a Wemos D1 mini pro. Does this version have the fix in it?

Build: 20104 - Mega
System Libraries: ESP82xx Core 2_5_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support
Git Build: mega-20191003
Plugins: 46 [Normal]
Build Md5: 3180a4d3e118166b3414444513a6169
Md5 check: passed.
Build Time: Oct 3 2019 02:15:29
Binary Filename: ESP_Easy_mega-20191003_normal_ESP8266_4M1M.bin

thanks

TD-er commented 4 years ago

Yep, and previous versions also. I would suggest trying the 20190928 build, since the October one had some other issues (which I've been fixing for the last week or so).

You may also want to have a look at the number of read errors shown on the plugin page, after it has been running for a few hours.

For example one of my own units (3 days uptime) image

N.B. the filter (set to "Use Unstable" in the screenshot) does not have a meaning on the MH-Z19B sensor, it is only applicable for the -A version.

And another one: image

As you can see, the number of failed reads is minimal. If you have some errors there, you may have another problem at hand.

emko commented 4 years ago

image

Those 11 resets were me trying to get it to boot up again without unplugging it and plugging it back in. I don't remember seeing any errors though, but I can give it another shot. Should I be using Software Serial?

TD-er commented 4 years ago

Hardware Serial is the better one, but then you must also disable the serial port in the Tools->Advanced->Serial Port. (as is stated in your screenshot :) )

I'm not sure if any present USB to serial adapter on the board may have an effect on the communication. When in doubt you could change to software serial.

hcremer commented 3 years ago

Build: ESP_Easy_mega_20201130_normal_ESP8266_4M1M reporting to FHEM. I had a similar problem: the MH-Z19B sensor freezes every few hours and keeps dropping to 400 when I use hardware serial. After switching to software serial it seems to work normally and doesn't freeze anymore. 2 screenshots attached. Hardware_Serial Software_Serial

The BME280 attached with I2C works fine and keeps reporting all the time

TD-er commented 3 years ago

Hmm that's strange. To what HW serial port was it connected? What else is connected to the board? (e.g. USB to serial chip) Is "Use Serial" unchecked on the Tools->Advanced page?

hcremer commented 3 years ago

It's connected to GPIO-13 (D7) <- TX and GPIO-15 (D8) -> RX (first in HW now in SW) as I wanted to use the TXD0 and RXD0 to read the messages via the USB-port (CH340) of the Wemos D1 mini. Does "Use Serial" mean "Serial Settings - Enable Serial port:" checked? I left this checked to be able to read the messages by USB. MH-Z19B

TD-er commented 3 years ago

HW serial on an ESP8266 uses Serial0. If you send also logs to the same serial port, the sensor may crash as it doesn't understand the "commands" it receives when the logs are sent via that port.

If you connect something to the HW Serial port, you should no longer send other data. Thus you should disable "Use Serial" on the tools->Advanced page.
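The rule described in this comment can be written down as a small check. This is only a sketch of that rule; `SerialPort`, `NodeConfig` and `hasSerialConflict` are hypothetical names and do not correspond to actual ESPEasy settings structures.

```cpp
#include <cassert>

// Hypothetical model of the two settings involved; not ESPEasy's own types.
enum class SerialPort { Hardware0, Software };

struct NodeConfig {
    SerialPort sensorPort;  // where the MH-Z19 is attached
    bool useSerialLogging;  // "Use Serial" on the Tools->Advanced page
};

// A sensor on the hardware serial port would also receive the log output
// written to that port, so the two settings must not be combined.
bool hasSerialConflict(const NodeConfig& c) {
    return c.sensorPort == SerialPort::Hardware0 && c.useSerialLogging;
}
```

With the sensor on software serial the check never reports a conflict, which fits the advice above to fall back to software serial when in doubt.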

hcremer commented 3 years ago

Yesterday I got a second sensor, an MH-Z19C. This one seems to work fine with HW-Serial on Serial2. As the sensors were all connected to pins D7 (GPIO 13) and D8 (GPIO 15) (RXD2 and TXD2), there should not be any conflict with Serial0 (pins RXD0 (GPIO3) and TXD0 (GPIO1)), as far as I understand the pinout. I think the first sensor (MH-Z19B from another supplier) was just a fake that did not work properly at all... So be careful when buying those sensors. The second one, which I got yesterday, came in much better packaging with the Winsen logo and a test certificate included. The supplier seems to be more serious than the one that sold me the first one.

DerGuteWolf commented 3 years ago

@hcremer Was the first one a non-working MH-Z19C or a non-working MH-Z19B? I received a MH-Z19C today and while the measurement lamp blinks, I get only repeated "MHZ19: Unknown response: 0 0 0 0 0 0 0 0 0" in espeasy (using HW-Serial). The setup used to work with a MH-Z19B. "Use Serial" on the tools->Advanced page was disabled.

TD-er commented 3 years ago

Which pins do you use? Are you sure TX and RX are correct? (They can be swapped, maybe.) N.B. some pins cannot be used for this, like GPIO-16 (does not support interrupts), and some have a pull-down resistor, like GPIO-15.
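The pin restrictions mentioned in this comment could be turned into a quick sanity check before wiring a sensor. The sketch below encodes only the two restrictions named here plus a check for RX and TX sharing a pin; `pinWarnings` is a hypothetical helper, not part of ESPEasy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collect warnings for a chosen RX/TX GPIO pair.
std::vector<std::string> pinWarnings(int rxGpio, int txGpio) {
    std::vector<std::string> warnings;
    if (rxGpio == txGpio) {
        warnings.push_back("RX and TX use the same pin");
    }
    for (int pin : {rxGpio, txGpio}) {
        if (pin == 16) {
            warnings.push_back("GPIO-16 does not support interrupts");
        }
        if (pin == 15) {
            warnings.push_back("GPIO-15 has a pull-down resistor");
        }
    }
    return warnings;
}
```

For example, the GPIO-13/GPIO-15 pair mentioned earlier in the thread yields one warning (the pull-down on GPIO-15), while GPIO-12/GPIO-14 yields none.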

DerGuteWolf commented 3 years ago

yes, I know. As I said, I had this working before with an MH-Z19B (exact same settings but also tried TX/RX swap). However I got the MH-Z19C now to respond with software serial. Maybe some timing differences between MH-Z19B and MH-Z19C...

TD-er commented 3 years ago

Hmm I have never seen a "C" version. So no idea if they send different replies.

I will try to dig up a datasheet to see if there are changes.