plerup / espsoftwareserial

Implementation of the Arduino software serial for ESP8266
GNU Lesser General Public License v2.1

Reasons for errors in receive #215

Closed netmaniac closed 3 years ago

netmaniac commented 3 years ago

Hi! I'm using EspSoftwareSerial with Arduino Core 2.7.3 to read data from an SDS011 sensor at 9600 8N1. Recently we have seen a lot of receive errors. The SDS011 protocol has a built-in checksum, and it does not match in ~10% of received packets. We tested with a logic analyzer, and the packets sent to the ESP8266 are correct. The data is not garbage: 90% of packets are correct, and the other 10% differ by one byte.

Any hints on where to look for the source of the problem? No interrupts are used in the project, and disabling WiFi during the communication session does not help; the receive errors remain. The errors are random and I cannot find a trigger. Since the project has been completely rewritten, it is hard to find the code change that introduced the problem.
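
For reference, the checksum test mentioned above can be sketched as follows. This is a minimal host-side sketch, not code from the project in question. It assumes the 10-byte SDS011 data frame layout from the sensor's datasheet (head 0xAA, command byte, six data bytes, a checksum equal to the low 8 bits of the sum of the six data bytes, tail 0xAB), and the name `sdsFrameValid` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed SDS011 frame layout (10 bytes):
//   [0] 0xAA head, [1] command, [2..7] six data bytes,
//   [8] checksum = low 8 bits of the sum of bytes 2..7, [9] 0xAB tail.
constexpr std::size_t kSdsFrameLen = 10;

// Hypothetical helper: `frame` must point to kSdsFrameLen bytes.
// Returns true when head, tail and checksum all match.
bool sdsFrameValid(const std::uint8_t* frame) {
    if (frame[0] != 0xAA || frame[kSdsFrameLen - 1] != 0xAB) {
        return false;
    }
    std::uint8_t sum = 0;
    for (std::size_t i = 2; i <= 7; ++i) {
        sum = static_cast<std::uint8_t>(sum + frame[i]);  // wraps modulo 256
    }
    return sum == frame[8];
}
```

For the frame AA C0 D4 04 3A 0A A1 60 1D AB the function returns true. Flipping any single bit in one of the six data bytes changes the sum and makes it return false, which is the kind of one-byte difference described above.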

dok-net commented 3 years ago

Dear @netmaniac, I am very interested in cooperating with you in this matter. That said, there's a caveat: I am going to ask you to use the latest versions of the software - but I assume for free support to a free and open-source project directly from the developer, that's a minor thing to ask.

The direct way to get a matching version of the bits is to familiarize yourself with how to use the latest master of Arduino Core for ESP8266 from the GitHub project. This will include the appropriate EspSoftwareSerial library release, so please make sure that you remove any EspSoftwareSerial installed through the Arduino library manager! Refer to the EspSoftwareSerial README for instructions.

Next, and I guess you would have posted your issue there if you knew of its existence, I am the author of the esp_sds011 library. Unless something important prevents you from using it in your project, I kindly ask you to use it.

Anyway, for a first test, there is a good chance that by updating to the latest ESP8266 core your communication problems may go away or at least improve significantly. (The included EspSoftwareSerial always takes precedence over any externally installed one, or there are build conflicts etc.; don't try to mix and match, just trust me and follow my instructions.)

Please stay in touch about any results you have.

netmaniac commented 3 years ago

I have built the project (it is built with PlatformIO) against the current master of Arduino Core for ESP8266. There is no EspSoftwareSerial in the lib dependencies; during the build PlatformIO says it uses EspSoftwareSerial 6.12.6, and ESP.getFullVersion() returns:

SDK:2.2.2-dev(38a443e) Core:3.0.1-dev=30001000 lwIP:STABLE-2_1_2_RELEASE glue:1.2-48-g7421258 BearSSL:c0b69df

So it looks like it is built against the latest dev version. No change: still random receive errors.

dok-net commented 3 years ago

OK, next step, can you narrow it down to what

> Recently we have seen a lot of receive errors.

exactly means? I take it that there was an earlier setup that gave you much better results. Also, can you please head to my esp_sds011 project, there are examples that I would like you to run and see if you get similar error rates, you may have to change something to get a count of packets that fail the error check, though.

netmaniac commented 3 years ago

I'm in the process of testing different code bases to find the moment when the problem appeared, but since code changes from last stable release known to work without problems span over 6 months, there are a lot of releases to get through :)

The checksum error level was not monitored; wrong packets were just discarded. So we only noticed the problem when communication with the SDS started failing completely. But again, this is not the case for every hardware instance. We have over 500 devices deployed, half with the old stable release and half with the beta. The problem is present on 20% of them (betas). On 2-4% the SDS can end up not responding to commands at all, and only a power off/on cycle helps. On another 10-15% of devices there are cases where the SDS won't start in a single measurement cycle (no response to the wakeup command), and on the next cycle it starts and communicates OK.

But, back to the topic. On the current firmware I have managed to get the checksum error rate down from 10-12% to 1.5-2.5%. I changed how the ESP handles SDS packets during the warmup stage. The code used to ignore all incoming data packets; when the warmup time ended, there was an SDS_serial.flush(), and from that moment all bytes were processed. Now all incoming bytes are read and discarded during the warmup time, and the error rate has dropped.

The old code has an almost 0% error rate (I write "almost" because in my tests I wait for 300-500 packets). When I find the release with the first checksum errors, I'll report what changes were introduced...
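
The warmup change described above, reading and discarding bytes continuously instead of letting them pile up, can be sketched like this. It is a minimal illustration under stated assumptions, not the project's actual code: `discardPending` is a hypothetical helper that works with any port type offering Arduino-Stream-style `available()` and `read()`, and `FakePort` is a host-side stand-in so the sketch runs without hardware.

```cpp
#include <cstddef>

// Hypothetical helper: read and drop everything currently buffered on `port`.
// `Port` is any type with Arduino-Stream-like available() and read().
// Returns the number of bytes dropped.
template <typename Port>
std::size_t discardPending(Port& port) {
    std::size_t dropped = 0;
    while (port.available() > 0) {
        port.read();  // value intentionally ignored during warmup
        ++dropped;
    }
    return dropped;
}

// Host-side stand-in for a serial port (assumption, for illustration only).
struct FakePort {
    int pending = 0;                       // bytes waiting in the receive buffer
    int available() const { return pending; }
    int read() {
        if (pending <= 0) return -1;       // mimic Stream::read() on empty buffer
        --pending;
        return 0x00;
    }
};
```

Calling such a helper on every pass of the main loop while the sensor warms up keeps the receive buffer close to empty, so the first frame processed after warmup starts from a clean buffer.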

dok-net commented 3 years ago

> but since code changes from last stable release

Which release of Arduino Core for ESP8266 is that, exactly?

netmaniac commented 3 years ago

It was Arduino Core 2.4.2 (last stable). On 2.6.3 we started to see problems, but only after changes in our SDS handling code, so it wasn't a regression caused by the move from 2.4.2 to 2.6.3.

The drop in errors (10-15% to 1.5-2.5%) that I wrote about in the previous message is on the current 3.0.1-dev.

dok-net commented 3 years ago

That's a time span of 3 years. During which I got involved, found the sds011-based project you are contributing to didn't work once I updated to a then-current ESP8266 core, had an unpleasant experience with a condescending maintainer of that Stuttgart-based project, etc. I've since not been able to get my fork of that working again, and I am, honestly, not going to look into any details of that. I could only ask you personally to look at esp_sds011 for issue reporting. In my experience, "1.5 to 2.5%" error rate is quite good for software serial, particularly since there is concurrent activity running webserver, display, and other I2C sensors. Please consider a forward-looking approach, supporting anything but the latest release is out of focus for free support. There has been an important issue report in the wake of which I recently fixed a long-standing bug, that alone makes looking at releases before that worthless. Again, since, if I understand correctly, your update cycle of the ESP8266 core is multiple years, this should be acceptable.

Otherwise, at the very least, use ESP8266 core 2.7.4, though again, I am asking you to wait for 3.0.1 to be released, the compiler update alone, cutting recompile time to a third, is definitely worth it.

dok-net commented 3 years ago

When going from an old core release to the 3 release series, please clear flash completely during first download.

dok-net commented 3 years ago

@netmaniac I've just pushed release 0.11.3 of esp_sds011, which should make the use of the library much more comprehensible by looking at the measure examples source code comments. esp_sds011 is based on three sources, viz the manufacturer info, https://github.com/kadamski/arduino_sds011, and your project's wrapper a few years back.

netmaniac commented 3 years ago

@dok-net OK, I'm back with some summary. I have commit which changes behavior. Before - no checksum errors at all (tested on streams with 3k-4k SDS data replies). Next commit - error rate is 1-2%. However, there were some bugs (mine :grin:) in the code earlier. For example, the call to perform_work was lost.

So, the question is: how often should perform_work be called for correct operation, assuming default buffer sizes and a baud rate of 9600? If it matters, there are two instances of software serial in the system.

The call to perform_work is now executed with schedule_recurrent_function_us, but when a loop iteration runs for a long time, yield has to be called manually to keep perform_work being called, right? So what are the suggested maximum intervals between perform_work calls?
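
The timing question above comes down to simple arithmetic: at 9600 8N1 each byte occupies 10 bit times on the wire (start bit, 8 data bits, stop bit). The sketch below computes how long a byte takes and how long a receive buffer of a given size takes to fill under continuous traffic. The helper names are hypothetical, and the 64-byte capacity used in the note after the code is an assumption for illustration that should be checked against the library source. The library also buffers bit edges on the interrupt side with a separate capacity, so treat the result as a rough upper bound rather than a documented recommendation.

```cpp
#include <cstdint>

// Time on the wire for one frame (start + data + stop bits), in microseconds.
constexpr std::uint32_t frameTimeUs(std::uint32_t baud, std::uint32_t bitsPerFrame) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(bitsPerFrame) * 1000000ULL) / baud);
}

// Time for `bufferBytes` back-to-back frames to arrive, in microseconds.
// If the buffer is not serviced within this window during continuous
// traffic, it overflows.
constexpr std::uint32_t fillTimeUs(std::uint32_t baud,
                                   std::uint32_t bitsPerFrame,
                                   std::uint32_t bufferBytes) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(bufferBytes) * bitsPerFrame * 1000000ULL) / baud);
}
```

Under these assumptions one byte takes about 1.04 ms, a 10-byte SDS011 frame about 10.4 ms, and a 64-byte buffer fills in roughly 66.7 ms of continuous traffic. With two software serial instances each port needs servicing within its own window.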

netmaniac commented 3 years ago

And regarding your SDS library: I need a non-blocking solution. While measurement processing is done using onReceive, all other calls to SDS011 wait for the SDS response. From my measurements that takes at least 200-300 ms, sometimes longer... A bit too long for my goals...

dok-net commented 3 years ago

Building an FSM on Arduino from libraries is a rather difficult undertaking, or a mission impossible, no question. As for timing and buffer sizes, you have all the facts in your hands, like bitrate, frame sizes, and delays of the SDS011. If esp_sds011 is blocking for too long, there's CoopTask. Even an RTOS is only a toolbox; it's up to you to build a system that keeps the real-time contracts and doesn't fail. Running two instances of EspSoftwareSerial is probing the limits, if you ask my opinion, unless the data traffic is well scheduled in regard to everything else. Asynchronous duplex will likely be troublesome. I am taking your cue about possible blocking in esp_sds011 and will be looking into that at some point; it sounds like an interesting and fun improvement. That of course doesn't help you now, unless you want to take your chances and submit a PR to the project.

Now, about

> I have commit which changes behavior. Before - no checksum errors at all (tested on streams with 3k-4k SDS data replies). Next commit - error rate is 1-2%.

Good for you. Typical behavior in German discussion groups, to proclaim to know something, but not share it freely. I don't get that behavior, please don't do it here. Also, what about the latest GitHub master revisions? But without specifics, I am losing all interest in reading anything about it.

dok-net commented 3 years ago

Please think in terms of "what can I contribute" instead of "what I need is". Or to quote J.F.K.:

And so, my fellow Americans: ask not what your country can do for you, ask what you can do for your country. My fellow citizens of the world: ask not what America will do for you, but what, together, we can do for the freedom of man.

netmaniac commented 3 years ago

> Please think in terms of "what can I contribute" instead of "what I need is". Or to quote J.F.K.:

No problem at all :) I wasn't sure if you were interested :) since it means digging inside my project, and there are many bugs of mine :) (like the missing perform_work). The commit introducing the checksum errors: https://github.com/nettigo/namf/commit/11f3a79ec435361a65875409dcbfdcf60763f0d6

The old code processes all incoming packets on each loop run; the new code waits until there are 'enough' packets in the buffer.

I have run this code compiled against the latest Arduino dev core and the behavior is the same. With the code from 11f3a79ec435361a65875409dcbfdcf60763f0d6 we get checksum errors; the previous version works without any.
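
The "process on every loop run" approach can be sketched as a small byte-wise frame assembler that is fed whatever bytes are available on each pass and never blocks. This is a hypothetical illustration, not the code from the commit above: the class name `SdsFrameAssembler` is invented, and the frame layout (head 0xAA, tail 0xAB, checksum over bytes 2..7) is assumed from the SDS011 datasheet. It also counts rejected frames, which gives the error rate discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical non-blocking assembler for 10-byte SDS011 frames.
class SdsFrameAssembler {
public:
    // Feed one received byte. Returns true exactly when frame() holds a
    // complete frame with valid tail and checksum.
    bool feed(std::uint8_t b) {
        if (len_ == 0 && b != 0xAA) return false;  // still hunting for the head byte
        buf_[len_++] = b;
        if (len_ < kLen) return false;             // frame not complete yet
        len_ = 0;                                  // start over for the next frame
        std::uint8_t sum = 0;
        for (std::size_t i = 2; i <= 7; ++i) {
            sum = static_cast<std::uint8_t>(sum + buf_[i]);
        }
        if (buf_[kLen - 1] != 0xAB || sum != buf_[8]) {
            ++errors_;                             // count rejected frames
            return false;
        }
        return true;
    }

    const std::uint8_t* frame() const { return buf_; }
    unsigned long errors() const { return errors_; }

private:
    static constexpr std::size_t kLen = 10;
    std::uint8_t buf_[kLen] = {};
    std::size_t len_ = 0;
    unsigned long errors_ = 0;
};
```

In a main loop, each pass would call `feed()` for every byte the port currently reports as available and act on a frame only when `feed()` returns true, so the receive buffer is emptied continuously instead of being left to fill.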

dok-net commented 3 years ago

OK, now I see. I thought you were speaking about revisions of EspSoftwareSerial, esp_sds011, or the Arduino Core for ESP8266; commits there before and after which your outcome changes significantly would interest me. You are right, I am not so much interested in looking at opendata-sensors-software these days.

dok-net commented 3 years ago

From a perspective of effort, I am fine with you not using esp_sds011 if it doesn't suit your requirements off the shelf. As you correctly assumed, I am not going to help you with debugging beyond anything you can prove to me via an MCVE to be a problem with EspSoftwareSerial itself. Expecting software serial to work 100% flawlessly in concurrency with a lot of other libraries is unrealistic. I suggest you switch over (Serial.swap()) to HW serial and see what the results are with that.

That said, if you can, please provide an MCVE and exact revision data on EspSoftwareSerial and Arduino Core for ESP8266 where you can prove significantly different QOS. I have to tell you straightforward, that I am not interested in anything older than, at most, 2 years, like before EspSoftwareSerial became a submodule in Arduino Core for ESP8266.

Otherwise, we can go on forever discussing this, but I'd rather not :-) :-) :-)

netmaniac commented 3 years ago

> Otherwise, we can go on forever discussing this, but I'd rather not :-) :-) :-)

Roger that :)

I plan to build the NAMF firmware using your library; if any interesting results emerge, I'll let you know.

Serial.swap() is not a solution for now; we are bound to our PCB design. We didn't know about that feature when we were designing the PCB.

dok-net commented 3 years ago

Discussion has moved to https://github.com/dok-net/esp_sds011/pull/5