plerup / espsoftwareserial

Implementation of the Arduino software serial for ESP8266
GNU Lesser General Public License v2.1

SoftwareSerial is unstable in 2.4.0 #63

Closed Anton-V-K closed 5 years ago

Anton-V-K commented 6 years ago

I've installed ESP8266 core 2.4.0 for Arduino 1.8.5 (Windows 7 SP1 64-bit) and immediately ran into problems communicating with one of my sensors (MH-Z19), which is accessed through the SoftwareSerial library: under unclear conditions (probably when the WiFi connection is active), readings return 0 (all zeros). I had to downgrade the ESP8266 core to 2.3.0 to restore stable operation with the sensor.

MCU: LoLin (NodeMCU, esp8266, ESP12-E)

The reading code is more or less standard (I haven't changed it for months); here are a few fragments:

#  include <SoftwareSerial.h>

SoftwareSerial co2Serial(MH_Z19_TX, MH_Z19_RX); // define MH-Z19

void CO2_begin()
{
  LOG_println(Fx("CO2-sensor: begin(9600)"));
  co2Serial.begin(9600); //Init sensor MH-Z19(14)
}

int readCO2()
{
  // command to ask for data (0x86)
  const byte cmd[9] = {0xFF, 0x01, 0x86, 0x00, 0x00, 0x00, 0x00, 0x00, 0x79};
  byte response[9] = { 0 }; // for answer

  co2Serial.write(cmd, sizeof(cmd)); //request PPM CO2
  co2Serial.readBytes(response, sizeof(response));
#ifdef ENABLE_DIAG
  char raw[64];
  sprintf(raw, "CO2 RAW: %02X %02X %02X %02X %02X %02X %02X %02X %02X"
         , response[0], response[1], response[2], response[3]
         , response[4], response[5], response[6], response[7], response[8]);
  LOG_println(raw);
#endif
  if (response[0] != 0xFF)
  {
    LOG_print  (F("Wrong starting byte from co2 sensor: 0x"));
    LOG_println(response[0], HEX);
    return -1;
  }

  if (response[1] != 0x86)
  {
    LOG_print  (F("Wrong command from co2 sensor: 0x"));
    LOG_println(response[1], HEX);
    return -1;
  }

  const int responseHigh = response[2];
  const int responseLow  = response[3];
  const int ppm = (256 * responseHigh) + responseLow;
  return ppm;
}
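Since several reports in this thread describe all-zero or corrupted bytes, a reader could also reject frames whose checksum does not match instead of checking only bytes 0 and 1. A minimal sketch in portable C++ (so it can be tested off-device; `byte` on Arduino is `uint8_t`), assuming the checksum rule commonly documented for the MH-Z19: the last byte equals 0xFF minus the sum of bytes 1..7, plus 1, truncated to 8 bits. The command frame above satisfies it: 0x01 + 0x86 = 0x87, and 0xFF - 0x87 + 1 = 0x79. The names `mhz19Checksum` and `parseCo2Response` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for validating a 9-byte MH-Z19 frame.
constexpr std::size_t kFrameLen = 9;

// Assumed MH-Z19 checksum: 0xFF - (frame[1] + ... + frame[7]) + 1, mod 256.
uint8_t mhz19Checksum(const uint8_t (&frame)[kFrameLen]) {
    uint8_t sum = 0;
    for (std::size_t i = 1; i < kFrameLen - 1; ++i) {
        sum = static_cast<uint8_t>(sum + frame[i]);  // wraps like the sensor's 8-bit sum
    }
    return static_cast<uint8_t>(0xFF - sum + 1);
}

// Returns the CO2 concentration in ppm, or -1 if the frame is malformed.
// Note: an all-zero frame has checksum 0x00 and therefore passes the checksum
// test on its own; the start-byte and command checks are what reject it.
int parseCo2Response(const uint8_t (&frame)[kFrameLen]) {
    if (frame[0] != 0xFF) return -1;                  // wrong start byte
    if (frame[1] != 0x86) return -1;                  // not a reply to command 0x86
    if (frame[8] != mhz19Checksum(frame)) return -1;  // one or more corrupted bytes
    return 256 * frame[2] + frame[3];                 // high byte * 256 + low byte
}
```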

With SoftwareSerial from 2.4.0, the sensor sometimes just returns all zeros:

348086: loop: millis() = 348086, 2018-02-22 21:44:40, Thursday
348141: reading data:
349150: CO2 RAW: 00 00 00 00 00 00 00 00 00
349150: Wrong starting byte from co2 sensor: 0x0
349151:   CO2 = -1
349151: CO2 not valid
349479: DHT #1 Humidity = 40.7
349533: DHT #1 Temperature = 18.8
349586:   Free RAM: 19952
349586:   Online Queue: 0
349587: Data reading error #3/15
349587: loop finished, 1501 millis spent

I suspect the problem may be caused by other parts of the ESP8266 core, but it manifests through the SoftwareSerial class. It may also be related to WiFi, since there were similar issues such as https://github.com/esp8266/Arduino/issues/2937 and https://github.com/esp8266/Arduino/issues/4315

plampix commented 6 years ago

I'm running into the same problem driving a Nextion display using SoftwareSerial. In 2.4.0-rc2 it worked; in the release it fails.

torntrousers commented 6 years ago

I see something similar with SoftwareSerial and a GPS module at 9600 baud: really unstable, missing lots of bytes, and regular crashes. The same setup and sketch are rock solid going back to 2.3.0. The crash dump is:

Exception (0): epc1=0x40202c24 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys sp: 3ffffc00 end: 3fffffb0 offset: 01a0

stack>>> 3ffffda0: 3fffc278 40101f8c 3fffc200 00000022
3ffffdb0: 31af3954 4010696c 3fffc258 4000050c
3ffffdc0: 40000f83 00000030 00000012 00000020
3ffffdd0: ffffffff 00000005 00000000 4010686d
3ffffde0: 401069a6 000000fc 00000000 40106a00
3ffffdf0: c0037015 3ffe8ff4 3ffeea78 401067a5
3ffffe00: 31adf3f8 00000005 00000000 00000022
3ffffe10: 3fffc200 4010696c 3fffc258 4000050c
3ffffe20: 4000437d 00000030 00000012 ffffffff
3ffffe30: 60000200 00000008 34033402 80000000
3ffffe40: 20000000 3fff15c0 80000000 203fc0c0
3ffffe50: 80000000 3ffe8ff4 3ffee478 3fff15c4
3ffffe60: 000001b4 003fc0c0 60000600 00000030
3ffffe70: 40221e5c 00000030 00000012 ffffffff
3ffffe80: 40222edd 00000000 ffffffff fffedf88
3ffffe90: 3ffee478 40222ed4 3ffee478 00000c37
3ffffea0: 3ffee478 3ffe8ff4 3ffee478 3ffed2f4
3ffffeb0: 3ffee4a0 009eed2a 60000600 00000030
3ffffec0: 4020fa58 00000000 3fff01b8 000001f4
3ffffed0: 4020a058 3ffe8ef8 3ffe8ef8 401004d8
3ffffee0: 3ffeb3f0 000000dd 3ffeb40a 4020fd8c
3ffffef0: 3ffeb410 3ffeb3f0 3fff01b8 40209414
3fffff00: 3ffeecd0 00000007 3ffe8ef8 40209450
3fffff10: 00000000 400042db 3ffe8ff4 00000000
3fffff20: 40004b31 3fff1584 000001f4 003fc080
3fffff30: 40105acc 000001f4 3ffed480 401004f4
3fffff40: 40106ddd 3fff1584 00000000 009eed2a
3fffff50: 40104e5a 3ffed2f4 3ffed480 40223ec6
3fffff60: 40222edd 40222ee6 3ffed2f4 3ffee4a0
3fffff70: 40228215 3ffee4a0 3ffee478 40228215
3fffff80: 4022825a 3fffdab0 00000000 3fffdcb0
3fffff90: 3ffee4c8 3fffdab0 00000000 40202b4f
3fffffa0: 40000f49 40000f49 3fffdab0 40000f49
<<<stack<<<

ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 tail 8 chksum 0x2d csum 0x2d v4ceabea9 ~ld


Exception 0: Illegal instruction
Decoding 38 results
0x40202c24: optimistic_yield at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_main.cpp line 57
0x40101f8c: wDev_ProcessFiq at ?? line ?
0x4010696c: interrupt_handler at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_wiring_digital.c line 122
0x4010686d: sws_isr_5() at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\libraries\SoftwareSerial/SoftwareSerial.cpp line 131
0x401069a6: interrupt_handler at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_wiring_digital.c line 128
0x40106a00: interrupt_handler at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_wiring_digital.c line 147
0x401067a5: EspClass::getCycleCount() at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\libraries\SoftwareSerial/SoftwareSerial.cpp line 131
: (inlined by) SoftwareSerial::rxRead() at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\libraries\SoftwareSerial/SoftwareSerial.cpp line 216
0x4010696c: interrupt_handler at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_wiring_digital.c line 122
0x40221e5c: rc_only_sta_trc at ?? line ?
0x40222edd: esf_buf_setup at ?? line ?
0x40222ed4: esf_buf_setup at ?? line ?
0x4020fa58: sntp_retry at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/apps/sntp/sntp.c line 612
0x4020a058: cyclic_timer at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x401004d8: malloc at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x4020fd8c: mem_malloc at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 136
0x40209414: do_memp_malloc_pool at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 231
0x40209450: memp_malloc at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 231
0x40105acc: spi_flash_read at ?? line ?
0x401004f4: calloc at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266\umm_malloc/umm_malloc.c line 1688
0x40106ddd: __wrap_spi_flash_read at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_phy.c line 267
0x40104e5a: ets_timer_setfn at ?? line ?
0x40223ec6: pm_open at ?? line ?
0x40222edd: esf_buf_setup at ?? line ?
0x40222ee6: esf_buf_setup at ?? line ?
0x40228215: ets_timer_handler_isr at ?? line ?
0x40228215: ets_timer_handler_isr at ?? line ?
0x4022825a: ets_timer_handler_isr at ?? line ?
0x40202b4f: loop_task at C:\Arduino\ESP8266\2.4.0-1.8.3\arduino-1.8.3\portable\packages\esp8266\hardware\esp8266\2.4.0\cores\esp8266/core_esp8266_main.cpp line 57

plerup commented 6 years ago

@Anton-V-K I would recommend using available() before readBytes
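The available()-before-readBytes suggestion amounts to waiting until the whole reply is buffered (or a deadline passes) before copying it out. A minimal sketch in portable C++, assuming a stream type with Arduino-Stream-like `int available()` and `int read()`; `readExactly` and the `nowMs` clock parameter (standing in for `millis()`) are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Waits until `stream` reports at least `n` buffered bytes, then reads them.
// Returns false (and reads nothing) if `timeoutMs` elapses first.
template <typename StreamT>
bool readExactly(StreamT& stream, uint8_t* buf, std::size_t n,
                 uint32_t timeoutMs, const std::function<uint32_t()>& nowMs) {
    const uint32_t start = nowMs();
    while (static_cast<std::size_t>(stream.available()) < n) {
        if (nowMs() - start >= timeoutMs) {
            return false;  // incomplete frame: let the caller report or retry
        }
        // yield();  // on the ESP8266, keep the WiFi stack serviced while waiting
    }
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<uint8_t>(stream.read());
    }
    return true;
}
```

Unsigned subtraction in `nowMs() - start` keeps the timeout correct across a wrap of the 32-bit millisecond counter.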

@torntrousers there are several reports of crashes at 9600 baud. I will add some new functionality to try to avoid this.

torntrousers commented 6 years ago

Great. Let me know if I can do more to help debug or test.

plampix commented 6 years ago

It's a problem with the m_highSpeed logic to enable/disable interrupts. If I force m_highSpeed to true, it all works.

plerup commented 6 years ago

Yes, I will add a new method that can be used to control whether interrupts are enabled during Tx, but stability still seems considerably worse in 2.4 than in 2.3 due to something else as well.

SoftwareSerial's constant fight with the rest of the system for execution time seems hard to really solve.

plerup commented 6 years ago

Please try the latest commit.

CurtRod commented 6 years ago

I had the same problem. Your commit from two days ago solved the problem for me. I use SoftwareSerial to communicate with a Modbus slave. Thanks a lot!

torntrousers commented 6 years ago

Yes this latest code is working well for me with no more crashes. Thanks for the quick fix.

Anton-V-K commented 6 years ago

I've updated the ESP8266 core to 2.4.1, and unfortunately the problem is still there. So I've adjusted my code to use a timeout when reading data from the sensor (I borrowed the approach from another project), and even with a timeout of up to 200 ms the sensor sometimes (in fact, too often) behaves as "unresponsive" (of course, the sensor's firmware isn't actually affected by my source code). There is no such delay when the code is built with ESP8266 core 2.3.0 (the device can run for 10+ days without issues), so I conclude the problem is caused by ESP8266 core 2.4.x.
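For scale when choosing such a timeout: assuming the common 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per byte), a 9-byte MH-Z19 reply occupies the line for 90 / 9600 s, about 9.4 ms, so a 200 ms timeout is roughly twenty times the transfer time itself. At 115200 baud a single bit lasts under 9 µs, which leaves very little slack for an interrupt-driven software receiver. A small arithmetic sketch; `frameTimeUs` and `bitTimeUs` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
constexpr uint32_t kBitsPerByte = 10;

// Time (in microseconds, rounded down) that `bytes` bytes occupy the line at `baud`.
constexpr uint32_t frameTimeUs(uint32_t baud, uint32_t bytes) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(kBitsPerByte) * bytes * 1000000ULL) / baud);
}

// Duration of a single bit in microseconds (rounded down).
constexpr uint32_t bitTimeUs(uint32_t baud) {
    return 1000000U / baud;
}

// 9-byte MH-Z19 reply at 9600 baud: 90 bits / 9600 bit/s = 9375 us.
static_assert(frameTimeUs(9600, 9) == 9375, "9-byte frame at 9600 baud");
```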

jeroenst commented 6 years ago

I can confirm this fix works after placing the files from the master branch in my ESP8266 library path. No more corrupted data! Thanks!

tmorford commented 6 years ago

I am having a similar issue running a GPS with an ESP8266: it crashes on every 8th read with this error:

Exception (0): epc1=0x4020a3ac epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys sp: 3ffffc00 end: 3fffffb0 offset: 01a0

stack>>> 3ffffda0: 00000000 3fffc6fc 00000001 00000000
3ffffdb0: 972e56b6 ffffffff 00000020 00000030
3ffffdc0: 03000000 3ffed0a8 3ffea041 00000020
3ffffdd0: ffffffff 0000000c 00000000 40106861
3ffffde0: 40106986 000000ff 3ffeef38 401069e0
3ffffdf0: c0036035 3fff1db0 000004eb 00000020
3ffffe00: 4021c128 3fff09b4 40106a7e 00000022
3ffffe10: 3fffc200 4010694c 3fffc258 4000050c
3ffffe20: 4000437d 00000030 0000000d ffffffff
3ffffe30: 60000200 00000008 34033402 80000000
3ffffe40: 20000000 3fff2ed8 80000000 203fc140
3ffffe50: 80000000 3fffc6fc 00000001 3fff2edc
3ffffe60: 00000134 003fc140 60000600 00000030
3ffffe70: 4022dd53 000001f4 4023e450 00000361
3ffffe80: 00000001 4022cddc 00000001 bfffffff
3ffffe90: ffffffff 3fffc6fc 00000001 3ffed8e0
3ffffea0: 00000000 1cbe94af 60000600 00000030
3ffffeb0: 4020a2c3 3ffef384 3ffe9eb8 3fffd9d0
3ffffec0: 00000000 00000000 00000000 fffffffe
3ffffed0: ffffffff 3fffc6fc 00000001 3fffdab0
3ffffee0: 00000000 3fffdad0 3ffef384 00000030
3ffffef0: 00000000 3fffdad0 3ffef384 00000030
3fffff00: 00000000 00000000 3ffe9f7d 3ffea017
3fffff10: 00000000 400042db 00000000 3ffe9f93
3fffff20: 40004b31 3fff2e1c 000001f4 003fc080
3fffff30: 40105aac 000001f4 3ffed8e0 401004f4
3fffff40: 40106dbd 3fff2e1c 00000000 1cbe94af
3fffff50: 3ffe9f1c 00000129 3ffed8e0 4022dd9e
3fffff60: 4022cde5 4022cdee 3ffed754 3ffee900
3fffff70: 402314dd 3fffdcc0 3ffe9698 3ffe9698
3fffff80: 40231522 3fffdab0 00000000 3fffdcb0
3fffff90: 3ffee910 3fffdad0 3ffef384 4020a2d7
3fffffa0: 40000f49 40000f49 3fffdab0 40000f49
<<<stack<<<

ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 tail 8 chksum 0x2d csum 0x2d v614f7c32 ~ld

Decoding 29 results
0x40106785: EspClass::getCycleCount() at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\libraries\SoftwareSerial/SoftwareSerial.cpp line 131
: (inlined by) SoftwareSerial::rxRead() at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\libraries\SoftwareSerial/SoftwareSerial.cpp line 216
0x40106861: sws_isr_12() at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\libraries\SoftwareSerial/SoftwareSerial.cpp line 131
0x40106986: interrupt_handler at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 128
0x4010420e: lmacTxFrame at ?? line ?
0x401069e0: interrupt_handler at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 147
0x4021c128: ieee80211_ht_updateparams at ?? line ?
0x4010694c: interrupt_handler at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 122
0x401069e0: interrupt_handler at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 147
0x4010694c: interrupt_handler at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 122
0x4020a2d4: loop_task at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57
0x401006e4: cont_run at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/cont.S line 74
0x4020a348: loop_wrapper at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57
0x40105aac: spi_flash_read at ?? line ?
0x401004f4: calloc at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1687
0x40106dbd: __wrap_spi_flash_read at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_phy.c line 267
0x4022dd9e: pm_open at ?? line ?
0x4022cde5: esf_buf_setup at ?? line ?
0x4022cdee: esf_buf_setup at ?? line ?
0x402314dd: ets_timer_handler_isr at ?? line ?
0x40231522: ets_timer_handler_isr at ?? line ?
0x4020a2d7: loop_task at C:\Users\Asus\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57

I have tried adding a 200 ms delay to slow things down, and I am also running at 9600 baud. The software serial is sending ... ",)⸮b⸮⸮b.⸮b⸮⸮⸮bb⸮⸮b⸮⸮b⸮⸮⸮b⸮⸮b⸮⸮b⸮⸮b⸮⸮⸮bb.⸮b⸮⸮b⸮⸮⸮bR⸮⸮j"

jeroenst commented 6 years ago

I switched over to using the hardware serial port, and the software is now rock stable. Also, 115200 baud didn't work, maybe because the microcontroller can't keep up with the timing. For debugging, I send log output over telnet, which also works great.

strange-v commented 6 years ago

I'm experiencing absolutely the same issue as the topic starter has using ESP8266 core 2.4.1. Temporarily switched to 2.3.0 due to lack of solutions.

merlokk commented 6 years ago

Confirmed. 2.3.0 works OK, but 2.4.1 sends strange byte sequences; for me, 2.4.1 corrupts the first byte.

aimette commented 5 years ago

Same problem, 2.3.0 works but 2.4.2 unusable with SoftwareSerial. Is anyone working on it?

mikekgr commented 5 years ago

I'm also using 2.4.2, and even with master from git the problem still exists. Please, could somebody who is able to fix this do something? We are losing software serial (or at least it has become very unreliable).

strange-v commented 5 years ago

It seems that no one wants to, or is able to, fix this issue. Is it time to get rid of the ESP8266 and switch to the ESP32 in all new projects?

mikekgr commented 5 years ago

@strange-v the big problem with that decision is that the ESP32 Arduino core is not mature at all; I read about problems here and there.

merlokk commented 5 years ago

And next we'll just have to move to Linux boards... :)

jeroenst commented 5 years ago

I use the hardware TX and RX instead; no problems anymore.

mikekgr commented 5 years ago

@jeroenst I think that is not related to the problem. Here we are talking about software serial; hardware serial is different and not the solution. Furthermore, how would you cope if you need two full UARTs (TX & RX) on the ESP8266?

strange-v commented 5 years ago

Has someone tried to reproduce in 2.5.0-beta2?

TRAHOMOTO commented 5 years ago

I ran into a similar problem on a D1 mini (ESP-12E) with a PZEM-004T and SoftwareSerial in 2.5.0-beta2. Right now I'm testing 2.5.0-beta3.

TRAHOMOTO commented 5 years ago

Unfortunately 2.5.0-beta3 has the same issue :(

Anton-V-K commented 5 years ago

Why is this issue closed? According to the recent comments, the problem isn't fixed, so it's worth renaming the title to something like "SoftwareSerial is unstable since 2.4.0" and keeping the issue open until there is a resolution. At least there is a workaround (downgrading to 2.3.0), so it makes sense to keep this issue visible.

dok-net commented 5 years ago

Version 4.0.0 brings a major overhaul of the receive code. Your complaint is against the Arduino core for ESP8266 version 2.4.0, well over a year old, with a similarly old version of EspSoftwareSerial! Could you please test your case, and if any failure is observed, provide an updated, terse and comprehensible report in a newly opened issue here? I really don't see any other way.

ortegafernando commented 5 years ago

Hi, I also don't know why this issue is closed. 2.5.2 is not working either.

twanek commented 5 years ago

Hello!

I have the same problem. I am trying to receive serial data with an ESP8266 every second from a Victron BlueSolar MPPT controller.

As long as I use the software serial alone, with no WiFi, and just display the values on the serial monitor, it works fine.

When I try to send the values with Blynk or over HTTP, the problems begin: I receive lots of zeros (data loss) and random crashes. The problem occurs (tested) on core 2.4.2 and also 2.5.x.

So, dok-net, please provide a working solution or workaround for this. Why did you close this topic if the problem is not solved?

Thanks!

dok-net commented 5 years ago

@twanek There is almost nothing provided in this issue that would allow anybody to reproduce, fix, and test anything at all - that alone is enough to close and keep closed. That said, the sketch at the top of this issue report is fundamentally flawed in this line:

co2Serial.readBytes(response, sizeof(response));

Obviously the author missed the comment on, and the signature of, the function being used:

// The readBytes functions are non-waiting, there is no timeout.
    size_t readBytes(uint8_t* buffer, size_t size) override;

There is no reason to believe anything other than that no data had been received yet.
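With a non-waiting readBytes(), a caller has to keep calling it with an advancing offset until the buffer is full or a deadline passes, since each call may return fewer bytes than requested (possibly zero). A minimal sketch in portable C++, assuming a stream type whose `readBytes(uint8_t*, size_t)` returns immediately with the number of bytes copied, as the quoted comment describes; `readFull` and the `nowMs` clock parameter (standing in for `millis()`) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Calls a non-waiting readBytes() repeatedly, appending each partial result,
// until `size` bytes have arrived or `timeoutMs` elapses.
// Returns the number of bytes actually received (may be less than `size`).
template <typename StreamT>
std::size_t readFull(StreamT& stream, uint8_t* buffer, std::size_t size,
                     uint32_t timeoutMs, const std::function<uint32_t()>& nowMs) {
    std::size_t got = 0;
    const uint32_t start = nowMs();
    while (got < size && nowMs() - start < timeoutMs) {
        got += stream.readBytes(buffer + got, size - got);  // may return 0
        // yield();  // on the ESP8266, let the WiFi stack run between polls
    }
    return got;
}
```

Checking the return value against the expected frame length, rather than assuming the buffer was filled, is what distinguishes "no data yet" from a real sensor reply.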

I would like to ask you to keep in mind that this is a free and open-source software project developed and maintained by volunteers in their spare time, without any support whatsoever from the manufacturer of the ESP8266 and related SoCs.

In general, concurrently sending and receiving serial data, and on top of that engaging the WiFi stack, is going to be afflicted by bit errors, depending on the load. At 9600 bps, the error rates seem acceptable in my tests, but they are non-zero, I am sorry.
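Because some bit errors are expected even at 9600 bps, a caller can treat a rejected frame as a transient failure and retry a bounded number of times instead of reporting the first bad reading. A minimal sketch; `readWithRetry` and its `attempt` callback are hypothetical, with `attempt` standing for one send-command, read-reply, validate cycle that returns -1 on failure.

```cpp
#include <functional>

// Retries a request/response exchange up to `maxAttempts` times and returns the
// first value that passed validation, or -1 if every attempt failed.
int readWithRetry(const std::function<int()>& attempt, int maxAttempts) {
    for (int i = 0; i < maxAttempts; ++i) {
        const int value = attempt();
        if (value >= 0) {
            return value;  // frame passed validation
        }
        // On the device, draining leftover RX bytes before the next attempt
        // helps avoid reading a misaligned frame.
    }
    return -1;
}
```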

To get the best results, and to attract as much of my interest as possible, I would like to ask you to do the following: