zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.06k stars 6.18k forks source link

modbus: crc mismatch due to missing rx bytes at high baud rates (460800) #71767

Open lanwer50 opened 2 months ago

lanwer50 commented 2 months ago

Notes I simplified the modbus RTU samples (server and client) to a few lines of code (see below) and I am running two nRF52840 development boards: one as server and one as client with serial rx and tx wires connected crosswise.

The difference to the default samples is mainly increasing the baud rate and deleting the irrelevant coil read/write code.

Describe the bug The problem I am experiencing is that some received bytes are missing, which leads to crc mismatch ( modbus_serial: Calculated CRC does not match received CRC). This however happens only when using a relatively high baud rate: 460800. With the default baud rate from the sample (19200) or even with 115200 I was not able to see this problem.

The problem occurs almost all the time, especially with longer packets, but does also appear sometimes with smaller ones. In the edited samples attached, I am experiencing with 64 number of registers (128 bytes to be read). I am guessing it is a timing issue that interrupts are not being serviced as quickly as needed with this baud rate.

This is not a hardware issue: I attached the serial lines to a logic decoder and I was able to see all the rx bytes as expected within the expected time.

To Reproduce Steps to reproduce the behavior:

  1. build the zephyr/samples/subsys/modbus/rtu_server after replacing its main.c with the attached snippet and flash it to one development board
  2. build the zephyr/samples/subsys/modbus/rtu_client after replacing its main.c with the attached snippet and flash it to the other development board
  3. connect tx and rx lines of the two boards crosswise and connect their GNDs together with a third jumper wire,
  4. See error

Expected behavior Receive complete data (without bytes missing) and so without crc mismatch.

Impact Showstopper for anyone who wants to use a high baud rate.

Environment (please complete the following information):

Additional context pin setting is defined in the overlay file attched.

missing_bytes_minimal_examples.zip

aescolar commented 2 months ago

CC @nordic-krch

lanwer50 commented 2 months ago

I will be commenting on this issue soon. I am currently in the process of understanding how the driver works underneath and I might get near the root cause.

lanwer50 commented 2 months ago

I am getting near cause, but still not completely solved.

I came across the documentation of the function uart_fifo_read() which states: "That means that once uart_irq_rx_ready() is detected, uart_fifo_read() must be called until it reads all available data in the FIFO (i.e. until it returns less data than was requested)."

This is however not the case in the implemented uart callback cb_handler_rx, where uart_fifo_read is called only once. A part of the solution would be to call this function as stated in the documentation until its returns <= 0. --> this leads to the following:

while(1) {
    n = uart_fifo_read(cfg->dev, cfg->uart_buf_ptr,
                       (CONFIG_MODBUS_BUFFER_SIZE -
                       cfg->uart_buf_ctr));

    cfg->uart_buf_ptr += n;
    cfg->uart_buf_ctr += n;

    if (n <= 0) {
        break;
    }
}

I am still looking for further fixes/optimizations to tackle this problem and looking forward to any input. Once completely solved, I will write my first merge request with this bugfix.

lanwer50 commented 2 months ago

A further improvement leading to the problem happening less frequently is to change the order of the functions called in the uart callback cb_handler_rx in RTU mode:

first call the uart_fifo_read() function as mentioned in my last comment (in a while loop, not once). Then, start the rtu_timer and not the other way around. This is proven to cause less uart buffer overruns. This also does make sense: this way, we are able to read the data as fast as possible and then do the less critical operation (starting the timer).

However, as I mentioned, the problem still occurs, in a less frequency.

Any other thoughts/inputs are much appreciated.

tomi-font commented 1 month ago

Not sure it's relevant, but do you use HW RX byte counting? If not, can you try? This DevZone answer here tells about the Kconfig options to enable.

lanwer50 commented 1 month ago

Not sure it's relevant, but do you use HW RX byte counting? If not, can you try? This DevZone answer here tells about the Kconfig options to enable.

-> This is indeed part of the solution puzzle.

The final solution for the problem above is using the uarte ASYNC Api along with the hardware timer for counting bytes.

Notes: