renesas / fsp

Flexible Software Package (FSP) for Renesas RA MCU Family
https://renesas.github.io/fsp/

SCI UART FIFO lacks error clearing #331

Closed MarcinKond closed 4 months ago

MarcinKond commented 8 months ago

The error interrupt function (void sci_uart_eri_isr (void)) does not clear the error flags in the SSR_FIFO register for a UART working in FIFO mode. When a UART error occurs (framing, parity, or overrun), the error bits are set in two registers, SSR and SSR_FIFO, and both should be cleared in this function. Without this, the UART locks up. The MCU documentation describes the proper UART FIFO service algorithm, and it differs from what is implemented in fsp: all data should be read from the FIFO during the interrupt, or, in the error interrupt service, the FIFO may be reset.
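
For reference, a minimal sketch of the handling described above (drain the receive FIFO, then clear the error flags) is shown below. It is illustrative only, not the FSP implementation; the register and bitfield names (FDR, FRDRHL, SSR_FIFO and the R_SCI0_SSR_FIFO_*_Msk macros) are assumed to match the RA device header, and sci_uart_fifo_error_recover() is a hypothetical helper.

```c
/* Hypothetical helper, not FSP code: handle a FIFO-mode UART error the way
 * described above.  p_reg is the channel's SCI register block; register and
 * mask names are assumed to follow the RA device header. */
static void sci_uart_fifo_error_recover (R_SCI0_Type * p_reg)
{
    /* Read out every byte still held in the receive FIFO so reception can
     * continue (alternatively the receive FIFO could be reset). */
    while (p_reg->FDR_b.R > 0U)
    {
        (void) p_reg->FRDRHL;
    }

    /* Clear the framing, parity and overrun error flags.  SSR and SSR_FIFO
     * alias the same address, so this clears the flags seen through both. */
    p_reg->SSR_FIFO = (uint8_t) (p_reg->SSR_FIFO &
                                 (uint8_t) ~(R_SCI0_SSR_FIFO_ORER_Msk |
                                             R_SCI0_SSR_FIFO_FER_Msk |
                                             R_SCI0_SSR_FIFO_PER_Msk));
}
```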

renesas-brandon-hussey commented 5 months ago

This is being internally tracked using FSPRA-2351.

renesas-kyle-finch commented 4 months ago

Hi @MarcinKond,

I have been taking a look at this issue and have been trying to create a scenario like you mentioned, where the UART locks up with the current interrupt function, but have not had success. Can you provide some more information about the kind of lock-up you see? Is it specific to a framing, parity, or overrun error? Which MCU are you using?

Also, regarding clearing the SSR and SSR_FIFO registers: these two registers actually share the same base address and the same bit positions for the error flags, so clearing the error flags in SSR should do the same thing as clearing them through SSR_FIFO. Similarly, the read of RDRHL in the interrupt function accesses the same register address as the FIFO-specific FRDRHL.
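
To illustrate the aliasing, a simplified and hypothetical view of the register layout could look like the sketch below; the actual generated device header is more elaborate, but the point is that both names decode to the same offsets.

```c
#include <stdint.h>

/* Simplified sketch of the aliasing, not the actual device header: the
 * non-FIFO and FIFO names overlay the same bytes, so a flag cleared via SSR
 * is also clear when read via SSR_FIFO, and RDRHL/FRDRHL access the same
 * data register. */
typedef struct
{
    /* ... preceding SCI registers ... */
    union
    {
        volatile uint8_t SSR;        /* Serial Status Register (non-FIFO view) */
        volatile uint8_t SSR_FIFO;   /* Serial Status Register (FIFO view)     */
    };
    /* ... */
    union
    {
        volatile uint16_t RDRHL;     /* Receive Data Register (non-FIFO view)  */
        volatile uint16_t FRDRHL;    /* Receive FIFO Data Register (FIFO view) */
    };
    /* ... remaining registers ... */
} sci_regs_sketch_t;
```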

MarcinKond commented 4 months ago

My problem was related to an overrun error on the R7A6M5BH processor. The Receive FIFO Data Trigger Number was 1. We have many other interrupts with higher priority. Reading the FIFO is done in the rxi_isr interrupt, and error clearing is done in eri_isr. I think that in your solution there is a problem with reading the FIFO and clearing the error in two different interrupts when the CPU is overloaded and servicing other interrupts. My solution, which puts reading the FIFO and clearing the error in one interrupt, works properly without lock-ups.

renesas-kyle-finch commented 4 months ago

Thanks for sharing. A few follow-up questions:

Looking at Figure 30.27 in the R7A6M5BH User Manual, in the case of overrun, sci_uart_eri_isr() should read FRDRHL to make space in the FIFO. It looks like that is being done here. Is that what you are referring to when you say read the FIFO?

I am wondering if you are in a loop where the RXI is being starved due to its priority, and what you are seeing is that the overrun interrupt is generated, a space in the FIFO is made available in sci_uart_eri_isr(), but before the RXI gets a chance to read again, another overrun is generated.

MarcinKond commented 4 months ago

I use a higher priority for RXI than for ERI. I only use the case with the Trigger Number equal to 1. The RXI and TXI interrupts clear the flag first in the ICU and then in the peripheral; the ERI interrupt clears the peripheral flag first and then the ICU. I also changed the order of clearing the interrupt flags in the ERI interrupt to match the order in the RXI and TXI interrupts, and I think that may be helpful.

renesas-kyle-finch commented 4 months ago

> I also changed the order clearing interrupts in ERI interrupt

Could you please share the changes you have made when you say the above?

I am interpreting that as inside sci_uart_eri_isr() you are instead first calling R_BSP_IrqStatusClear() and then clearing the SSR error flags. Is that correct?

Going back to the overrun error, I am curious if you have DTC enabled?

MarcinKond commented 4 months ago

I have DTC enabled. In my newest version I call R_BSP_IrqStatusClear() first and then clear the SSR error flag, and it works properly.

renesas-kyle-finch commented 4 months ago

Thanks for the info. Looking into things a bit more, we believe that the current handling in both the RXI and ERI ISRs is correct. The reason is that the RXI is an edge interrupt and the ERI is a level interrupt.

With the edge interrupt this means clearing the IRQ first and then handling any peripheral flags.

Conversely, this means the level interrupt clears the peripheral flags first and then clears the IRQ. If this were reversed, clearing the IRQ first would yield an immediate re-interrupt if the condition is still present, like in an overrun scenario.
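
A rough sketch of the two orderings described here is shown below, assuming FSP's R_BSP_IrqStatusClear() and register/mask names taken from the RA device header; the handler bodies are illustrative and are not the FSP sources.

```c
#include "bsp_api.h"   /* R_BSP_IrqStatusClear() and IRQn_Type */

/* RXI is edge-triggered: clear the pending IRQ first, then service the
 * peripheral, so a new edge arriving while the handler runs is not lost. */
static void rxi_order_sketch (IRQn_Type irq, R_SCI0_Type * p_reg)
{
    R_BSP_IrqStatusClear(irq);     /* 1. clear the IRQ in the ICU (edge) */
    (void) p_reg->FRDRHL;          /* 2. then read the received data     */
}

/* ERI is level-triggered: clear the peripheral error flags first, then the
 * IRQ.  Reversing the order would re-pend the interrupt immediately while
 * the error condition (for example overrun) is still asserted. */
static void eri_order_sketch (IRQn_Type irq, R_SCI0_Type * p_reg)
{
    p_reg->SSR_FIFO = (uint8_t) (p_reg->SSR_FIFO &
                                 (uint8_t) ~(R_SCI0_SSR_FIFO_ORER_Msk |
                                             R_SCI0_SSR_FIFO_FER_Msk |
                                             R_SCI0_SSR_FIFO_PER_Msk));
    R_BSP_IrqStatusClear(irq);     /* clear the IRQ in the ICU last (level) */
}
```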

With that understanding, I'm still unsure exactly what could be going wrong in your use case and haven't been able to reproduce it in the way described. It seems like the multiple other high-priority interrupts could be influencing something.

You mentioned a couple things you implemented as workarounds.

1) Reading the FIFO and clearing the error in one interrupt
2) Changing the order of handling for the ERI to clearing the IRQ and then the peripheral flag

From your testing, does each of these help with your error separately, or are the two solutions being combined?

MarcinKond commented 4 months ago

After reading your description, I agree that, from a theoretical point of view, my changes shouldn't solve the problem of UART reception stopping. I didn't check each of my changes separately. My implementation reads the FIFO, clears the error, and changes the order of clearing the interrupt flags, and it puts all of this in one interrupt, the ERI interrupt. Our system has many other interrupts with higher priorities, and we test it under heavy load, so while the UART interrupts are being serviced they are preempted by other interrupts. Now I have no problems, but your original solution leads to UART reception stopping.

renesas-austin-hansen commented 4 months ago

@MarcinKond I'm not sure there's much more we can do to investigate at this time. This seems to be an issue of interrupt latency, where the RX ISR is not being serviced fast enough to avoid overrun. We are working on some optimizations across the board and will consider different ISR arrangements to potentially help alleviate issues like these in the future.

I am closing this ticket for now; please feel free to reopen if you run into this problem again.