Closed atnon closed 2 months ago
Hi @atnon! We appreciate you submitting your first issue for our open-source project. 🌟
Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙
Hi @atnon ,
The easy way to fix this is you open a pull request with your changes. I recommend you take the time and look at Contributing to Zephyr in deep Contribution Guidelines.
Based on your analysis my recommendation is rework the Fixup
-#if defined(SERCOM_U2201) && (REV_SERCOM == 0x500)
-#define SERCOM_REV500
-#endif
in favor of
+#if defined(SERCOM_U102)
+#define SAM0_SERCOM_HAS_ERROR_FLAGS = 0
+#else
+#define SAM0_SERCOM_HAS_ERROR_FLAGS = 1
+#endif
and replace on code the #if defined(SERCOM_REV500)
. Besides that, any other detail that you discovered.
Once there is a PR it is possible to involve more people to help with tests.
I'll try to take a look at creating a PR for thiswhen time allows, hopefully sometime the coming week. Thanks for the input!
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
@atnon any update? Thanks.
Hi @atnon , We are closing next release and i was wondering if you will send a PR to fix your findings.
I am experiencing the same issues on atsamd21g18a zephyr v3.5.0 - cpu gets stuck in uart_irq_update
forever. Not sure why this is labeled as low priority
Hi @atnon ,
Any update?
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
Description Driver implementation assumes that only sam0 SOCs with
REV_SERCOM == 0x500
has error interrupt flags, whereas all supported sam0 SOCs except the samd21 actually supports them. This also results in interrupt flags not being cleared correctly in uart_sam0_irq_update.To Reproduce Should be enough running a MODBUS RTU Server on a physical SERCOM instance, on e.g. a samd21_xpro board or similar non-
REV_SERCOM == 0x500
towards an actual RS485 bus via transceiver. I might need to make some minimal example, as the modbus rtu server sample does not support the atsamd21_xpro board. Just a regular UART should work as well given what the failure mechanism seems to be.Expected behavior During normal operation, I should be able to issue an arbitrary amount of read writes to registers and coils defined on the MODBUS Server running in Zephyr.
Impact I found this while trying to get MODBUS RTU up and running, I saw that I would get a response from the MCU once, after which it would exhibit the following symptoms:
The firmware simply grinds to a complete halt.
I think the irq-callback loop between the modbus subsystem and the UART driver would just go in circles with the IRQ flags never being cleared, starving the other running threads. I did not see the INTFLAG register deassert any set flags after it had been set while debugging at least.
Doing a hackish edit to the driver uart_sam0.c like below solves the issue for now, but it's not exacly ready for a PR. Simply adding the same code as present witin the SERCOM_REV500 ifdef, but in the else clause:
Analysis The uart_sam0 driver has a version check for the SERCOM revision, but only for
REV_SERCOM == 0x500
: https://github.com/zephyrproject-rtos/zephyr/blob/f6eeb9dc84bec63dea231a49e97992875b659614/drivers/serial/uart_sam0.c#L24-L30REV_SERCOM == 0x500
, this is unfortunately not true and leads to a majority of sam0 SOCs to not handle clearing of interrupt flags correctly. Actually, the part I'm having issues with does noto have to do much with errors but rather clearing of sticky interrupts. Today, as of 2023-07-01, samd20 is the only sam0 SOC in the zephyrproject-rtos/hal_atmel repo not supporting the error flag. See table below, produced by yours truly looking in ASF headers and family datasheets:The uart_sam0_irq_update seems to be the only place where interrupt flags can be deasserted, and it would imply that for many of the listed SOCs, the flags would not unset after being set. https://github.com/zephyrproject-rtos/zephyr/blob/f6eeb9dc84bec63dea231a49e97992875b659614/drivers/serial/uart_sam0.c#L904-L928 The datasheet for samd21 does however state that:
So I guess there could be different behaviours depending on the order of enabling and disabling rx/tx interrupts in e.g. uart_sam0_irq_rx/tx_enable/disable. I'm honestly not sure.
Taking a look at history/blame, there's not any hard evidence to find around uart_sam0_irq_update:
34432 - uart_sam0_irq_update is introduced, with a special case for
REV_500
38311 - Minor change in assignment, guessing no effect for uart_sam0_irq_update
55387 - Adds caching of the TXC flag for
REV_500
Now, I've only checked for what bits need to be cleared explicitly, but not if there's other functional difference between the REV_SERCOM revisions. It might be that checking if
REV_SERCOM >= 0x200
, but I have no way of knowing whether there are effects elsewhere directly related to error processing mechanisms.Nonetheless, it would be nice to unbreak this for at least the samd21 and maybe others with the same
REV_SERCOM
.Does anyone know what would be a good way to proceed in unbreaking this? I'm not super familiar with the zephyr codebase, so I don't really have a holistic view with regards to what else might be affected by this.
Environment