Closed ndreys closed 5 months ago
Not sure what the external contribution process for this repo is. CLA? Something else? Let me know.
The least I'm hoping to accomplish with this PR is to have folks at Renesas look at the fix and validate it (or tell me the reasoning is wrong)
Hi @ndreys, thank you for the MR. May I know whether you saw any transmission failure during your test?
@SeanMeng0509 yes, we have a Bosch IMU connected to that I2C bus and as I mentioned in my commit message:
However our typical "read register" transaction would consist of:
START + WRITE + REPEATED START + READ + STOP
in which case there's no STOP before the READ to clear ACKBT which causes us to NAK every byte read during that transfer. Per Bosch IMU datasheet:
"...After each data byte the master has to generate an acknowledge
bit (ACKS = 0) to enable further data transfer. A NACKM (ACKS = 1)
from the master stops the data being transferred from the slave. The
slave releases the bus so that the master can generate a STOP
condition and terminate the transmission..."
this causes it to release the bus effectively transferring all 1s.
Hi @ndreys, sorry for the confusion. I mean did you see any other error before this NACK issue?
No, not that I know of. Can you give me a bit more context on what you are trying to probe for?
Thanks. I just want to confirm whether it could be triggered by some exception cases.
Hi @ndreys, I agree with you there can be possible race condition in this case. We will merge your fix into FSP 5.2 release. Thank you so much for your help!
ACKBT is a bit whose value gets asserted on SDA during the 9th bit of a I2C read transfer as a NACK/ACK. This bit automatically gets cleared by the hardware on STOP condition:
The way that r_iic_master.c uses that (for I2C reads) is to wait for the reception of all but the last byte, set ACKBT to 1, then do the final bit reception with a NAK, then generate a STOP condition. That last step ensures that ACKBT goes back to 0 without having explicit code in the firmware.
Adding the following debug code:
and capturing DBG_PIN_1 on a logical analyzer shows that on a rare occasions our I2C transfers (both reads and writes) start with ACKBT asserted. This is harmless for writes since that bit is meaningless for I2C writes (I2C target is doing NAK/ACK-ing) and STOP condition at the end of the write, clears ACKBT back to 0 it is supposed to be.
However our typical "read register" transaction would consist of:
START + WRITE + REPEATED START + READ + STOP
in which case there's no STOP before the READ to clear ACKBT which causes us to NAK every byte read during that transfer. Per Bosch IMU datasheet:
"...After each data byte the master has to generate an acknowledge bit (ACKS = 0) to enable further data transfer. A NACKM (ACKS = 1) from the master stops the data being transferred from the slave. The slave releases the bus so that the master can generate a STOP condition and terminate the transmission..."
this causes it to release the bus effectively transferring all 1s.
Looking further at LA trace all instances of ACKBT being accidentaly set to 1 are perceeded by a READ transfer, which would suggest something about iic_master_rxi_read_data() might be causing the problem. Looking at that function epilogue, specifically at:
there seemed to be a race condition there. Since the last byte of the read is purposefully done with ICMR3.WAIT == 1, the I2C IP block will only start shifting data after ICDRR register is read. Once that happens we have a race between load-modify-store sequence implementing:
and the I2C IP logic responsible for clearing ACKBT on STOP condition. Should the STOP condition clear ACKBT after the load
but before the store
we'll end up writing old value of ACKBT which would be 1 and cloberring freshly cleard 0 there.
It seems reasonable to guess that this scenario is exactly the reason IP block designers implemented ACKWP -- a "write-unlock" bit guarding modification of ACKBT. Which, normally, would prevent the above scenario. Unfortunately, the code of the driver does not use that bit correctly. Instead of unlocking ACKBT for only a duration of its modification in the code, the driver just asserts the "write-unlock" bit and leaves ACKBT always writable. Fix this by adding code to clear "write-unlock" as soon as we are done touching ACKBT.
As with any race condition, it's hard to gather a very defninitive evindence to confirm the hypothesis above or that the fix is working, but I'm not able to reproduce the symptoms after a 12+ hour run.
NOTE: RA4E1 has two IP blocks capable of I2C, this commit pertains to IIC, not to be confused with SCI.