renesas / fsp

Flexible Software Package (FSP) for Renesas RA MCU Family
https://renesas.github.io/fsp/
Other
182 stars 82 forks source link

app: r_iic_master: Don't clobber ACKBT #340

Closed ndreys closed 5 months ago

ndreys commented 5 months ago

ACKBT is a bit whose value gets asserted on SDA during the 9th bit of a I2C read transfer as a NACK/ACK. This bit automatically gets cleared by the hardware on STOP condition:

"When stop condition request is detected with the SP bit in ICCR2
set to 1"

The way that r_iic_master.c uses that (for I2C reads) is to wait for the reception of all but the last byte, set ACKBT to 1, then do the final bit reception with a NAK, then generate a STOP condition. That last step ensures that ACKBT goes back to 0 without having explicit code in the firmware.

Adding the following debug code:

@@ -607,6 +607,11 @@ static fsp_err_t iic_master_read_write (i2c_master_ctrl_t * const p_api_ctrl,

     p_ctrl->read = (bool) direction;

+    if (p_ctrl->p_reg->ICMR3_b.ACKBT) {
+        gpio_set(DBG_PIN_1);
+        gpio_clear(DBG_PIN_1);
+    }
+
     /* Kickoff the read operation as a master */
     err = iic_master_run_hw_master(p_ctrl);

and capturing DBG_PIN_1 on a logical analyzer shows that on a rare occasions our I2C transfers (both reads and writes) start with ACKBT asserted. This is harmless for writes since that bit is meaningless for I2C writes (I2C target is doing NAK/ACK-ing) and STOP condition at the end of the write, clears ACKBT back to 0 it is supposed to be.

However our typical "read register" transaction would consist of:

START + WRITE + REPEATED START + READ + STOP

in which case there's no STOP before the READ to clear ACKBT which causes us to NAK every byte read during that transfer. Per Bosch IMU datasheet:

"...After each data byte the master has to generate an acknowledge bit (ACKS = 0) to enable further data transfer. A NACKM (ACKS = 1) from the master stops the data being transferred from the slave. The slave releases the bus so that the master can generate a STOP condition and terminate the transmission..."

this causes it to release the bus effectively transferring all 1s.

Looking further at LA trace all instances of ACKBT being accidentaly set to 1 are perceeded by a READ transfer, which would suggest something about iic_master_rxi_read_data() might be causing the problem. Looking at that function epilogue, specifically at:

p_ctrl->p_buff[p_ctrl->loaded] = p_ctrl->p_reg->ICDRR;

/* Update the counter values */
p_ctrl->loaded++;
p_ctrl->remain--;

/* If we are done with the reception, clear the WAIT bit */
if (0U == p_ctrl->remain)
{
    p_ctrl->p_reg->ICMR3_b.WAIT = 0;

    /* If this transaction does not have the restart flag set to true,
     * last byte has been read and WAIT has been cleared.
     * Callback will be issued by the ERI once the stop condition is detected * In case of restart flag set to true a callback will be issued by the ERI once the start * (from restart) condition is detected */ }

there seemed to be a race condition there. Since the last byte of the read is purposefully done with ICMR3.WAIT == 1, the I2C IP block will only start shifting data after ICDRR register is read. Once that happens we have a race between load-modify-store sequence implementing:

    p_ctrl->p_reg->ICMR3_b.WAIT = 0;
e3d6:   6902        ldr r2, [r0, #16]
e3d8:   7913        ldrb    r3, [r2, #4]
e3da:   f36f 1386   bfc r3, #6, #1
e3de:   7113        strb    r3, [r2, #4]

and the I2C IP logic responsible for clearing ACKBT on STOP condition. Should the STOP condition clear ACKBT after the load

e3d8:   7913        ldrb    r3, [r2, #4]

but before the store

e3de:   7113        strb    r3, [r2, #4]

we'll end up writing old value of ACKBT which would be 1 and cloberring freshly cleard 0 there.

It seems reasonable to guess that this scenario is exactly the reason IP block designers implemented ACKWP -- a "write-unlock" bit guarding modification of ACKBT. Which, normally, would prevent the above scenario. Unfortunately, the code of the driver does not use that bit correctly. Instead of unlocking ACKBT for only a duration of its modification in the code, the driver just asserts the "write-unlock" bit and leaves ACKBT always writable. Fix this by adding code to clear "write-unlock" as soon as we are done touching ACKBT.

As with any race condition, it's hard to gather a very defninitive evindence to confirm the hypothesis above or that the fix is working, but I'm not able to reproduce the symptoms after a 12+ hour run.

NOTE: RA4E1 has two IP blocks capable of I2C, this commit pertains to IIC, not to be confused with SCI.

ndreys commented 5 months ago

Not sure what the external contribution process for this repo is. CLA? Something else? Let me know.

The least I'm hoping to accomplish with this PR is to have folks at Renesas look at the fix and validate it (or tell me the reasoning is wrong)

SeanMeng0509 commented 5 months ago

Hi @ndreys, thank you for the MR. May I know whether you saw any transmission failure during your test?

ndreys commented 5 months ago

@SeanMeng0509 yes, we have a Bosch IMU connected to that I2C bus and as I mentioned in my commit message:


However our typical "read register" transaction would consist of:

START + WRITE + REPEATED START + READ + STOP

in which case there's no STOP before the READ to clear ACKBT which causes us to NAK every byte read during that transfer. Per Bosch IMU datasheet:

"...After each data byte the master has to generate an acknowledge
bit (ACKS = 0) to enable further data transfer. A NACKM (ACKS = 1)
from the master stops the data being transferred from the slave. The
slave releases the bus so that the master can generate a STOP
condition and terminate the transmission..."

this causes it to release the bus effectively transferring all 1s.
SeanMeng0509 commented 5 months ago

Hi @ndreys, sorry for the confusion. I mean did you see any other error before this NACK issue?

ndreys commented 5 months ago

No, not that I know of. Can you give me a bit more context on what you are trying to probe for?

SeanMeng0509 commented 5 months ago

Thanks. I just want to confirm whether it could be triggered by some exception cases.

SeanMeng0509 commented 5 months ago

Hi @ndreys, I agree with you there can be possible race condition in this case. We will merge your fix into FSP 5.2 release. Thank you so much for your help!