rogerclarkmelbourne / Arduino_STM32

Arduino STM32. Hardware files to support STM32 boards, on Arduino IDE 1.8.x including LeafLabs Maple and other generic STM32F103 boards
Other
2.52k stars 1.26k forks source link

I2C stop problem #799

Closed david314 closed 3 years ago

david314 commented 4 years ago

I have been experiencing some strange behavior with my Arduino Maple design lately, to be more precise with the I2C bus. To make the issue repeatable I checked the two examples _i2c_scannerwire and _i2c_scannersoftwire. The latter works and detects the two devices connected to the bus while the other example hangs after the first access to the bus. It appears something happens during the issuing of the stop condition because both SDA and SCL are released almost simultaneously.

The following diagram shows the waves for the first access of the working example _i2c_scannersoftwire:

i2c_scanner_softwire

The stop condition is issued correctly and the program continues searching for devices on the bus. The buggy example _i2c_scannerwire shows an erroneous behavior with a wrong stop condition:

i2c_scanner_wire

While the working example runs the loop, the buggy example hangs after the first bus access shown here. The different behavior of the rising clock edge on the first diagram is because of the higher frequency, both codes ran on the same system. Has anyone encountered the same problem?

stevstrong commented 4 years ago

Can you please change the third parameter from false to true: https://github.com/rogerclarkmelbourne/Arduino_STM32/blob/7a7659d1116c11422a3025713cd5c208da991c86/STM32F1/libraries/Wire/utility/WireBase.h#L107 Then check again.

stevstrong commented 4 years ago

I can reproduce the problem, there is no valid stop issued on I2C bus. I need more time to find the root cause, I need to check the whole code to realize what is going on there. Meanwhile if anyone else wants to do it more effectively, feel free to do it.

dewhisna commented 4 years ago

Just to be sure, is this with using the Wire library or the WireSlave library or both? Of the two, WireSlave is the primary one I was working with and is hardware I2C only. There's some shared code that the Wire library uses from it for the hardware I2C parts, with the key difference being that WireSlave supports both master and slave modes, and Wire only supports master mode.

If you guys were testing with the Wire library, I'm curious if you are seeing the same issue if you switch to the WireSlave library?

Also, in the waveforms presented, they look to be the same with the exception of the exact bit-timings. One is clearly soft-I2C based, since the waveform toggles right at the middle of the clock cycle. And the other is clearly hard-I2C based, as the data toggles are happening almost concurrently with the clock edge. But otherwise, I don't see a substantial difference between them -- what exactly about the waveform are you saying is wrong??

If it's just hanging, most times that is due to missing (or wrong) pull-up resistors on those pins. What value pull-up resistors are you guys using??

stevstrong commented 4 years ago

Donna, you have adapted the i2c module to be compatible with your Wire slave lib, see https://github.com/rogerclarkmelbourne/Arduino_STM32/commit/d52422c1a90a65eeaecd039c001abc2782e3fa0c

Please pay attention to the missing stop condition after the red "1". With software I2C you can see a clean stop condition, while the HW interface fails to generate it correctly, the last blue rising edge should be delayed from the last red rising edge. Is it maybe due to a race condition somehow?

dewhisna commented 4 years ago

Yes, I know I added support for slave I2C, but slave mode is only supported in the WireSlave library. Here, I think you are using master-mode and both Wire and WireSlave support master. So which library are you using? I ask, because I do port scanning for device presence on the WireSlave library all the time and haven't had any issues with it hanging.

So if you were using Wire can use you try WireSlave and see if it behaves the same?

david314 commented 4 years ago

The two examples I posted earlier use both the Wire.h library. I just tested the example i2c_scanner_wire.ino with the Wire_slave.h library. It features even more strange artifacts, I see a repeated start condition when the slave doesn't answer:

grafik

When it detects the first device at address 0x28 the I2C module makes a long pause after transmitting the address and announcing a write access:

grafik

The second device on the bus is found at 0x76 as expected with the following sequence:

grafik

This has nothing to do with wrong pull-ups, there is a mayor problem in the code.

stevstrong commented 4 years ago

I tested Wire.h. But as far as I can see, Wire_slave.h has the same problem, on the place where the stop condition should be the sda and scl line rising edges overlap.

david314 commented 4 years ago

Yes, the rising flank just before the second START is a "broken" STOP, it can bee seen in all above waveforms.

dewhisna commented 4 years ago

I don't mind digging into this deeper, as it needs to be resolved, but it might be a little while. With all of the rain and flooding this week, my lab is currently under water or more specifically I'm in the long and expensive process of cleaning it and assessing the damage.

stevstrong commented 3 years ago

In order to test the hardware, I modified the i2c_transfer function to a simplified version by polling the flags instead of using interrupts:

int32 i2c_master_xfer(i2c_dev *dev, i2c_msg *msg, uint32 timeout)
{
    if (msg == NULL) return 0;

    dev->error_flags = 0;
    dev->msg = msg;
    dev->msg->xferred = 0;
    dev->timestamp = millis;
    dev->state = I2C_STATE_BUSY;

    dev->regs->SR1 = 0;             // Reset error/status flags

    dev->regs->CR1 = (I2C_CR1_PE | I2C_CR1_START);

    __IO uint32_t sr1;
    // wait for end of START
    do {
        sr1 = dev->regs->SR1;     // read status register
    } while ( !(sr1 & I2C_SR1_SB) ); // start bit

    // send address
    i2c_send_slave_addr(dev, dev->msg->addr, 0);

    // wait for end of ADDR
    do {
        sr1 = dev->regs->SR1;     // read status register
    } while ( !(sr1 & (I2C_SR1_ADDR|I2C_SR1_AF)) ); // address ACK or NACK

        // generate STOP
    dev->regs->CR1 |= I2C_CR1_STOP;

    if (sr1 & I2C_SR1_AF)
        return 1; // NACK
    else
        return 0; // ACK
}

With this, I get correct STOP condition on the bus, and the scanner also recognizes the attached device.

Bluepill I2C_correct STOP

So it seems to me that the the wire (slave) lib somehow does not handle correctly the NACK condition using interrupts. Next I will try to use interrupts for master mode only.

dewhisna commented 3 years ago

@stevstrong -- I updated PR #804 to move the location of the STOP handling in master mode so that the CR1 register can get polled for completion and eliminate the race-condition between the irq handlers.

stevstrong commented 3 years ago

@dewhisna , I will continue to write here because this is the issue we are trying to solve.

While trying to get the same behaviour with IRQs, I realized the problem is not in transfer function, but the layer above. If I change my simple version (see above) to return I2C_ERROR_PROTOCOL instead of "1" then the STOP is damaged again.

    do {
        sr1 = dev->regs->SR1;     // read status register
    } while ( !(sr1 & (I2C_SR1_ADDR|I2C_SR1_AF)) ); // ACK or NACK

    dev->regs->CR1 = I2C_CR1_PE | I2C_CR1_STOP;
    if (sr1 & I2C_SR1_AF)
        return 1; // when changing to I2C_ERROR_PROTOCOL the STOP will be destroyed
    else
        return 0;

This lead me to check what happens in the process() function in Wire.cpp and I see there https://github.com/rogerclarkmelbourne/Arduino_STM32/blob/179c0261cdd2414645104f8d57314e67d699451d/STM32F1/libraries/Wire/Wire.cpp#L43-L54 If you look to lines 52-53 you see that the I2C will be disabled and re-enabled again. This is what causes the STOP to fail. If I comment out those lines the STOP will be generated correctly.

I did not try your suggestion yet.

stevstrong commented 3 years ago

Then I changed the process() function to this:

    if (res == I2C_ERROR_PROTOCOL) {
        if (sel_hard->error_flags & I2C_SR1_AF) { /* NACK */
            res = (sel_hard->error_flags & I2C_SR1_ADDR ? ENACKADDR : 
                                                          ENACKTRNS);
        } else {
            if (sel_hard->error_flags & I2C_SR1_OVR) { /* Over/Underrun */
                res = EDATA;
            } else { /* Bus or Arbitration error */
                res = EOTHER;
            }
            i2c_disable(sel_hard);
            i2c_master_enable(sel_hard, dev_flags, frequency);
        }
    }

and it started to work correctly.

So basically we should differentiate between NACK and other type or errors when considering to disable-enable the interface. Also, this should be brought to the same level where the STOP s generated in order to avoid parallel error handling causing similar issues.

stevstrong commented 3 years ago

@david314 , please test the new code and let me know if it fixes the issue.

dewhisna commented 3 years ago

@stevstrong -- the problem with your fix is that the reset is needed in some cases to get the bus back in a nominal state. Please try my updated #804 because it shouldn't be returning now from the middle code layer until after the STOP has finished being transmitted -- it's got a while loop now that won't let it. Prior, it had issued the STOP but there was a race-condition present.

stevstrong commented 3 years ago

My fix still performs reset in case of HW error, but in case of NACK, which is not considerd a HW error. I will try your fix later.

dewhisna commented 3 years ago

Yes, I understand both that your change still does a reset on HW error and that NACK is technically not an error, but I've seen cases where a reset was still needed in the presence of certain misbehaving I2C devices. I don't remember all of the specifics and would have to dig through lots of old code and docs to find the details, but I think that's why it was that way in the code to begin with. And unfortunately I'm in the process of switching my main development PC here to a new machine (no thanks to Nvidia and their horrible nonsense with buggy drivers and lack of open-source support for the Linux community), so I'm somewhat limited in what I can do until that data migration process is complete.

Though I see you've already pushed your test code to master, which you really shouldn't have done yet. You should have created another PR for it for others to test and comment on, especially since it was an issue others were working on testing and resolving, because now undoing it will either unnecessarily create commit clutter or require a force push that's going to make lots of people using the repo very unhappy. But, it's not my repo ... so...

stevstrong commented 3 years ago

I pushed the change because I have tested it and am convinced that it resolves this issue, namely the STOP generation for NACK. OK, I admit, I could have made first a PR for this, but this way was quicker. The scanner seems to be used very often so with the push I will probably avoid some further "scanner does not work" posts or bug reports.

Anyway, I am open to make further changes if needed to handle special error cases (different issue) if you or anyone else bring prove that is necessary.