sparkfun / Arduino_Apollo3

Arduino core to support the Apollo3 microcontroller from Ambiq Micro
83 stars 37 forks source link

I2C sub-one cycle clock stretching causes problems. Me, I blame SMPCNT. #400

Open PaulZC opened 3 years ago

PaulZC commented 3 years ago

The Qwiic Button uses sub-one cycle clock stretching on the I2C bus. This causes the standard beginTransmission endTransmission 'scanner' test for an I2C device to return a false positive.

Steps To Replicate:

Connect an unmodified Qwiic Button to e.g. an Artemis Thing Plus. Load core v2.1.0. Upload the following simple 'I2C scanner' sketch:

#include <Wire.h>

void setup()
{
  Serial.begin(115200);
  Wire.begin();
  Wire.setClock(100000);

  for (byte address = 1; address < 127; address++ )
  {
    Wire.beginTransmission(address);
    byte error = Wire.endTransmission();

    if (error == 0)
    {
      Serial.print("I2C device found at address 0x");
      Serial.println(address, HEX);
    }
    else
    {
      Serial.print("I2C error ");
      Serial.print(error);
      Serial.print(" at address 0x");
      Serial.println(address, HEX);
    }
  }
}

void loop()
{
}

The Qwiic Button has a default address of 0x6F, so you should see this (as captured with an ESP32 Thing Plus):

...
I2C error 2 at address 0x66
I2C error 2 at address 0x67
I2C error 2 at address 0x68
I2C error 2 at address 0x69
I2C error 2 at address 0x6A
I2C error 2 at address 0x6B
I2C error 2 at address 0x6C
I2C error 2 at address 0x6D
I2C error 2 at address 0x6E
I2C device found at address 0x6F
I2C error 2 at address 0x70
I2C error 2 at address 0x71
I2C error 2 at address 0x72
I2C error 2 at address 0x73
I2C error 2 at address 0x74
I2C error 2 at address 0x75
I2C error 2 at address 0x76
...

but instead we get this:

I2C device found at address 0x1
I2C device found at address 0x2
I2C device found at address 0x3
I2C device found at address 0x4
I2C device found at address 0x5
I2C device found at address 0x6
I2C device found at address 0x7
I2C device found at address 0x8
I2C device found at address 0x9
I2C device found at address 0xA
I2C device found at address 0xB
I2C device found at address 0xC
I2C device found at address 0xD
I2C device found at address 0xE
I2C device found at address 0xF
I2C device found at address 0x10
I2C device found at address 0x11
...

Diagnosis:

Looking at the I2C bus with a logic analyzer, we can see the sub-one cycle clock stretching (note the gap before the NACK):

image

The I2C bus traffic is correct and the processor should see this as a NACK (device not present). But endTransmission is returning zero. It shouldn't...

Here's the correct ACK at address 0x6F:

image

Here's the same test at 400kHz. It gives the same false positives. The clock stretching is much more obvious:

image

But at 50kHz (thank you v2!), all is well. Note the lack of clock-stretching:

image

...
I2C error 2 at address 0x6B
I2C error 2 at address 0x6C
I2C error 2 at address 0x6D
I2C error 2 at address 0x6E
I2C device found at address 0x6F
I2C error 2 at address 0x70
I2C error 2 at address 0x71
I2C error 2 at address 0x72
I2C error 2 at address 0x73
...

Suspected cause:

I think the root cause is our old friend SMPCNT:

image

100kHz and 400kHz set SMPCNT to 3:

https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2203

https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2223

But the default setting (which is what would be used for 50kHz), sets SMPCNT to 33 (0x21):

https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2293

Now, I don't know what the answer to this is... I've tried manually setting SMPCNT for 100kHz and I can't find a value which works. There is a cryptic note in the Apollo3 Blue datasheet:

image

so maybe only certain values work for SMPCNT which "allow for synchronization time"? Multiples of three perhaps?

PaulZC commented 3 years ago

Possibly also seen here with u-blox GNSS modules. @paulvha suspects SMPCNT too:

https://forum.sparkfun.com/viewtopic.php?f=168&t=55659&p=225605&hilit=smpcnt#p225605

Wenn0101 commented 3 years ago

I have tried quiet a few things to try to get to the root of this.

I agree the datasheet is quiet cryptic - offering very little to help. A couple interesting observations, that I'm leaving as a note for myself: -The clock timings after the clock stretch are correct, indicating that something about the processor is correctly determining that the clock stretching is happening (and that the subsampling set by SMPCNT is doing something right...ish) -You can disable the ability to detect clock stretching smaller than 1 clock cycle (STRDIS), in my testing at 100khz this actually works substantially better! -The problem doesn't appear to be limited to sub 1 clock cycle stretching (see the 400khz above), but perhaps all clock stretching in general. -The status of a NAK is kept in the INTSTAT register. Observing this register shows that the non-clock stretched example properly shows a NAK when no ACK occurs, however the result is different with clock stretching. This indicates the problem is in the hardware config or transmission firmware, and NOT in the error handling in the software after the write.

I have updated the 100khz config to disable the sub 1 cycle clock stretching for now, since it seems to work significantly better than having it enabled, but looking at the scope of the clock after this, I am not confident that this is any kind of permanent solution (the last clock has a very inconsistent timing).

PaulZC commented 3 years ago

Nice one - thank you! Paul

PaulZC commented 3 years ago

Hi @Wenn0101 , Humm. I think I have bad news here... When I run the new OpenLog Artemis code on v2.1.1 and connect it to a u-blox ZED-F9P at 100kHz, I see way more I2C checksum errors than with v2.1.0. I will try and pin this down a bit more. I'll try 400kHz on v2.1.1 and see what that gives me. Cheers for now, Paul

PaulZC commented 3 years ago

Hi @Wenn0101 ,

In the following, I'm running the exact same code on the exact same OpenLog Artemis + ZED-F9P. The only things I'm changing are the version of the core and the I2C clock speed. I've temporarily hacked the u-blox library so it will always print a debug message when a checksum error occurs on the UBX PositionVelocityTime packet - which indicates general badness (loss or corruption of data) on the I2C bus. To keep it simple, I'm only logging GNSS date, time, SatellitesInView and positionDilutionOfPosition. The logging rate is set to 5Hz.

The u-blox module has its own internal pull-ups. I have the 2.2k resistors on the ZED-F9P breakout enabled via the split pads - I need those to get to 400kHz. The Artemis' internal pull-ups are disabled. This combination of pull-ups does still slow the clock rise time and so limit the achievable clock speed.

Here's what I see with v2.1.0 of the core at 100kHz. No checksum errors to be seen. Actual clock speed measured via 'scope is 100kHz.

image

Here's v2.1.0 of the core at 400kHz. Again no checksum errors. Actual clock speed via 'scope is 267kHz.

image

Here's v2.1.1 of the core at 100kHz. Badness... Actual clock speed is 100kHz.

image

Here's v2.1.1 of the core at 400kHz. All is good. Actual clock speed is 267kHz.

image

Sadly, this means I'll need to stick with v2.1.0 for now... Sorry!

Cheers, Paul

Wenn0101 commented 3 years ago

Yea, the solution I found was a bit of a hack. I suppose I shouldn't be surprised that it didn't work well with all devices. I need to find a better solution to this problem, and pull the change that I made in the next release and hopefully i'll have a better solution in the next one.

paulvha commented 3 years ago

After reading I have done a number of tests to see whether I can help. My conclusion in short : the issue only happens when there is a (very) short clock stretch.

Maybe there is a relation to the I2C issue mentioned in https://ambiq.com/wp-content/uploads/2020/08/Apollo-MCU-Silicon-Errata.pdf

I would not explain why this is working on library V1 and not on V2. One aspect that comes to mind is the processor load with V2, due to MBED, is much higher. In the tests the time between the sending the different I2C addresses is about 0.1mS with V1.2.1, where it is about 3mS with V2.1.1. This is the result of the overhead of MBED. Not sure how that would impact the IOM performance.

My test I have used an SCD30, which is using clock stretch, and the code example from PaulCZ to detect. The SCD30 clock stretch is not consistent in timing, but is always longer than the SCL clock cycle. I don't own a QWIIC button or ZED-F9P so can not test. The SCD30 was ALWAYS detected correctly no matter what speed.

Configuration I have checked the IOM I2C configuration setting between V1.2 and V2.1.1. Apart from the change made by Wenn010 for 100Khz, they are the same

Timing With the SCD30 either on 100KHZ or 400KHZ, I see the clock cycle timing is really 100KHZ or 400KHZ whether 1.2.1 or 2.1.1.

Clock stretch On the scope output, I clearly see clock stretch happening, but always longer than one SCL clock cycle:

V1.2.1 / 100Khz afbeelding

V2.1.1 / 100Khz afbeelding

V1.2.1 / 400Khz afbeelding

V2.1.1 / 400Khz afbeelding

I would NOT disable the clock stretch check below one clock cycle, but would try the impact of SMPCNT when it is set to 1. An alternative is to run at lower rate as Paul(CZ) did. I have used that on a Raspberry PI before with other sensors (e.g. CCS811 on 35Khz)

jerabaul29 commented 3 years ago

Can this be related to the problems encountered when trying to communicate with the BNO08x?

PaulZC commented 3 years ago

Hi JR,

Yes, almost certainly.

Best wishes, Paul

chubb commented 2 years ago

+1 to this issue. I2C on Artemis doesn't seem to work out-of-the-box. I have to change the clock speed away from the default 100 kHz to make it work with the Sparkfun sensors I've tried. Details are here.

paulvha commented 2 years ago

I think I have found the root cause and the solution. I bought a QWIIC RED button board a week ago and started deep-dive on the I2C-IOM settings.

Clock stretching

The good news is that clock stretching is working perfectly well an Apollo3. No need to change anything for that from the default values.

ALWAYS ACK

The real issue is that the Apollo3 has a problem when only the address is sent (as is done with the I2C scanner) to a device that is performing clock stretching, the Apollo3 will ALWAYS report back an ACK, even if the logic analyzer clearly shows it should be a NAK. BUT when sending the address + a dummy byte (something like 0x21) it will detect the NAK and ACK 100% correct. Chapter 5 clearly shows that and the adjusted sketch is in Appendix B in the attached document.

QWIIC button

I have also done a deep-dive on the QWIIC button hardware and code. It is amazing to see the clock stretch happening there . I have documented the reasons in the attached document.

I2C speed

The other aspect that PaulCZ observed "Actual clock speed via 'scope is 267kHz" on a 400Khz speed is caused by an incorrect SCLENDLY setting of 3. Do not make this higher than 2 and you have 400Khz. Also, I would recommend changing the setting for 1Mhz as the current setup brings 857Khz and it could be 920Khz with a better duty cycle. It is documented in chapter 6.1.

The attached document is full of the findings of the deep-dive / compares, screenshots from the logic analyzer and more information about I2C settings and insight in the QWIIC button.

I2C_clock_stretch.odt

hope this helps

regards, Paul

PaulZC commented 2 years ago

Excellent detective work Paul (@paulvha )! Sincere thanks, Paul (@PaulZC )

Wenn0101 commented 2 years ago

Thanks for the detailed analysis! I will read thru this full report today, and see what changes I need to make.

-Kyle