Open PaulZC opened 3 years ago
Possibly also seen here with u-blox GNSS modules. @paulvha suspects SMPCNT too:
https://forum.sparkfun.com/viewtopic.php?f=168&t=55659&p=225605&hilit=smpcnt#p225605
I have tried quiet a few things to try to get to the root of this.
I agree the datasheet is quiet cryptic - offering very little to help. A couple interesting observations, that I'm leaving as a note for myself: -The clock timings after the clock stretch are correct, indicating that something about the processor is correctly determining that the clock stretching is happening (and that the subsampling set by SMPCNT is doing something right...ish) -You can disable the ability to detect clock stretching smaller than 1 clock cycle (STRDIS), in my testing at 100khz this actually works substantially better! -The problem doesn't appear to be limited to sub 1 clock cycle stretching (see the 400khz above), but perhaps all clock stretching in general. -The status of a NAK is kept in the INTSTAT register. Observing this register shows that the non-clock stretched example properly shows a NAK when no ACK occurs, however the result is different with clock stretching. This indicates the problem is in the hardware config or transmission firmware, and NOT in the error handling in the software after the write.
I have updated the 100khz config to disable the sub 1 cycle clock stretching for now, since it seems to work significantly better than having it enabled, but looking at the scope of the clock after this, I am not confident that this is any kind of permanent solution (the last clock has a very inconsistent timing).
Nice one - thank you! Paul
Hi @Wenn0101 , Humm. I think I have bad news here... When I run the new OpenLog Artemis code on v2.1.1 and connect it to a u-blox ZED-F9P at 100kHz, I see way more I2C checksum errors than with v2.1.0. I will try and pin this down a bit more. I'll try 400kHz on v2.1.1 and see what that gives me. Cheers for now, Paul
Hi @Wenn0101 ,
In the following, I'm running the exact same code on the exact same OpenLog Artemis + ZED-F9P. The only things I'm changing are the version of the core and the I2C clock speed. I've temporarily hacked the u-blox library so it will always print a debug message when a checksum error occurs on the UBX PositionVelocityTime packet - which indicates general badness (loss or corruption of data) on the I2C bus. To keep it simple, I'm only logging GNSS date, time, SatellitesInView and positionDilutionOfPosition. The logging rate is set to 5Hz.
The u-blox module has its own internal pull-ups. I have the 2.2k resistors on the ZED-F9P breakout enabled via the split pads - I need those to get to 400kHz. The Artemis' internal pull-ups are disabled. This combination of pull-ups does still slow the clock rise time and so limit the achievable clock speed.
Here's what I see with v2.1.0 of the core at 100kHz. No checksum errors to be seen. Actual clock speed measured via 'scope is 100kHz.
Here's v2.1.0 of the core at 400kHz. Again no checksum errors. Actual clock speed via 'scope is 267kHz.
Here's v2.1.1 of the core at 100kHz. Badness... Actual clock speed is 100kHz.
Here's v2.1.1 of the core at 400kHz. All is good. Actual clock speed is 267kHz.
Sadly, this means I'll need to stick with v2.1.0 for now... Sorry!
Cheers, Paul
Yea, the solution I found was a bit of a hack. I suppose I shouldn't be surprised that it didn't work well with all devices. I need to find a better solution to this problem, and pull the change that I made in the next release and hopefully i'll have a better solution in the next one.
After reading I have done a number of tests to see whether I can help. My conclusion in short : the issue only happens when there is a (very) short clock stretch.
Maybe there is a relation to the I2C issue mentioned in https://ambiq.com/wp-content/uploads/2020/08/Apollo-MCU-Silicon-Errata.pdf
I would not explain why this is working on library V1 and not on V2. One aspect that comes to mind is the processor load with V2, due to MBED, is much higher. In the tests the time between the sending the different I2C addresses is about 0.1mS with V1.2.1, where it is about 3mS with V2.1.1. This is the result of the overhead of MBED. Not sure how that would impact the IOM performance.
My test I have used an SCD30, which is using clock stretch, and the code example from PaulCZ to detect. The SCD30 clock stretch is not consistent in timing, but is always longer than the SCL clock cycle. I don't own a QWIIC button or ZED-F9P so can not test. The SCD30 was ALWAYS detected correctly no matter what speed.
Configuration I have checked the IOM I2C configuration setting between V1.2 and V2.1.1. Apart from the change made by Wenn010 for 100Khz, they are the same
Timing With the SCD30 either on 100KHZ or 400KHZ, I see the clock cycle timing is really 100KHZ or 400KHZ whether 1.2.1 or 2.1.1.
Clock stretch On the scope output, I clearly see clock stretch happening, but always longer than one SCL clock cycle:
V1.2.1 / 100Khz
V2.1.1 / 100Khz
V1.2.1 / 400Khz
V2.1.1 / 400Khz
I would NOT disable the clock stretch check below one clock cycle, but would try the impact of SMPCNT when it is set to 1. An alternative is to run at lower rate as Paul(CZ) did. I have used that on a Raspberry PI before with other sensors (e.g. CCS811 on 35Khz)
Can this be related to the problems encountered when trying to communicate with the BNO08x?
Hi JR,
Yes, almost certainly.
Best wishes, Paul
+1 to this issue. I2C on Artemis doesn't seem to work out-of-the-box. I have to change the clock speed away from the default 100 kHz to make it work with the Sparkfun sensors I've tried. Details are here.
I think I have found the root cause and the solution. I bought a QWIIC RED button board a week ago and started deep-dive on the I2C-IOM settings.
The good news is that clock stretching is working perfectly well an Apollo3. No need to change anything for that from the default values.
The real issue is that the Apollo3 has a problem when only the address is sent (as is done with the I2C scanner) to a device that is performing clock stretching, the Apollo3 will ALWAYS report back an ACK, even if the logic analyzer clearly shows it should be a NAK. BUT when sending the address + a dummy byte (something like 0x21) it will detect the NAK and ACK 100% correct. Chapter 5 clearly shows that and the adjusted sketch is in Appendix B in the attached document.
I have also done a deep-dive on the QWIIC button hardware and code. It is amazing to see the clock stretch happening there . I have documented the reasons in the attached document.
The other aspect that PaulCZ observed "Actual clock speed via 'scope is 267kHz" on a 400Khz speed is caused by an incorrect SCLENDLY setting of 3. Do not make this higher than 2 and you have 400Khz. Also, I would recommend changing the setting for 1Mhz as the current setup brings 857Khz and it could be 920Khz with a better duty cycle. It is documented in chapter 6.1.
The attached document is full of the findings of the deep-dive / compares, screenshots from the logic analyzer and more information about I2C settings and insight in the QWIIC button.
hope this helps
regards, Paul
Excellent detective work Paul (@paulvha )! Sincere thanks, Paul (@PaulZC )
Thanks for the detailed analysis! I will read thru this full report today, and see what changes I need to make.
-Kyle
The Qwiic Button uses sub-one cycle clock stretching on the I2C bus. This causes the standard
beginTransmission
endTransmission
'scanner' test for an I2C device to return a false positive.Steps To Replicate:
Connect an unmodified Qwiic Button to e.g. an Artemis Thing Plus. Load core v2.1.0. Upload the following simple 'I2C scanner' sketch:
The Qwiic Button has a default address of 0x6F, so you should see this (as captured with an ESP32 Thing Plus):
but instead we get this:
Diagnosis:
Looking at the I2C bus with a logic analyzer, we can see the sub-one cycle clock stretching (note the gap before the NACK):
The I2C bus traffic is correct and the processor should see this as a NACK (device not present). But
endTransmission
is returning zero. It shouldn't...Here's the correct ACK at address 0x6F:
Here's the same test at 400kHz. It gives the same false positives. The clock stretching is much more obvious:
But at 50kHz (thank you v2!), all is well. Note the lack of clock-stretching:
Suspected cause:
I think the root cause is our old friend SMPCNT:
100kHz and 400kHz set SMPCNT to 3:
https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2203
https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2223
But the
default
setting (which is what would be used for 50kHz), sets SMPCNT to 33 (0x21):https://github.com/sparkfun/mbed-os-ambiq-apollo3/blob/be07f057170a4e9ed4a286abb2170b9df3d52de3/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/sdk/mcu/apollo3/hal/am_hal_iom.c#L2293
Now, I don't know what the answer to this is... I've tried manually setting SMPCNT for 100kHz and I can't find a value which works. There is a cryptic note in the Apollo3 Blue datasheet:
so maybe only certain values work for SMPCNT which "allow for synchronization time"? Multiples of three perhaps?