Closed dewhisna closed 3 years ago
I tried but did not help. It is weird, I can track that the SW sets the STOP bit of CR1 register, but the bit does not remain set and no stop condition is generated. The spec says:
The bit is set and cleared by software, cleared by hardware when a Stop condition is detected, set by hardware when a timeout error is detected.
So theoretically the bit should remain set in master mode as long as not cleared by SW, right?
Interesting. Don't close this PR, though, as I'm 99% certain these two additional conditions also needed the STOP
set. Leave it open until we get to the bottom of this. I was hoping that the BERR
case was the one you guys were seeing.
Maybe @david314 who opened #799 can give this a test too and capture the waveforms on his scope to see how they compare, as I don't yet know how long my lab is going to be out of commission.
I have found something curious about the STOP bit. The spec says:
In Master Mode: 0: No Stop generation. 1: Stop generation after the current byte transfer or after the current Start condition is sent.
But:
So does that actually mean that after the NACK-ed address no STOP condition can be generated?
@stevstrong -- no, but it does mean I need to set the STOP
flag if there's no data to send.
Give this PR another go now that I've added d242a81.
@stevstrong -- let me know what that change does on your test setup. I think it will fix the STOP
bit issue, but I'm a little concerned of a potential race-condition between setting the status to I2C_STATE_XFER_DONE
vs I2C_STATE_ERROR
. That's because on a write like this with no data, once it has finished writing the address to the port, it is technically done and will set the status to I2C_STATE_XFER_DONE
.
However, since the Arduino API is a blocking operation, it shouldn't return until the full status is known, which would be one interrupt cycle later, otherwise the function that's spinning waiting for the operation to complete will think it successfully completed, even in the error case. That would mean that this i2c_scanner_wire
example may falsely report nodes being there when it shouldn't.
If it does, try removing the setting of bDone = 1
on line 560 of d242a81: https://github.com/rogerclarkmelbourne/Arduino_STM32/blob/d242a81db8017f4b16c90698be62067934821137/STM32F1/cores/maple/libmaple/i2c.c#L560 and see how that affects it. That will force it through one more interrupt cycle of either the main I2C IRQ or the I2C Error IRQ and both of those have logic to handle the setting of I2C_STATE
.
I tried your latest change, it will generate a clean STOP condition BUT ... it will break all the rest This is the serial output with the change:
I2C Scanner
Scanning...
I2C device found at address 0x01
I2C device found at address 0x02
I2C device found at address 0x03
I2C device found at address 0x04
I2C device found at address 0x05
I2C device found at address 0x06
I2C device found at address 0x07
I2C device found at address 0x08
I2C device found at address 0x09
I2C device found at address 0x0A
I2C device found at address 0x0B
I2C device found at address 0x0C
I2C device found at address 0x0D
I2C device found at address 0x0E
I2C device found at address 0x0F
I2C device found at address 0x10
I2C device found at address 0x11
I2C device found at address 0x12
I2C device found at address 0x13
I2C device found at address 0x14
I2C device found at address 0x15
I2C device found at address 0x16
I2C device found at address 0x17
I2C device found at address 0x18
I2C device found at address 0x19
I2C device found at address 0x1A
I2C device found at address 0x1B
I2C device found at address 0x1C
I2C device found at address 0x1D
I2C device found at address 0x1E
I2C device found at address 0x1F
I2C device found at address 0x20
I2C device found at address 0x21
I2C device found at address 0x22
I2C device found at address 0x23
I2C device found at address 0x24
I2C device found at address 0x25
I2C device found at address 0x26
I2C device found at address 0x27
I2C device found at address 0x28
I2C device found at address 0x29
I2C device found at address 0x2A
I2C device found at address 0x2B
I2C device found at address 0x2C
I2C device found at address 0x2D
I2C device found at address 0x2E
I2C device found at address 0x2F
I2C device found at address 0x30
I2C device found at address 0x31
I2C device found at address 0x32
I2C device found at address 0x33
The program will hang somewhere (the serial monitor also locks), because only a reset helps out from this situation.
If I remove the line with bDone = 1
then everything goes back to the original state, no STOP condition is generated.
However, in this case the HW will lock up somehow because no signal will be generated after a couple of scans, no I2C device will be recognized, although the main loop keeps running:
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
Scanning...
No I2C devices found
Scanning...
No I2C devices found
Scanning...
No I2C devices found
The whole thing is very unstable and sensitive, even without the latest change, if I remove one of the scope probes from the I2C pins, then it starts to spin, either device will be recognized at impossible addresses, or the SW hangs and goes on only when I apply the probe again.
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x01
Unknown error at address 0x02
I2C device found at address 0x04
Unknown error at address 0x05
I2C device found at address 0x09
Unknown error at address 0x0A
I2C device found at address 0x2A
Unknown error at address 0x2B
I2C device found at address 0x2D
I2C device found at address 0x2E
Unknown error at address 0x2F
Unknown error at address 0x37
Unknown error at address 0x4B
Unknown error at address 0x56
Unknown error at address 0x5E
done
Scanning...
I2C device found at address 0x29
done
Scanning...
I2C device found at address 0x29
done
@stevstrong - I'm concerned about your report of seeing a change with the scope probes. That tells me it's probably missing proper pull-up resistors or something such that changing the loading by a few picofarads is enough to change its behavior. What size pull-up resistors do you have on the SDA/SCL lines?
I use a VL6180 module with 2k2 on-board pull ups. The rising edges are steep.
The situation with No I2C devices found
I can always reproduce by removing the probe from the SCL line. It might generate a false pulse which sets the HW in an unknown state.
Recovering is only possible either by reset or by removing the other probe from the SDA line, and keep re-applying the probe to SDA line a couple of times until it scans again.
I also tried another module (ADPS-9960) with 10k pull ups, exactly the same behavior.
The change as submitted (with the bDone = 1
in place) is actually the correct choice. It doesn't "break all the rest". It only affects cases where you are doing a write operation without any data, like this port scanner example code. Read/Write operations that actually send data to the slave should continue to work well as it always did.
The trick will be to take this code and figure out how to add a synchronization point. The reason it's saying that devices are there when it isn't is the race-condition I mentioned. When it finishes sending the address, then the interrupt logic is technically done writing that message -- there's nothing else for it to do as far as sending data. Unfortunately, when it flags that it's done sending the message, the main loop, which is synchronously waiting for completion, thinks it's totally done and doesn't realize that it will be followed by an error interrupt saying it didn't get an acknowledge from any device on the bus.
So now, we must figure out a way for it to wait until that status is complete without artificially inducing extra delay into the entire thing. For that I'm going to have to think through this a bit and perhaps add a "waiting for status" state to denote that the message itself is finished being transmitted, but that it needs to wait until the I2C peripheral is done.
I'm not sure what to make of the hangs and such you are reporting. I'm not convinced that's necessarily I2C related -- it could be in the serial buffer logic in reporting the status even. And perhaps the probe connecting/disconnecting is enough to induce an external start signal and it's hung thinking another master is starting a write operation or something. I don't know. Usually sporadic things like that are more hardware centric than code centric, since code is generally repeatable.
I tried the example i2c_scanner_wire.ino
and it doesn't look good. As long as no device answers the START and STOP sequences on the bus appear to be generated correctly. But once a valid address (here 0x28) is transmitted over the bus, the device crashes. First there is a pause and then only START and STOP is generated on the bus. The device hangs after that. The behavior does repeat after a reset. Any ideas where to look next?
@david314 was your test with or without the bDone=1
? Also, can you better define the device crashes
? Is there any way to put a ST-Link on the JTAG port and catch it with gdb
and see if you can determine what exactly is crashing and where?
Also, another thing to try is a read
operation instead of a write
operation. The low-level I2C driver wasn't really designed for a zero-byte write, but there is a better chance that the zero-byte read operation might work due to how the ACK logic works on the read path. Can you try an experiment with i2c_scanner_wire.ino
and switch it to requestFrom
instead of a beginTransmission
? I'm curious what that would produce.
Any update on this?
@stevstrong Not on my side. I almost got my lab in the basement functioning again after it got flooded back in June. But things here have been crazy these last few months. I do have some time off coming in November and plan to look at it in detail then.
We never did hear back from @david314 on the questions I asked him in my last post to this thread. There is a chance that a read operation would succeed where the write fails because what is happening is that the IRQ loop only spins long enough to read or write the data being requested and by writing 0 bytes as this was doing, it decides there's nothing left to do and exits and then there's no good way to get the I2C status back to the caller if there's a timeout waiting for the address ACK.
But on the read side, since it's explicitly waiting for read data coming in (even if you ask it to read zero bytes), it waits for the response for the ACK since the entire logic is reversed there. So it's quite possible that the port scan would work if the main code in the example attempted a zero byte read instead of a zero byte write, and I was hoping @david314 would have had a chance to test that since he had things set up already, but we never heard back from him.
And I still don't know what the "crash" is he is seeing. There shouldn't be any "crash" regardless -- it should just miss the ports in the scan because one side wants to be asynchronous (the IRQ side) and the other wants to be synchronous and blocking (the transfer loop in the Arduino API). This is actually one of the places where the Arduino API fails to work well with STM32 peripherals. If it was asynchronous too, instead of spinning in a loop waiting for a response, then the main app could check status later and see if/when a reply comes in and decide if there's a device on that port or not. But since it wants to block until the transfer is complete, how long should it wait?
@stevstrong -- Now that I have my lab back up and going again, I've been able to do some testing with this. I just force updated this PR back to only the first change (as I don't think the second change was needed). It's working correctly here on my setup when I run the i2c_scanner_wire
example (at least the one under the WireSlave
library, which should be functionally equivalent to the one under the Wire
library, except perhaps the default for the timeout. I didn't try the one under the Wire
library).
It completes the scan exactly as expected both with and without an I2C device connected and seems to be creating the STOP. I initially had a BME280 sensor board plugged in. It correctly scanned the bus and found it. I unplugged the sensor while it was scanning. It correctly switched to saying no devices found. I plugged it back in, and it correctly scanned and found it again. With no other issues observed:
Scanning...
I2C device found at address 0x76
done
Scanning...
I2C device found at address 0x76
done
Scanning...
I2C device found at address 0x76
done
Scanning...
I2C device found at address 0x76
done
Scanning...
No I2C devices found
Scanning...
No I2C devices found
Scanning...
I2C device found at address 0x76
done
Scanning...
I2C device found at address 0x76
done
Now -- this was with this updated PR applied that fixes the stop condition for Bus Error and Timeout conditions, which I believe is needed to be correct. As I am seeing things here on the bench, this should fix #799, and should be able to be merged.
Ok, thanks, I will test it today.
Unfortunately it doesn't generate the STOP condition with the wire
library.
In general it is very unstable.
It founds the device (0x39) but if I put the scope probe on SDA then it goes crazy, sometimes founds a device at 0x01 and hangs as long as I keep the probe on SDA, or displays that no device was found but no signal is output on SDA nor on SCL. Very strange, behavior identical to what I described above.
The instability issues you describe sound hardware related instead of software. Do your instability issues still exist even without this PR applied?
Even if this doesn't fix #799 completely, it's still a valid fix for cases when there is either a bus error or a timeout condition and should be applied.
Well, the instability is there even without this PR, but I find really disturbing that the scanner can run through without giving any signal on output pins! Is this I2C HW problem or what else can be?
Based on the description of the symptoms you've given, it sounds like the pull-up resistors either aren't present or aren't making good contact. You need around a 4.7K resistor on each pin for it to be stable. Without them, you'd have random instabilities like you describe that change when a scope probe is connected/disconnected. And would cause the signal to not appear.
Easy test for that would be to disconnect power from the board and use a multimeter to measure the resistance between each of the I2C pins and V+. You should be able to read the pullup resistor value. And double check to make sure that neither one is shorted to either ground or another pin. If that measures OK, it could be a bad STM32 chip.
Pull up resistors are ok, chip is also ok, other applications runs well on it. Only I2C causes this instability. The voltages on I2C pins are constant high when no signal is there and scanner runs through.
Well, it may only be those I2C pins on the STM32 that are bad, not the entire chip, causing other apps to run OK. Do you have another board around to confirm?
If the pins would be bad, then the scanner shouldn't run after reset well, I think. But it does, it detects the chip, SDA and SCL signals are ok except the STOP condtion, Then the scope probe onto SDA makes it crazy...
If connecting the scope probe makes it go crazy, then how do you know that there's no STOP condition? If connecting the scope alters its behavior, then there's no way to think that what the scope shows would be valid.
Because when I reset the board while the probes are on, I can see on the scope the pulses for address 0x01 with no STOP condition. In this case the monitor says device found on 0x01 but there are no more pulses to see, the app seems to hang with SDA and SCL on logic high. If I take the probe away the app runs till end and continues too run again and again after the programmed delay not recognizing the attached module. If I put the probe again on SDA I still see logic high (3.3V) on both I2C lines while the monitor keeps telling that no device was found.
It sounds like either your device or probe is causing a clock stretching event causing the CPU to wait for it and since you are using Wire
instead of WireSlave
, there's no timeout and it hangs forever. Is there a chance that you have SDA
and SCL
lines swapped on your device?
In any case, you seem to have some sort of hardware issue that's completely unrelated to this PR or even the issue described by the OP of #799.
This PR does fix the missing STOP for bus error and timeout cases. I currently have this I2C code in about a dozen different devices on both STM32F103CB and STM32F103VG processors talking as a master to I2C OLED displays, BME280 sensors, and EEPROMS, and even one that communicates as a slave to a STM32F746NG board on one I2C port while talking as a master to an I2C DAC on another port, all without issues. So, I'm not sure what else to tell you other than to try different wires and different boards.
My issue may be totally different from this, I agree. Do you have a plot showing that this PR solves the missing STOP condition for the scanner? Even if your apps work, the scanner can still miss the STOP. My plots (when they work) still show missing STOP. And I am not sure that my setup causes that.
And this is how it looks like after reset.
Only one address is checked, while the monitor outputs:
17:33:17.356 -> I2C Scanner
17:33:17.356 -> Scanning...
17:33:19.602 -> No I2C devices found
17:33:22.598 -> Scanning...
17:33:24.895 -> No I2C devices found
The I2C lines are both high during the scanner continue to run.
It's hard to see the exact timing of that last bit in that image, but it sure looks to me like there's a STOP bit there. Now keep in mind that hardware I2C isn't going to be these nice long equally spaced manually clocked things you see with software I2C. As soon as the clock line rises, it will release the data line immediately after the hold time of the output. It looks to me like the clock transitions to high on that last bit before the bit data line is released, meaning it should be a STOP bit.
What I see in that screenshot is 7 address bits: 0000001, followed by a 0 (for write), followed by a NAK (a '1') from no device being present, followed by a '0' for the STOP bit, and then the data and clock lines are both released.
I suppose since you are using the Wire
library instead of WireSlave
, as I am, there could be an issue with the Wire
library getting error return codes back. Perhaps it's waiting indefinitely for a data transmission that's never going to happen??
I will try to get some screenshots from mine in the coming days. I've usually been using a little embedded micro based I2C port analyzer that displays the bus waveforms on a little LCD screen, which doesn't have a way to get a screenshot from. But, I can try to get a scope connected to it for comparison...
So what happens again when you switch over to using the WireSlave
library instead of Wire
? Is it the same?
I just tried with the current main branch and can confirm that i2c_scanner_wire
works with the library Wire_slave
and i2c_scanner_softwire
works with SoftWire
. What doesn't work is i2c_scanner_wire
with the library Wire
. It sometimes works a few times, then stops working at all. And there is something wrong with the STOP condition in this example. I will try the proposed fix next week.
Thanks @david314 for confirming that. In reality, the WireSlave
library/folder is the proper implementation of Hardware I2C for both Master and Slave (despite its name). And the Wire
library folder contains the implementation of Software I2C (SoftWire
) with a half-baked Hardware Master-only implementation (Wire
).
What really should happen long-term is to remove the half-baked Hardware I2C support called Wire
and leave SoftWire
. And rename WireSlave
to HardWire
, moving them all to the same folder. And then create a "parent" or top-level library called Wire
that brings in both SoftWire
and HardWire
for a single, fully functional library called Wire
that has a proper software and hardware I2C.
But, in the meantime, anyone doing Software I2C should use SoftWire
in the Wire
folder. And anyone using Hardware I2C should use WireSlave
in the WireSlave
folder, as I completely rewrote WireSlave
and the underlying core parts sometime back to make all of the hardware I2C function correctly. Though it should get this last PR to make sure a STOP gets generated on the Bus Error and Timeout cases that was missing in the original implementation.
Or I guess instead of renaming it HardWire
(even though that's more consistent), to keep with the "Arduino Naming Conventions", the existing half-baked Wire
files should be removed from the Wire
folder. And the WireSlave
files should be renamed to Wire
and moved to the Wire
folder (along with updating any #include
statements throughout the code and examples). That way Hardware I2C will be Wire
and Software I2C will be SoftWire
, which I think keeps with other Arduino frameworks.
I think the point of confusion is that WireSlave
was meant to completely replace Wire
. Originally in this framework, someone had started WireSlave
as a place to work on adding slave support (rather than modifying the existing Wire
library). When I picked it up and finished up the slave support, I found it was much better to keep the master and slave logic together in one place (as they needed to interact with each other and were not independent). So, I created an improved master in WireSlave
to go with the slave support, intending for the old Wire
library at some point in the future to be replaced by the new WireSlave
code.
Now, for the most part, the old Wire
code continued to work with the new low-level I2C logic and people continued to use it rather than switching to WireSlave
. But if anyone has issues with Wire
, they should use WireSlave
until someone gets the libraries swapped around.
So @david314, you can try this PR fix next week as you indicate, but I doubt it will do anything for Wire
(as @stevstrong indicated he already tested that). This PR had nothing to do with fixing the obsolete Wire
code. Instead, the report received was that a STOP condition wasn't being generated in certain cases by hardware I2C in general (and people were saying it was happening with both Wire
and WireSlave
). And indeed, I found two such cases (Bus Error and Timeout) in the low-level I2C logic and this PR fixes that. It isn't likely to make Wire
work if Wire
was behaving any differently than WireSlave
.
But WireSlave
seems to be working, both in your tests and my own, and that's the only code I'm interested in, since Wire
should be replaced with it anyway. However, if there are any issues with WireSlave
not working, let me know and I'll continue looking into it.
I managed to try the scanner with Wire_slave
lib.
It behaves a lot more stable than the native lib. But even in this case, the stop condition is not generated as it should.
After the address byte there is no STOP but a new START and only after that a STOP.
Those rising edges on the 10 SCL pulse are perfectly synchronized, there is no delay between them, so that it cannot be considered to be a STOP.
And here the plot: One can recognize:
@stevstrong -- I just reworked the PR to move the STOP condition from the _i2c_irq_error_handler
to the i2c_master_xfer
function so that it can wait for the I2C peripheral to clear the START/STOP/PEC flags in CR1. There was a note in the ST documentation on the I2C_CR1 register that said:
When the STOP, START or PEC bit is set, the software must not perform any write access to I2C_CR1 before this bit is cleared by hardware. Otherwise there is a risk of setting a second STOP, START or PEC request.
See if this fixes things now...
So I finally got time to check the proposed fix. The repeated start condition seems to disappear, the original version shows restarts when no ACK was presented on the bus by a slave, see the first screenshot. The proposed fix removes these restarts, see the second screenshot. Both versions show a pause of ~170µs for the device at 0x28 (a BNO055), which is due to clock stretching. The device at 0x76 (a BME280) does not use clock stretching and is also detected correctly. I also attached the original waves for your convenience.
Which lib did you try? Wire slave
or Wire
?
Because I see no difference in behavior between these plots, both show a re-start-stop instead of a clean stop condition.
I only used Wire_slave
and there is a difference between the two waves, there are no restarts on the latter one. If in doubt, get ScanaStudio and check the attached waves.
Despite that in the second one no re-start was recognized by the tool, I just looked to the two images and detected the same behavior: check the 10th SCL pulse, there is a falling SDA edge while SCL is high, which means (re)start without a stop.
@david314 -- was your test with this current PR (and commit fa6c360) included? I'm getting confused as to who is testing what, since I had several different iterations of this PR as I was working through the details and @stevstrong was also making changes on master trying to fix it in the Wire
library. My most recent fix (fa6c360) doesn't even exit the i2c_master_xfer
function until the STOP is sent -- and it checks the CR1 register status as per the ST Micro docs in order to not accidentally create a restart condition, so it should have fixed it for both Wire
and WireSlave
independently of @stevstrong's changes on master.
@dewhisna , this fix works, but is not complete.
I was further investigating my instability issue and I figured out that it was a HW stall due to interference caused by touching the SDA/SCL pins with the probe.
In these cases your fix does not help, because the transfer function returns with timeout already in line 379 before sending the start condition.
I managed to fix the issue one level higher, in Wire::process()
function by checking the return variable res
for any error, not only for I2C_ERROR_PROTOCOL
as currently implemented, so that the HW gets re-initialized also in case of timeout.
@stevstrong -- if it returned with a timeout already before sending the START, as you say, then why would you need to send a STOP?? The START and STOP should be paired. That busy is specifically if the I2C peripheral is already busy (i.e. the bus is being used by a secondary master on a multi-master system). In fact, that line 379 that you reference was more or less a cut-and-paste from the STM32Cube code for I2C, by the way... and is exactly how they did it. Not sure why you expect it to magically work if you are creating random noise on the I2C lines to where it thinks there's another master on the bus...
I did not say that a STOP should be sent by timeout. I was saying that the I2C_ERROR_TIMEOUT
return value of the transfer function is not correctly evaluated on a higher level and that the bus should be reset in this case, too, just like for the other error code.
Oh, yes, that is true. The Wire
library never had timeout processing. That was something I added in WireSlave
(since timeout was part of the formal Arduino API) and is yet another reason to use WireSlave
instead of Wire
. As I've said, I intended for someone to move WireSlave
to Wire
and replace the hardware Wire
functionality with a single cohesive code-base. I didn't do that on my original PR for I2C slave because the goal there was to get slave functionality working without disturbing what everyone else was currently doing with Wire
and master-only processing. But once WireSlave
got proven out (and it seems to be, as many people are using it), then the intent was to update Wire
and what everyone was using so that it's consistent and no longer two separate libraries.
Honestly, I am not very motivated to do such a major change (replace Wire with Wire slave) in the current repo. I started to have a good overview about what is going on in the current state. And I find Wire_slave more complicated than Wire.
Maybe Roger can give his opinion here.
I agree with Steve
I don't think its worthwhile making a major change.
Its probably impossible to make the IC2 work perfectly with all devices, and people always have the option to use the official core if the I2C in this core does not work with their particular device.
Generally with this core, support for people's existing code base (sketches) should probably be the priority, as again, new users can start using the official core and write their code to be compatible with that core
I'm perfectly OK with that decision of keeping both and is why I originally kept WireSlave
, that someone else had already started developing, separate. But I designed it in such a way that all users should be able to simply switch to using it and maintain compatibility. So, if someone is having difficulties with Wire
working, they should try WireSlave
(and vice versa).
Bty, the Wire_slave also does not handle correctly the timeout error. This if
case
https://github.com/rogerclarkmelbourne/Arduino_STM32/blob/d60ff19ace159cd332d9654e3eaa091c42623442/STM32F1/libraries/WireSlave/src/Wire_slave.cpp#L185-L188
will not be triggered because sel_hard->state
is not set to signalize error for timeout within this sequence at the beginning of i2c_master_xfer
:
https://github.com/rogerclarkmelbourne/Arduino_STM32/blob/d60ff19ace159cd332d9654e3eaa091c42623442/STM32F1/cores/maple/libmaple/i2c.c#L375-L381
That call to i2c_master_enable
isn't supposed to be triggered from that timeout. That BUSY
flag isn't specifically the status of this I2C. It's detection of whether the SDA/SCL lines are being driven externally by a secondary master in a multi-master configuration. If a second master is transmitting, you wouldn't want to be jamming signals on the lines to drive a reset condition. You'd want your app to simply wait for the secondary master. So it returns with a timeout condition and your app can decide what it wants to do. (See the code for i2c_master_enable
here to understand how that reset works).
I'm not 100% certain that this fixes #799 or not (since my lab is still in shambles) but these are definitely two cases missing a necessary STOP condition on a bus master.
Please test with your setup for #799. If it doesn't fix it, you can reopen the issue and I will continue my probing to try and resolve it.