pasko-zh / brzo_i2c

Brzo I2C is a fast I2C Implementation written in Assembly for the esp8266
GNU General Public License v3.0
244 stars 47 forks source link

I2C stucks after clock stretching time exceeds #24

Closed valkuc closed 6 years ago

valkuc commented 6 years ago

I got an issue that make I2C bus unusable after receiving code 8 from brzo_i2c_end_transaction. The SCL/SDA line remains low and makes not possible to use bus until reinit. How to reproduce: 1) init i2c with clock stretching: brzo_i2c_setup(3000) 2) start a write operation to i2c slave 3) make slave to not answer in time (I'm doing that by putting my slave in debug pause) 4) first received error code will be 8 (Clock Stretching by slave exceeded maximum clock stretching time) 5) all subsequent calls to i2c write will return error code 1 (Bus not free, i.e. either SDA or SCL is low)

pasko-zh commented 6 years ago

Some understanding questions from my side:

a) The behaviour of 4. is OK, i.e. SCL is still low and time out exceeded. Then after that: Just to check, when you use a scope, is SCL still low, right? I assume a "yes" here.

b) When does your slave continue with i2c communication, i.e. "finish" the clock stretching? Because as long as a slave pulls SCL low, the returnded error code 1 would be correct.

Depending on your answer of b) : In a case, where the i2c bus is stalled, one could try a "de-stalling" or "recovery"-sequence, like described here. After such a sequence, it is expected that "In many cases this will advance the state machine of any blocked slave to a point where it accepts the next start condition again.".

valkuc commented 6 years ago

a) yes b) assume that there is no communication with slave at all, we just got clock-stretching timeout (because slave is in debug pause)

I understand that brzo_i2c behavior is correct here, but it looks like we lack functionality to reset bus to original state as described at link you pointing:

One rather clumsy but easy to implement solution is to toggle the clock line multiple (16) times before doing any I2C operation after power-up of the micro controller i.e. after it has possibly gone through reset. This sequence can be followed by a stop condition.

I guess brzo_i2c_reset_bus was conceived for this, but it's only stub.

pasko-zh commented 6 years ago

OK. I did not (yet) implement brzo_i2c_reset_bus because there were no requests for it... but since there is one now :grinning:

These kind of "restets" may help or may not, it depends on the slave not releasing the bus. The proper way to do is to reset all slaves on the bus, i.e. either power off the slaves or use their reset pin.

In your case, would your slave release the bus after n times toggeling either SDA or SCL?

valkuc commented 6 years ago

Actually there are no issues with slaves - they just work. My workaround for now is to restart master (which in turn reinitializes i2c) and after that all work fine until next "clock-stretch timeout". In my case it's often occurs when I connecting to my slave in debug mode (the slave is STM32 project) with ST-Link.

pasko-zh commented 6 years ago

Could you clarify your last comment a bit, I mean when you say " there are no issues with slaves - they just work"? So, you're setup is: One esp8266 as an i2c master using brzo_i2c and one STM32 as an i2c slave?

valkuc commented 6 years ago

yes, right

pasko-zh commented 6 years ago

Aha. So, if the STM32 is stalling the bus by pulling SCL low, how could the master resolve this? Will the STM32 release SCL, when the master sends a "recovery"-sequence?

You wrote "My workaround for now is to restart master (which in turn reinitializes i2c)", this means you reset the esp8266?

valkuc commented 6 years ago

yes :) but I'm sure this can be avoided just be reiniting I2C pins

pasko-zh commented 6 years ago

Aha, we are getting closer ;-)

So, we have to think what makes the STM release SCL? And/or what happens when the esp is turned off... OK, the pin is configured as open drain with pull up and STM pulls low, then by switching the esp off, there is no more vcc and hmmm....

valkuc commented 6 years ago

No :))) you are going too deep. The case is simple: imagine that there is not i2c slave at all - the ESP i2c will stuck same. Root cause is that i2c line do not restore back at "working" state in case of clock-stretch timeout.

pasko-zh commented 6 years ago

I meant by this, that one could try to switch pins from "output open drain" to the initial state like after booting up. Probably this could help.

You might give it a try? I.e. you would need to use those PIN_FUNC_SELECT(...) statements.

valkuc commented 6 years ago

yes, yes, I know that I can do that way, but in my opinion library should do it by itself, i.e. we should not leave I2C bus in undefined state in case of error (or at least in case of "clock-stretch" timeout). Or at least give user easy way to reinit bus in working state back.

pasko-zh commented 6 years ago

Sorry, I was too short in my previous comment. I meant: Would you mind giving it a try? Because I cannot test it, since I cannot reproduce your setup here. And if it is a success, I will of course add it to the library :smile: So, after you get a bus stall, after 5. in your first comment, could you then try to set GPIO Pins 12 and 13 to INPUT? Does this resolve the stall?

valkuc commented 6 years ago

Sure I give it a try, I just would like that you do it in assembly (because I can't). So, let's do that way - I will write in C, the you will make a "ASM" bomb from that :)

valkuc commented 6 years ago

You probably will laugh :) but looks like you code already have routine to reset bus. Check my pull request. I moved last lines of brzo_i2c_setup (where pins configured in assembly) into brzo_i2c_reset_bus. Now in my routines I check for brzo_i2c_end_transaction return code, and if it 8 then I call brzo_i2c_reset_bus.

pasko-zh commented 6 years ago

@valkuc sorry for my late reply but my other life keeps me rather busy these days :-/

I had a look at your PR. And I was thinking again, where the root caucse could be and I guess, I discovered it :v:

Checking for clock stretching is implemented the same way for brzo_i2c_write and brzo_i2c_read, so I stick with the former for my explanation here.

Checking starts here after line 135. Then at line 140, we let SCL raise but leave SDA unchanged. We reach line 167 when clock stretch timout was exceeded and will jump to the exit label l_exit . Now, SCL ist still low, because the slave still pulls it down, although the master has released SCL. Eventually, when the slave releases it, SCL will raise again. But, what happens with SDA? Well, as shown above, the master does not change SDA before checking for clock stretching—that is fine and according to i2c spec. But, if the last bit was 0, SDA remains low even after the timeout was exceeded.

Thus, we might end up with the case, where the slave has finally released SCL after the timeout but SDA is still low. If you then try to use brzo_i2c_write or brzo_i2c_read, they will complain with bus not free, since SDA is (still) low.

And that's why your bus reset code works, because it sets SCL and SDA to open drain and to high.

So my conclusion so far is that after you get a stall you could also call brzo_i2c_setup again and it will resolve it. Could you please check that?

If this is correct, then I will add additional statements to brzo_i2c_write and brzo_i2c_read that after time out was exceeded and SDA is still low, to set it to high again.

valkuc commented 6 years ago

So my conclusion so far is that after you get a stall you could also call brzo_i2c_setup again and it will resolve it. Could you please check that?

Actually I'm doing exactly what you asking to check but without reconfiguring pins again (that is done in setupmethod). Instead I'm only call code related to resetting back SDA and SCL pins to initial state. That what my pull request is exactly doing. I moved part of code from setup method to resetmethod. Check the diff please.

pasko-zh commented 6 years ago

OK, I just wanted to make sure.

I prefer adding setting SDA to high in the case of an exceeded timeout in brzo_i2c_write/read than adding it to brzo_i2c_reset_bus. Because what people usually are referring to as "de-stalling or recovering or resetting" the bus, is the toggeling of SCL which we discussed above.

valkuc commented 6 years ago

Well, if

setting SDA to high in the case of an exceeded timeout

will be done automatically it's even better than manually checking for clock stretch timeout error and calling reset bus method.

mcr-ksh commented 6 years ago

this is an excellent write up. any progress on this hence it was in Jan already?

pasko-zh commented 6 years ago

I plan to fix it this week and/or the following weekend. It shouldn't be a big thing.

pasko-zh commented 6 years ago

I've analyzed the issue again and made changes in brzo_i2c_write, see here.

@valkuc Could you please test the pre_release branch with your i2c slave against this issue? Note: After you experienced an exceeding of the clock stretching, you may need to wait a bit until your slave eventually releases SCL, before calling any brzo_i2c read or writes again.

valkuc commented 6 years ago

@pasko-zh , sorry for long delay. I have checked pre_release branch and can confirm - it's working. Bus not freezing anymore. Thanks for efforts!

pasko-zh commented 6 years ago

@valkuc thanks a lot đź‘Ť I will merge it then.

pasko-zh commented 6 years ago

Merged and new Release published. Thanks again for your support!!