Open wmarkow opened 6 years ago
I have found Megunolink/ArduinoCrashMonitor which uses watchdog feature to:
After some tries I have found two different places, where it freezes:
uint8_t twi_readFrom(uint8_t address, uint8_t* data, uint8_t length, uint8_t sendStop)
line 196:
while(TWI_MRX == twi_state){
continue;
}
uuint8_t twi_writeTo(uint8_t address, uint8_t* data, uint8_t length, uint8_t wait, uint8_t sendStop)
line 278:
while(wait && (TWI_MTX == twi_state)){
continue;
}
Interesting.
This seems to be a known issue
I'm using arduino hardware in version 1.6.17.
I just want to summarize some things, before I proceed with investigations:
It could be that all of the facts above make, that it is easy for me to reproduce the issue when the program freezes in Wire library. It may be a good entry point to investigate the Wire library and try to improve something.
As I have written before, I started to use ArduinoCrashMonitor. I'm using this library to:
Additionally I added a piece of code in the Arduino's loop, that will blink the internal diode. If the diode is blinking then it means that loop is being constantly executed and the program doesn't freeze. When the program freezes (for example somewhere in Wire library) the internal diode doesn't blink for about 4 seconds, then Arduino is restarted and program is back to life.
The plan is: download Adruino sources and try to modify internal Wire library. Some hints can be found:
With https://github.com/wmarkow/arduino-hifi/commit/703496c4c4d1856204b5e68c8a242d69ab5ecc94 commit (and a commit before) and standard Wire library and ArduinoCrashMonitor, I have noticed the following behavior:
while(TWI_MRX == twi_state){
codes in twi_readFrom or twi_writeTo methods.With https://github.com/wmarkow/Arduino/commit/4028bdcf4b16a166b09bf1525baee974ccfffd16 commit of my Arduino's fork I have modified the Wire library so it have a timeouts in twi_readFrom or twi_writeTo methods where it checks the twi_state variable.
This is only a partial fix and not all of the while loops have been modified. As a default the timeouts are disabled, so it will behave in the same as original Wire library.
To enable the timeout you need to call a proper method in your Arduino setup():
Wire.setTimeoutInMillis(50);
this will set timeout to 50 milliseconds (but of course you can set a different timeout value).
With my modified Wire library and ArduinoCrashMonitor, I have noticed be following behavior:
Sometimes the the code freezes, the internal diode stops blinking and program doesn't respond to user actions (like changing volume, LCD is not updated). Probably this is a result of not all while loops being modified. After a few seconds Arduino gets restarted by watchdog, but this time no crash has been reported by ArduinoCrashMonitor. Why? I started to take a look again at the Wire library code and I have found a while loop in void twi_stop(void). This method is only called from TWI interrupt, so it probably block in the interrupt. If TWI interrupt is being executed the watchdog interrupt will wait but will never be executed (because TWI is in infinite loop). This results in the fact that ArduinoCrashMonitor will not be able report any crash. Fortunatelly the AVR watchdog will finally reset Arduino in that case as well.
Next part of my investigation is a loop in void twi_stop(void) method.
With https://github.com/wmarkow/Arduino/commit/afc25a64316e38f29cf887f546b8a09c73375838 commit of my Arduino's fork I have modified the Wire library so it now has a timeout in void twi_stop(void). Since this method is executed from interrupt service routine I couldn't use millis() because it will not work correctly (it may give false time results when it is used in ISR). I used some counter that counts how many times the while loops is executed; when some limit is reached the method timeouts.
With this solution I have never observed a freeze (however I didn't modified all while loops yet). The LED still blinks (but blinks slower, so the timouts are working), so the Arduino's loop is being constantly executed.
Good thing that Arduino not freezes but my program is still not responsive to user: probably because there is something wrong with I2C and I can not communicate with my devices by I2C: LCD is not updated, volume doesn't work, and so on. Previously at least watchdog restarted Arduino and everything was fine :) When I press Arduino reset button, then everything starts working correctly.
Maybe I'm missing something and to recover from that state I need to do something else (like setting some specific TWI register)?
With https://github.com/arduino/Arduino/commit/b325055fda2d3b22c48ed6a316a1efa6f2f10a08 commit I have managed to recover TWI interface (at least it works in my project).
hello, i just notice your work. Instead of millis() you can use micros(), that will be updated even in a ISR state, as is done by HW.
Rember that when the system freeze, the best solution in reset the HW i2c and restart teh operation.
I suggest using the TWI library directly (is what wire uses under the hood) has it will give you control over what you need
@MauroMombelli, thank your your comment. Actually in my work I have used your pull request mixed with DanNixon's pull request. Maybe I have reinvented the wheel again a bit, but by the way I learned new things.
I will take a look at using micros()
, as you mentioned.
By using TWI library directly you mean to use what is in Wire\src\utility\ directory (twi.c and twi.h)?
the way DanNixon uses the twi_guard() function is really elegant. The advantages of micros() would be to make the timeout idipendenet from the clock source; it may avoid future issues
and yes, that is the twi library, there you see how I2C work at register level, and is quite easy.
I agree that using millis()
or micros()
will make the timeout independed from the clock source but I'm not so convinced that using micros()
in interrupt service routine will help.
Calling millis()
in ISR will mostly return always the same value.
What will happen if we use micros()
in ISR? We need to take a look in the Arduino source code, where micros()
will return something like:
return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());
m
variable will mostly have the same value, it can be incremented by 1 in some specific case
t
variable is derived from TCNT0 (this is a hardware register) and it varies from 0 to 255
clockCyclesPerMicrosecond()
on Arduino Uno is 16
.
When you call micros()
in ISR the value will change but not so much, after some time (when t
will overflow and starts counting from zero) it will start to return the same values.
We can even calculate that tiem by putting the values into the equation above:
((1 << 8) + 255) * (64 / 16)
which becomes
(256 + 255) * 4
which is 2044 us (2ms).
I have the feeling that using micros()
in ISR may work (on Arduino Uno) at least for 2044us. Other words, when you put a timeout more that 2044 us then it will still hang in twi_stop()
method.
After two years I come back to the issue. In the meantime I have installed a new RF antenna so my RDA5870 chip gets a better reception and the issue is now harder to reproduce (I have tried to reproduce it with the same USB wall charger).
But I have found a very easy way to reproduce it:
I did further tests with the latest fix from https://github.com/arduino/ArduinoCore-avr/pull/107 I have tested the sources based on the https://github.com/wmarkow/arduino-hifi/commit/427d772ba56e911094ae350489df963dd0851ae6 commit.
In my coe I'm checking for the TWI timeout flag by calling Wire.getWireTimeoutFlag()
and reset I2C Display at that moment; the display blinks then, so I can see that the timeout occured.
When I:
Wire.setWireTimeoutUs(uint32_t, boolean)
at all then I can reproduce the freezingWire.setWireTimeoutUs(1500ul, true)
then the timeout occurs constantly and the LCD blinks all the time. I'm assuming that 1500us is to low for my I2C devices so at least one of them doesn't respond so fast. The device doesn't freeze so the fix for the issue works.Wire.setWireTimeoutUs(2000ul, true)
then I do not have any spontanious timeouts reported. To get the timeout I need to perform my test routine described in my previous comment. The device doesn't freeze so the fix for the issue works.In my solution I put a 10000us as a timeout value. The fix helps me to recover from the endless loop.
I have noticed that since some time (when I started to use RDS) application hangs. It is not responsive: the LCD screen freezes and do not displays new RDS messages or do not displays current volume. Moreover, it is not possible to change volume; nothing works. After power reset it works fine, until it hangs again.
This issue is highly reproducible when I power up the system from USB wall charger by Chicony Power Technology, AC Adapter, Model W12-010N3B. When I power up the system from my laptop's USB it hangs very rare. So it has something to do with the power supply.
However it would be good to know at what line of the source code it hangs. It would be a good idea to use watchdog to reset the board.