wmarkow / arduino-hifi

A small HiFi system built with Arduino
MIT License
0 stars 1 forks source link

Application hangs and is not responsive to user's actions #3

Open wmarkow opened 6 years ago

wmarkow commented 6 years ago

I have noticed that since some time (when I started to use RDS) application hangs. It is not responsive: the LCD screen freezes and do not displays new RDS messages or do not displays current volume. Moreover, it is not possible to change volume; nothing works. After power reset it works fine, until it hangs again.

This issue is highly reproducible when I power up the system from USB wall charger by Chicony Power Technology, AC Adapter, Model W12-010N3B. When I power up the system from my laptop's USB it hangs very rare. So it has something to do with the power supply.

However it would be good to know at what line of the source code it hangs. It would be a good idea to use watchdog to reset the board.

wmarkow commented 6 years ago

I have found Megunolink/ArduinoCrashMonitor which uses watchdog feature to:

After some tries I have found two different places, where it freezes:

Interesting.

wmarkow commented 6 years ago

This seems to be a known issue

wmarkow commented 6 years ago

I'm using arduino hardware in version 1.6.17.

wmarkow commented 6 years ago

I just want to summarize some things, before I proceed with investigations:

It could be that all of the facts above make, that it is easy for me to reproduce the issue when the program freezes in Wire library. It may be a good entry point to investigate the Wire library and try to improve something.

wmarkow commented 6 years ago

As I have written before, I started to use ArduinoCrashMonitor. I'm using this library to:

Additionally I added a piece of code in the Arduino's loop, that will blink the internal diode. If the diode is blinking then it means that loop is being constantly executed and the program doesn't freeze. When the program freezes (for example somewhere in Wire library) the internal diode doesn't blink for about 4 seconds, then Arduino is restarted and program is back to life.

The plan is: download Adruino sources and try to modify internal Wire library. Some hints can be found:

wmarkow commented 6 years ago

With https://github.com/wmarkow/arduino-hifi/commit/703496c4c4d1856204b5e68c8a242d69ab5ecc94 commit (and a commit before) and standard Wire library and ArduinoCrashMonitor, I have noticed the following behavior:

wmarkow commented 6 years ago

With https://github.com/wmarkow/Arduino/commit/4028bdcf4b16a166b09bf1525baee974ccfffd16 commit of my Arduino's fork I have modified the Wire library so it have a timeouts in twi_readFrom or twi_writeTo methods where it checks the twi_state variable. This is only a partial fix and not all of the while loops have been modified. As a default the timeouts are disabled, so it will behave in the same as original Wire library. To enable the timeout you need to call a proper method in your Arduino setup(): Wire.setTimeoutInMillis(50); this will set timeout to 50 milliseconds (but of course you can set a different timeout value).

With my modified Wire library and ArduinoCrashMonitor, I have noticed be following behavior:

Sometimes the the code freezes, the internal diode stops blinking and program doesn't respond to user actions (like changing volume, LCD is not updated). Probably this is a result of not all while loops being modified. After a few seconds Arduino gets restarted by watchdog, but this time no crash has been reported by ArduinoCrashMonitor. Why? I started to take a look again at the Wire library code and I have found a while loop in void twi_stop(void). This method is only called from TWI interrupt, so it probably block in the interrupt. If TWI interrupt is being executed the watchdog interrupt will wait but will never be executed (because TWI is in infinite loop). This results in the fact that ArduinoCrashMonitor will not be able report any crash. Fortunatelly the AVR watchdog will finally reset Arduino in that case as well.

Next part of my investigation is a loop in void twi_stop(void) method.

wmarkow commented 6 years ago

With https://github.com/wmarkow/Arduino/commit/afc25a64316e38f29cf887f546b8a09c73375838 commit of my Arduino's fork I have modified the Wire library so it now has a timeout in void twi_stop(void). Since this method is executed from interrupt service routine I couldn't use millis() because it will not work correctly (it may give false time results when it is used in ISR). I used some counter that counts how many times the while loops is executed; when some limit is reached the method timeouts.

With this solution I have never observed a freeze (however I didn't modified all while loops yet). The LED still blinks (but blinks slower, so the timouts are working), so the Arduino's loop is being constantly executed.

Good thing that Arduino not freezes but my program is still not responsive to user: probably because there is something wrong with I2C and I can not communicate with my devices by I2C: LCD is not updated, volume doesn't work, and so on. Previously at least watchdog restarted Arduino and everything was fine :) When I press Arduino reset button, then everything starts working correctly.

Maybe I'm missing something and to recover from that state I need to do something else (like setting some specific TWI register)?

wmarkow commented 6 years ago

With https://github.com/arduino/Arduino/commit/b325055fda2d3b22c48ed6a316a1efa6f2f10a08 commit I have managed to recover TWI interface (at least it works in my project).

MauroMombelli commented 6 years ago

hello, i just notice your work. Instead of millis() you can use micros(), that will be updated even in a ISR state, as is done by HW.

Rember that when the system freeze, the best solution in reset the HW i2c and restart teh operation.

I suggest using the TWI library directly (is what wire uses under the hood) has it will give you control over what you need

wmarkow commented 6 years ago

@MauroMombelli, thank your your comment. Actually in my work I have used your pull request mixed with DanNixon's pull request. Maybe I have reinvented the wheel again a bit, but by the way I learned new things. I will take a look at using micros(), as you mentioned.

By using TWI library directly you mean to use what is in Wire\src\utility\ directory (twi.c and twi.h)?

MauroMombelli commented 6 years ago

the way DanNixon uses the twi_guard() function is really elegant. The advantages of micros() would be to make the timeout idipendenet from the clock source; it may avoid future issues

and yes, that is the twi library, there you see how I2C work at register level, and is quite easy.

wmarkow commented 6 years ago

I agree that using millis() or micros() will make the timeout independed from the clock source but I'm not so convinced that using micros() in interrupt service routine will help.

Calling millis() in ISR will mostly return always the same value.

What will happen if we use micros() in ISR? We need to take a look in the Arduino source code, where micros() will return something like:

return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());

m variable will mostly have the same value, it can be incremented by 1 in some specific case t variable is derived from TCNT0 (this is a hardware register) and it varies from 0 to 255 clockCyclesPerMicrosecond() on Arduino Uno is 16. When you call micros() in ISR the value will change but not so much, after some time (when t will overflow and starts counting from zero) it will start to return the same values.

We can even calculate that tiem by putting the values into the equation above:

 ((1 << 8) + 255) * (64 / 16)

which becomes

(256 + 255) * 4

which is 2044 us (2ms).

I have the feeling that using micros() in ISR may work (on Arduino Uno) at least for 2044us. Other words, when you put a timeout more that 2044 us then it will still hang in twi_stop() method.

wmarkow commented 4 years ago

After two years I come back to the issue. In the meantime I have installed a new RF antenna so my RDA5870 chip gets a better reception and the issue is now harder to reproduce (I have tried to reproduce it with the same USB wall charger).

But I have found a very easy way to reproduce it:

  1. open the case so there is an easy acces to RDA5870 chip
  2. make sure that the device is powered up (power source is here not relevant, it may be usb wall charger or laptop)
  3. pull out the power voltage cable from the RDA5870 board
  4. check if the device reacts on the volume adjusting, if it still reacting put in the power voltage cable to RDA5870 board and go back to step 3
  5. when the device doesn't react on user input then the issue is reproduced, after 4-6 seconds the ArduinoCrashMonitor library will reset the board
wmarkow commented 4 years ago

I did further tests with the latest fix from https://github.com/arduino/ArduinoCore-avr/pull/107 I have tested the sources based on the https://github.com/wmarkow/arduino-hifi/commit/427d772ba56e911094ae350489df963dd0851ae6 commit.

In my coe I'm checking for the TWI timeout flag by calling Wire.getWireTimeoutFlag() and reset I2C Display at that moment; the display blinks then, so I can see that the timeout occured.

When I:

In my solution I put a 10000us as a timeout value. The fix helps me to recover from the endless loop.