jvoermans / Vibration_Logger

Logger to measure sea ice vibrations

I2C hanging #30

Closed jvoermans closed 3 years ago

jvoermans commented 3 years ago

@jerabaul29 We probably need a safety net in case an I2C sensor is defective and causes the Arduino to hang. The cause is somewhere in the Wire library, but there hasn't really been a solid solution to this: https://github.com/arduino/Arduino/issues/1476

It doesn't always occur. If the complete sensor is gone, or the data and/or clock line are gone it just simply cannot find the sensor. The problem actually occurs when the VCC and/or GND lines are disconnected. In general, this is unlikely to happen. If the sensor is broken, it is most likely that the sensor won't respond at all, so that is not an issue. If the sensor is ripped off completely, that is also not an issue as all lines will be disconnected. Nevertheless, sounds a bit uncomfortable to rely on simply 'unlikely to happen'.

So I guess there are two approaches to prevent the Arduino from freezing:

1) Check regularly whether the sensor is present. If it is not, skip this sensor and go to the next. If a sensor fails while measuring and freezes the Arduino, this is likely to happen only once and the watchdog can just reset the whole thing.

2) Some kind of 'sub-watchdog' which, rather than resetting the Arduino, cancels the sensor-reading task and continues the loop.
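The presence check in approach 1) could look roughly like the sketch below. It uses the same address-only probe an I2C scanner uses (`beginTransmission` followed by `endTransmission`, which returns 0 when the device acknowledges). `FakeBus`, `sensor_present`, `responding_sensors` and the addresses are hypothetical names for illustration; on the board the real `Wire` object would be passed instead. Note that this only catches sensors that fail to acknowledge, so it would not by itself cover the disconnected GND/VCC case.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Arduino Wire object so the sketch compiles
// off-target. On the board, pass the real `Wire` instead.
struct FakeBus {
    std::vector<uint8_t> present;  // addresses that acknowledge
    uint8_t current = 0;

    void beginTransmission(uint8_t address) { current = address; }

    // Mirrors Wire.endTransmission(): 0 means the device acknowledged,
    // 2 means NACK on the address byte.
    uint8_t endTransmission() {
        for (uint8_t a : present) {
            if (a == current) return 0;
        }
        return 2;
    }
};

// Address-only probe: no data bytes are sent, we only look for the ACK.
template <typename Bus>
bool sensor_present(Bus& bus, uint8_t address) {
    assert(address < 128);  // I2C addresses are 7-bit
    bus.beginTransmission(address);
    return bus.endTransmission() == 0;
}

// Return only the sensors that answered; the caller then reads just these
// and skips the rest until the next check.
template <typename Bus>
std::vector<uint8_t> responding_sensors(Bus& bus,
                                        const std::vector<uint8_t>& addresses) {
    std::vector<uint8_t> ok;
    for (uint8_t a : addresses) {
        if (sensor_present(bus, a)) ok.push_back(a);
    }
    return ok;
}
```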

Regarding the first, the following library works quite well to identify whether a sensor is present; however, the problem remains when disconnecting only GND and/or VCC. It is an alternative to the Wire library, but it can be cut down to just the function needed to check whether a sensor is present (like an I2C scanner). https://github.com/DSSCircuits/I2C-Master-Library

I initially thought the watchdog could be used to 'break' a loop rather than resetting the Arduino, but apparently this is not possible (break needs to be within a loop, and the watchdog works in parallel rather than in series, as far as I understand?).

Do you have any suggestions? Also, what I2C clock speed are we using? I think it is best to drop it down to say 50-100kHz or even lower if possible. I2C doesn't like long cables, but general advice is that it can still work on longer cables when clock speed is reduced. I tried the Multispeed I2C Scanner with the temperature sensor and it is able to go over a 5m long cat6 cable at 400kHz without problems, but perhaps better to drop it down as much as possible to prevent problems with signal glitches and thus potential freezing....
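A rough way to see why a lower clock helps on a long cable is to estimate the bus rise time. The I2C specification gives t_r = 0.8473 * R_p * C_b for the 30% to 70% rise, with a maximum of 1000 ns in standard mode (100 kHz) and 300 ns in fast mode (400 kHz). The numbers used in the comments (4.7 kOhm pull-ups, about 50 pF per metre of cable) are assumptions, not measured values for this setup.

```cpp
#include <cassert>

// Rise time (30% -> 70% of VDD) of an RC-limited I2C line, in nanoseconds.
// ohm * pF = 1e-12 s = 1e-3 ns, hence the final factor.
constexpr double rise_time_ns(double pullup_ohm, double bus_capacitance_pf) {
    return 0.8473 * pullup_ohm * bus_capacitance_pf * 1e-3;
}

// Fraction of one clock period that is spent on the rising edge.
constexpr double rise_fraction_of_period(double rise_ns, double clock_hz) {
    return rise_ns * clock_hz / 1e9;
}

// Limits from the I2C specification.
constexpr double kMaxRiseStandardModeNs = 1000.0;  // 100 kHz
constexpr double kMaxRiseFastModeNs = 300.0;       // 400 kHz

// Assumed values: 4.7 kOhm pull-up, 5 m of cable at ~50 pF/m = 250 pF.
constexpr double kAssumedPullupOhm = 4700.0;
constexpr double kAssumedCablePf = 5.0 * 50.0;
```

Under these assumptions the rise time comes out just under the standard-mode limit and well above the fast-mode limit, so 400 kHz working in a scanner test would be outside the specified margin. The rise time itself does not change with the clock setting, but at 50 kHz it takes up a much smaller share of each clock period.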

jerabaul29 commented 3 years ago

Have you actually encountered this I2C problem in practice? (just being curious :) ).

It seems that there is an easy fix: the issue you link to has actually been closed in favor of issue https://github.com/arduino/ArduinoCore-avr/issues/42 which implemented a fix. For backwards compatibility reasons or something like this it is not enabled by default I think (see discussion https://github.com/arduino/reference-en/issues/895 ). I will add it to the program to "robustify" things :) .

I will change the I2C frequency then, to be on the safe side :) .

jerabaul29 commented 3 years ago

I have just added both I2C timeout and reduced the I2C clock frequency:

https://github.com/jvoermans/Vibration_Logger/blob/725cdf895c06ff9a6a51708136b035fc7034a4c0/material_Jean/Due_SD_high_frequency_logger/src/main.cpp#L77-L78

which are set:

https://github.com/jvoermans/Vibration_Logger/blob/725cdf895c06ff9a6a51708136b035fc7034a4c0/material_Jean/Due_SD_high_frequency_logger/src/params.h#L14-L15

https://github.com/jvoermans/Vibration_Logger/blob/725cdf895c06ff9a6a51708136b035fc7034a4c0/material_Jean/Due_SD_high_frequency_logger/src/params.h#L15

I think you may try 50kHz as it is now, maybe try down to 25kHz if you want, and check if it works :) .

Let me know if you have any problems.

jvoermans commented 3 years ago

Amazing. I've spent 3 days looking for a solution. I was almost going to sketch a weird watchdog kinda way to circumvent this. Anyway, thanks for looking it up.

I tested one I2C temperature probe over a 5m long cat6 cable, it still has issues. Instead I copy-pasted this Wire library: https://github.com/arduino/ArduinoCore-avr/tree/master/libraries/Wire

and used the function: Wire.setWireTimeout(i2c_timeout_micro_seconds, false)

That seems to work. You are right, it is unlikely to happen. But I'm happy you found a solution, as it could be disastrous when it happens... Once I have a completely built Geo prototype, I'll test again, but for now I think this is good :)

jvoermans commented 3 years ago

UPDATE: Ok interesting, I tested now with three temperature probes. When I decouple one temperature probe partially (disconnect GND and VCC together), it doesn't hang now (which is great) but it does block all other probes from transmitting values. To prevent that from happening I have to reset the multiplexer:

    if (Wire.getWireTimeoutFlag()==1) {
        digitalWrite(5, LOW);
        Wire.clearWireTimeoutFlag();
        digitalWrite(5, HIGH);
    }

I attached the multiplexer's reset pin to digital pin 5; getWireTimeoutFlag() returns 1 when there has been a timeout. Not sure how this is going to function in the Geo sketch though?
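The recovery step above can be wrapped in a small helper so it is easy to call after each sensor read. This is only a sketch: `FakeWire` and `recover_after_timeout` are hypothetical names, and `FakeWire` stands in for the real `Wire` object so the logic can be checked off-target. It assumes the timeout has already been enabled with `Wire.setWireTimeout(...)`, and that the multiplexer reset is active low as in the snippet above.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in exposing the two timeout calls of the AVR Wire
// library that the recovery needs.
struct FakeWire {
    bool timeout_flag = false;
    bool getWireTimeoutFlag() const { return timeout_flag; }
    void clearWireTimeoutFlag() { timeout_flag = false; }
};

// If the bus reported a timeout, pulse the multiplexer reset line low,
// clear the flag, and release the reset again. Returns true when a
// recovery was performed, so the caller can log or count it.
template <typename Bus>
bool recover_after_timeout(Bus& bus,
                           const std::function<void(bool)>& write_reset_pin) {
    if (!bus.getWireTimeoutFlag()) {
        return false;  // nothing to do
    }
    write_reset_pin(false);      // hold the multiplexer in reset
    bus.clearWireTimeoutFlag();  // re-arm timeout detection
    write_reset_pin(true);       // release reset, channels deselected
    return true;
}
```

On the board this might be called as `recover_after_timeout(Wire, [](bool level) { digitalWrite(5, level ? HIGH : LOW); });` right after each temperature read.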

jerabaul29 commented 3 years ago

That sounds good! :) Happy that at least some of the problems are solved.

I am a bit confused, just to make things clear:

jvoermans commented 3 years ago

1) I ran a multi speed scanner sketch, it checks response at 50 - 400 kHz at different intervals. It connects over a 5.5m ethernet cable fine on all speeds. Best to just use 50 kHz.

2) I tried my own sketch, as I don't have enough material to attach everything right now. Got the few lines from here: https://github.com/arduino/reference-en/issues/895. I'll try to test early next week with the Due, then I'll be in the lab again!

jerabaul29 commented 3 years ago

Sounds good :) .

Ok, I will give it a bit of thinking also when I have time :) .

jerabaul29 commented 3 years ago

(I had a good look at the temperature sensor logic; I believe there is no reason why the other temperature sensors could not be read if one of them fails :) I look forward to hearing the results of your testing :) )

jvoermans commented 3 years ago

I tested the sketch with three temperature probes (no geophone). Disconnecting the clock and data line is fine, it just gives an extreme value: TMP,26.39,26.50,-891647819776.00, Once reconnected, it gives a normal value again.
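Since a disconnected data or clock line shows up as an extreme value like the one above, a simple plausibility check at parsing time can flag those samples. The sketch below is an illustration only: the function name and the -60 to +60 degrees C window are assumptions, and the window would need to match the actual probe and deployment.

```cpp
#include <cassert>
#include <cmath>

// Assumed plausible range for the deployment, in degrees Celsius.
constexpr float kMinPlausibleC = -60.0f;
constexpr float kMaxPlausibleC = 60.0f;

// True when the reading is a finite number inside the assumed range.
// Readings like -891647819776.00 from a disconnected probe fail this test.
inline bool is_plausible_temperature(float celsius) {
    return std::isfinite(celsius) &&
           celsius >= kMinPlausibleC &&
           celsius <= kMaxPlausibleC;
}
```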

However, disconnecting either GND or VCC gives errors. Based on serial monitor, it doesn't seem to hang, but it might slow down the writing. Parser gives an error regarding 'wrapping'. I added the data here: https://github.com/jvoermans/Vibration_Logger/tree/master/material_Jean/BinarySdDataParser/all_example_data/example_data_I2C_disconnect

Also, reconnecting GND or VCC doesn't bring the sensor back. Anyway, disconnecting GND or VCC alone is of course highly unlikely to happen...

jerabaul29 commented 3 years ago

I think it was "just" a problem that it indeed takes a bit more time in this second case to ignore the sensors, and the ADC buffers need to be bigger to accommodate that. I just increased the Arduino buffer size - can you update the Arduino Due code you run, try again, and let me know if this fixes things? :)
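As a back-of-the-envelope check on the buffer size: while the main loop is stalled waiting on a failed sensor, the ADC keeps producing samples, so the buffer has to hold at least (sample rate) x (stall duration) samples. The helper below is a hypothetical sketch of that arithmetic; the rates and stall times in the comments are illustrative, not the values used in the logger.

```cpp
#include <cassert>
#include <cstdint>

// Minimum number of buffered samples needed to survive a stall of
// `stall_us` microseconds at `sample_rate_hz`, rounded up.
constexpr uint64_t min_buffer_samples(uint64_t sample_rate_hz,
                                      uint64_t stall_us) {
    return (sample_rate_hz * stall_us + 999999ULL) / 1000000ULL;
}

// Illustrative numbers only: a 10 kHz ADC rate and a 25 ms stall would
// need at least 250 samples of headroom per channel.
static_assert(min_buffer_samples(10000, 25000) == 250,
              "10 kHz for 25 ms is 250 samples");
```

If several sensors time out in the same loop iteration, the stall durations add up, so the headroom should be sized for the worst case.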

jvoermans commented 3 years ago

Just added the new files. It is fast, but it produces a lot of zeros (in serial monitor at least) now so there seems to be something wrong? https://github.com/jvoermans/Vibration_Logger/tree/master/material_Jean/BinarySdDataParser/all_example_data/example_data_I2C_disconnect2

jerabaul29 commented 3 years ago

I found a couple of bugs in the Due code; can you upgrade to the latest Due code version and try again? :) Sorry, it is a bit hard to find bugs when I do not have the sensors myself.

jvoermans commented 3 years ago

Thanks! Yes, not sure how you do it at all without the hardware ;)

jerabaul29 commented 3 years ago

It is actually a good exercise in software engineering ^^ .

jerabaul29 commented 3 years ago

Will be curious to know if it fixed things :)

jvoermans commented 3 years ago

@jerabaul29 Just uploaded and tested. See example data here: https://github.com/jvoermans/Vibration_Logger/tree/master/material_Jean/BinarySdDataParser/all_example_data/example_data_I2C_disconnect2

Zeros are gone, but there seems to be a delay again when I disconnect the VCC line of one of the probes. The parser gives an error for this file as well (either 6 or 7, it might be both). Files 1-3 are without disconnecting an I2C sensor.

jerabaul29 commented 3 years ago

@jvoermans a possible way to improve I2C range is to decrease frequency.

This was discussed a bit higher in this thread. Can you check if it helps to change:

https://github.com/jvoermans/Vibration_Logger/blob/725cdf895c06ff9a6a51708136b035fc7034a4c0/material_Jean/Due_SD_high_frequency_logger/src/params.h#L15

Can you try with the following values?

 // Try one value at a time. I think the default is 100000UL; may need to test by hand which values work
 constexpr unsigned long i2c_clock_frequency = 10000UL;
 // constexpr unsigned long i2c_clock_frequency = 15000UL;
 // constexpr unsigned long i2c_clock_frequency = 25000UL;

and see if it helps?
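Instead of recompiling for each value, the candidate frequencies could also be tried in one go, similar to what a multi-speed scanner does: set the clock, probe the sensor address, and record which settings get an acknowledge. The sketch below is hypothetical; `FakeLongCableBus` and `working_clocks` are made-up names, and the fake bus simply pretends that everything above a chosen frequency fails. On the board the real `Wire` object (with `setClock`, `beginTransmission`, `endTransmission`) would be passed in.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Wire on a long cable: the device only
// acknowledges when the clock is at or below `max_working_hz`.
struct FakeLongCableBus {
    uint32_t max_working_hz;
    uint32_t clock_hz;

    explicit FakeLongCableBus(uint32_t max_hz)
        : max_working_hz(max_hz), clock_hz(100000) {}

    void setClock(uint32_t hz) { clock_hz = hz; }
    void beginTransmission(uint8_t) {}
    // 0 = acknowledged, 2 = NACK on address (as Wire.endTransmission()).
    uint8_t endTransmission() { return clock_hz <= max_working_hz ? 0 : 2; }
};

// Try each candidate clock and return those at which the sensor answered.
template <typename Bus>
std::vector<uint32_t> working_clocks(Bus& bus, uint8_t address,
                                     const std::vector<uint32_t>& candidates_hz) {
    std::vector<uint32_t> ok;
    for (uint32_t hz : candidates_hz) {
        bus.setClock(hz);
        bus.beginTransmission(address);
        if (bus.endTransmission() == 0) ok.push_back(hz);
    }
    return ok;
}
```

A single acknowledge per frequency is a weak test; repeating the probe many times per setting would give a better idea of how reliable each clock is.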

jerabaul29 commented 3 years ago

Regarding the hanging of the I2C: I searched further but did not find more information. From what I saw in your file, the logger just misses a few seconds of logging right when the sensor fails, and then the data are back. So I think it is still quite OK; in the worst case we only lose a few seconds of data.

jvoermans commented 3 years ago

Agreed :)