openaps / oref0

oref0: The open reference implementation of the OpenAPS reference design.
http://www.OpenAPS.org
MIT License
431 stars 394 forks source link

rig hangs requiring hard reboot #809

Closed tepidjuice closed 5 years ago

tepidjuice commented 6 years ago

Sorry if this is not the best way to deal with this problem.

A few times a week my rig drops its bt-tether to my phone and will only reconnect after a hard reboot of the rig. I think this happens when my phone has poor internet connectivity.

I'd like to suggest a change to oref0-online. Either: 1) oref0-online stays connected to the bt-tether regardless of the internet connectivity of the tether.

2) if no wifi or bt-connection has occurred in the last 10-15 minutes the rig turns the bt-antenna off and on automatically.

Hopefully one of these will get the rig reconnected to the phone without a hard reboot. As this problem occurs when I'ma way from a computer+USB cable to connect by serial.

scottleibrand commented 6 years ago

oref0-online should already leave the bt-pan connection established, regardless of the state of its Internet connection, as long as it doesn't have connectivity via wifi. Once a wifi Internet connection is established, we want the bt-pan connection to drop to eliminate the blue tethering bar on the phone.

Are you running the latest dev? There was a fix merged a day or two ago for one particular situation that matches the (admittedly vague) symptoms you're describing.

If you're still seeing the issue, it'd be worth checking the network.log and syslog to see what the rig is doing when it can't reconnect and requires a hard reboot. If it's crashing, the syslog from just before the crash is often informative.

tepidjuice commented 6 years ago

This happened again today. I waited until I could connect over serial and the rig had definitely crashed. There is nothing obvious in syslog nor messages to indicate what happened.

This seems to not necessarily be a bt issue but a crash. I'm happy to close this and debug this issue somewhere else.

scottleibrand commented 6 years ago

I just retitled the issue, so we can keep this open to note any thing we learn, or for others experiencing the same issue to find and collaborate on.

Bender1061 commented 6 years ago

I have this happening every so often. Only I am not able to connect even over serial with SSH. Even a hard reboot won't always fix it right away. I found letting it sit for a few mins then powering it back up works. So on that it sounds like a heat issue. but the rig is warm but normally not too hot to the touch. I'm going to be re-flashing this weekend on a couple rigs and start my own notes about what rigs are doing this to me. I don't know if its related or not, but it seems to have started since we put the offline stuff in. So I agree If anyone sees similar type issues this would be the place to put it. Maybe we can start figuring out from everyone experience what's actually going on.

Bender1061 commented 6 years ago

Ok, so my rig hung again today, did it last thursday as well. my last pump-loop started at 9:55 in the AM, so it was partly though my loop when it just stopped. Rig was on, but would not respond to serial .ssh nor showed up on my network. (used fing to see my network) it was on for several hours before I realized it and shut it down. Here is the some logs

syslog

Nov 24 09:54:51 localhost Version: subg_rfspy 0.8
Nov 24 09:55:01 localhost CRON[16394]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-ns-loop' || oref0-ns-loop | tee -a /var/log/openaps/ns-loop.log)
Nov 24 09:55:01 localhost CRON[16392]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-autosens-loop' || oref0-autosens-loop 2>&1 | tee -a /var/log/openaps/autosens-loop.log)
Nov 24 09:55:01 localhost CRON[16391]: (root) CMD (ps aux | grep -v grep | grep -q "oref0-bluetoothup" || oref0-bluetoothup >> /var/log/openaps/network.log)
Nov 24 09:55:01 localhost CRON[16398]: (root) CMD (cd /root/myopenaps && oref0-pushover a2ugephuya73w83ig8oax63jhxomzj ucbh5ahbw64riiszc9np7co62amw7i 2>&1 >> /var/log/openaps/pushover.log)
Nov 24 09:55:01 localhost CRON[16402]: (root) CMD (find /var/log/openaps/pump-loop.log -mmin +5 | grep pump && ( killall -g --older-than 5m openaps; killall -g --older-than 5m oref0-pump-loop; killall -g --older-than 5m openaps-report ))
Nov 24 09:55:01 localhost CRON[16412]: (root) CMD (ps aux | grep -v grep | grep -q "oref0-online 24:F0:94:19:D2:0D" || cd /root/myopenaps && oref0-online 24:F0:94:19:D2:0D 2>&1 >> /var/log/openaps/network.log)
Nov 24 09:55:01 localhost CRON[16406]: (root) CMD (( killall -g --older-than 30m openaps; killall -g --older-than 30m oref0-pump-loop; killall -g --older-than 30m openaps-report ))
Nov 24 09:55:01 localhost CRON[16414]: (root) CMD (sudo wpa_cli scan)
Nov 24 09:55:01 localhost CRON[16393]: (root) CMD (oref0-radio-reboot)
Nov 24 09:55:01 localhost CRON[16410]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep -q 'peb-urchin-status 00:18:33:75:0F:66' || peb-urchin-status 00:18:33:75:0F:66 ) 2>&1 | tee -a /var/log/openaps/urchin-loop.log)
Nov 24 09:55:01 localhost CRON[16411]: (root) CMD (cd /root/myopenaps && sleep 30 && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:55:01 localhost CRON[16418]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:55:02 localhost CRON[16389]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:02 localhost CRON[16388]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:02 localhost CRON[16385]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:07 localhost CRON[15086]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:16 localhost root: Uploaded upload/ns-status.json to https://bender1061.herokuapp.com
Nov 24 09:55:23 localhost root: Uploaded upload/latest-treatments.json to https://bender1061.herokuapp.com
Nov 24 09:55:23 localhost CRON[16386]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:32 localhost libmraa[16699]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:55:32 localhost libmraa[16699]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:55:32 localhost libmraa[16699]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:55:32 localhost libmraa[16699]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:55:34 localhost libmraa[16707]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:55:34 localhost libmraa[16707]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:55:34 localhost libmraa[16707]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:55:34 localhost libmraa[16707]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:55:41 localhost CRON[16381]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:55:41 localhost SubgRfspyLink opening serial port
Nov 24 09:55:42 localhost libmraa[16712]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:55:42 localhost libmraa[16712]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:55:42 localhost libmraa[16712]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:55:42 localhost libmraa[16712]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:55:42 localhost subg_rfspy status: OK
Nov 24 09:55:42 localhost Version: subg_rfspy 0.8
Nov 24 09:55:42 localhost Pump 622890 is already responding. Not sending wakeup messages
Nov 24 09:55:45 localhost SubgRfspyLink opening serial port
Nov 24 09:55:45 localhost libmraa[16720]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:55:45 localhost libmraa[16720]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:55:45 localhost libmraa[16720]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:55:45 localhost libmraa[16720]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:55:45 localhost subg_rfspy status: OK
Nov 24 09:55:45 localhost Version: subg_rfspy 0.8
Nov 24 09:55:45 localhost Pump 622890 is already responding. Not sending wakeup messages
Nov 24 09:55:47 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-DISCONNECTED bssid=c0:56:27:75:11:f6 reason=2
Nov 24 09:55:47 localhost kernel: [49788.023966] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_DEAUTH_IND
Nov 24 09:55:47 localhost kernel: [49788.024047] link down if wlan0 may call cfg80211_disconnected. event : 6, reason=2 from c0:56:27:75:11:f6
Nov 24 09:55:47 localhost kernel: [49788.026969] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_DEAUTH
Nov 24 09:55:47 localhost kernel: [49788.027000] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_LINK
Nov 24 09:55:47 localhost kernel: [49788.033353] cfg80211: Calling CRDA to update world regulatory domain
Nov 24 09:55:47 localhost kernel: [49788.069006] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_DEAUTH
Nov 24 09:55:47 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Nov 24 09:55:47 localhost kernel: [49788.107212] cfg80211: World regulatory domain updated:
Nov 24 09:55:47 localhost kernel: [49788.107237] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Nov 24 09:55:47 localhost kernel: [49788.107257] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107273] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107288] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107304] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107319] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107335] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107350] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:55:47 localhost kernel: [49788.107366] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm)
Nov 24 09:55:51 localhost wpa_supplicant[283]: wlan0: Trying to associate with c0:56:27:75:11:f6 (SSID='DasDing' freq=2462 MHz)
Nov 24 09:55:52 localhost kernel: [49792.267919] CFG80211-ERROR) wl_cfg80211_connect : Connectting withc0:56:27:75:11:f6 channel (11) ssid "DasDing", len (7)
Nov 24 09:55:52 localhost kernel: [49792.267919] 
Nov 24 09:55:52 localhost kernel: [49792.397455] wl_bss_connect_done succeeded with c0:56:27:75:11:f6
Nov 24 09:55:52 localhost wpa_supplicant[283]: wlan0: Associated with c0:56:27:75:11:f6
Nov 24 09:55:52 localhost kernel: [49792.497609] wl_bss_connect_done succeeded with c0:56:27:75:11:f6
Nov 24 09:55:53 localhost wpa_supplicant[283]: wlan0: WPA: Key negotiation completed with c0:56:27:75:11:f6 [PTK=CCMP GTK=TKIP]
Nov 24 09:55:53 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-CONNECTED - Connection to c0:56:27:75:11:f6 completed [id=0 id_str=]
Nov 24 09:55:55 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:56 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:56 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:57 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:56:01 localhost CRON[16750]: (root) CMD (cd /root/myopenaps && sleep 30 && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:56:01 localhost CRON[16752]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-autosens-loop' || oref0-autosens-loop 2>&1 | tee -a /var/log/openaps/autosens-loop.log)
Nov 24 09:56:01 localhost CRON[16753]: (root) CMD (find /var/log/openaps/pump-loop.log -mmin +5 | grep pump && ( killall -g --older-than 5m openaps; killall -g --older-than 5m oref0-pump-loop; killall -g --older-than 5m openaps-report ))
Nov 24 09:56:01 localhost CRON[16754]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-ns-loop' || oref0-ns-loop | tee -a /var/log/openaps/ns-loop.log)
Nov 24 09:56:01 localhost CRON[16755]: (root) CMD (sudo wpa_cli scan)
Nov 24 09:56:01 localhost CRON[16756]: (root) CMD (oref0-radio-reboot)
Nov 24 09:56:01 localhost CRON[16757]: (root) CMD (ps aux | grep -v grep | grep -q "oref0-online 24:F0:94:19:D2:0D" || cd /root/myopenaps && oref0-online 24:F0:94:19:D2:0D 2>&1 >> /var/log/openaps/network.log)
Nov 24 09:56:01 localhost CRON[16758]: (root) CMD (( killall -g --older-than 30m openaps; killall -g --older-than 30m oref0-pump-loop; killall -g --older-than 30m openaps-report ))
Nov 24 09:56:01 localhost CRON[16759]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:56:01 localhost CRON[16775]: (root) CMD (ps aux | grep -v grep | grep -q "oref0-bluetoothup" || oref0-bluetoothup >> /var/log/openaps/network.log)
Nov 24 09:56:01 localhost CRON[16782]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep -q 'peb-urchin-status 00:18:33:75:0F:66' || peb-urchin-status 00:18:33:75:0F:66 ) 2>&1 | tee -a /var/log/openaps/urchin-loop.log)
Nov 24 09:56:01 localhost CRON[16787]: (root) CMD (cd /root/myopenaps && oref0-pushover a2ugephuya73w83ig8oax63jhxomzj ucbh5ahbw64riiszc9np7co62amw7i 2>&1 >> /var/log/openaps/pushover.log)
Nov 24 09:56:01 localhost CRON[16748]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:56:02 localhost CRON[16747]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:56:02 localhost CRON[16744]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:56:19 localhost SubgRfspyLink opening serial port
Nov 24 09:56:19 localhost libmraa[16959]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:19 localhost libmraa[16959]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:19 localhost libmraa[16959]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:19 localhost libmraa[16959]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:19 localhost subg_rfspy status: OK
Nov 24 09:56:19 localhost Version: subg_rfspy 0.8
Nov 24 09:56:23 localhost SubgRfspyLink opening serial port
Nov 24 09:56:23 localhost libmraa[16987]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:23 localhost libmraa[16987]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:23 localhost libmraa[16987]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:24 localhost libmraa[16987]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:24 localhost SubgRfspyLink opening serial port
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:24 localhost SubgRfspyLink opening serial port
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:31 localhost SubgRfspyLink opening serial port
Nov 24 09:56:31 localhost libmraa[16998]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:31 localhost libmraa[16998]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:31 localhost libmraa[16998]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:31 localhost libmraa[16998]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:31 localhost subg_rfspy status: OK
Nov 24 09:56:31 localhost Version: subg_rfspy 0.8
Nov 24 09:56:31 localhost SubgRfspyLink opening serial port
Nov 24 09:56:31 localhost subg_rfspy status: OK
Nov 24 09:56:31 localhost Version: subg_rfspy 0.8
Nov 24 09:56:32 localhost SubgRfspyLink opening serial port
Nov 24 09:56:32 localhost subg_rfspy status: OK
Nov 24 09:56:32 localhost Version: subg_rfspy 0.8
Nov 24 09:56:39 localhost CRON[16740]: (CRON) info (No MTA installed, discarding output)
Nov 24 09:56:42 localhost SubgRfspyLink opening serial port
Nov 24 09:56:42 localhost libmraa[17052]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:42 localhost libmraa[17052]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:42 localhost libmraa[17052]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:42 localhost libmraa[17052]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:42 localhost subg_rfspy status: OK
Nov 24 09:56:42 localhost Version: subg_rfspy 0.8
Nov 24 09:56:46 localhost SubgRfspyLink opening serial port
Nov 24 09:56:46 localhost libmraa[17103]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:46 localhost libmraa[17103]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:46 localhost libmraa[17103]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:46 localhost libmraa[17103]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:46 localhost subg_rfspy status: OK
Nov 24 09:56:46 localhost Version: subg_rfspy 0.8
Nov 24 09:56:46 localhost SubgRfspyLink opening serial port
Nov 24 09:56:46 localhost subg_rfspy status: OK
Nov 24 09:56:46 localhost Version: subg_rfspy 0.8
Nov 24 09:56:55 localhost SubgRfspyLink opening serial port
Nov 24 09:56:55 localhost libmraa[17176]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:55 localhost libmraa[17176]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:55 localhost libmraa[17176]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:55 localhost libmraa[17176]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:55 localhost subg_rfspy status: OK
Nov 24 09:56:55 localhost Version: subg_rfspy 0.8
Nov 24 09:56:59 localhost SubgRfspyLink opening serial port
Nov 24 09:56:59 localhost libmraa[17235]: libmraa version v1.7.0 initialised by user 'root' with EUID 0
Nov 24 09:56:59 localhost libmraa[17235]: gpio214: init: Failed to write to 'export': No such device
Nov 24 09:56:59 localhost libmraa[17235]: edison: Arduino board not detected, assuming Intel Edison Miniboard
Nov 24 09:56:59 localhost libmraa[17235]: libmraa initialised for platform 'Intel Edison' of type 2
Nov 24 09:56:59 localhost subg_rfspy status: OK
Nov 24 09:56:59 localhost Version: subg_rfspy 0.8
Nov 24 09:57:01 localhost CRON[17263]: (root) CMD (cd /root/myopenaps && oref0-pushover a2ugephuya73w83ig8oax63jhxomzj ucbh5ahbw64riiszc9np7co62amw7i 2>&1 >> /var/log/openaps/pushover.log)
Nov 24 09:57:01 localhost CRON[17270]: (root) CMD (find /var/log/openaps/pump-loop.log -mmin +5 | grep pump && ( killall -g --older-than 5m openaps; killall -g --older-than 5m oref0-pump-loop; killall -g --older-than 5m openaps-report ))
Nov 24 09:57:01 localhost CRON[17273]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-ns-loop' || oref0-ns-loop | tee -a /var/log/openaps/ns-loop.log)
Nov 24 09:57:01 localhost CRON[17274]: (root) CMD (cd /root/myopenaps && ps aux | grep -v grep | grep -q 'oref0-autosens-loop' || oref0-autosens-loop 2>&1 | tee -a /var/log/openaps/autosens-loop.log)
Nov 24 09:57:01 localhost CRON[17276]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:57:01 localhost CRON[17275]: (root) CMD (sudo wpa_cli scan)
Nov 24 09:57:01 localhost CRON[17277]: (root) CMD (oref0-radio-reboot)
Nov 24 09:57:01 localhost CRON[17278]: (root) CMD (cd /root/myopenaps && sleep 30 && ( ps aux | grep -v grep | grep bash | grep -q 'bin/oref0-pump-loop' || oref0-pump-loop ) 2>&1 | tee -a /var/log/openaps/pump-loop.log)
Nov 24 09:57:01 localhost CRON[17279]: (root) CMD (( killall -g --older-than 30m openaps; killall -g --older-than 30m oref0-pump-loop; killall -g --older-than 30m openaps-report ))
Nov 24 09:57:01 localhost CRON[17294]: (root) CMD (cd /root/myopenaps && ( ps aux | grep -v grep | grep -q 'peb-urchin-status 00:18:33:75:0F:66' || peb-urchin-status 00:18:33:75:0F:66 ) 2>&1 | tee -a /var/log/openaps/urchin-loop.log)
Nov 24 09:57:01 localhost CRON[17293]: (root) CMD (ps aux | grep -v grep | grep -q "oref0-online 24:F0:94:19:D2:0D" || cd /root/myopenaps && oref0-online 24:F0:94:19:D2:0D 2>&1 >> /var/log/openaps/network.log)
Nov 24 09:57:01 loNov 24 16:55:04 localhost rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="379" x-info="http://www.rsyslog.com"] start
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'bcm_bt_lpm'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'bcm4334x'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'g_multi'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'uvcvideo'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Failed to find module 'ftdi_sio'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Module 'iio_basincove_gpadc' is builtin
Nov 24 16:55:04 localhost systemd[1]: systemd-modules-load.service: main process exited, code=exited, status=1/FAILURE
Nov 24 16:55:04 localhost systemd[1]: Failed to start Load Kernel Modules.
Nov 24 16:55:04 localhost systemd[1]: Unit systemd-modules-load.service entered failed state.
Nov 24 16:55:04 localhost systemd[1]: Started Load/Save Random Seed.
Nov 24 16:55:04 localhost systemd[1]: Started Create Static Device Nodes in /dev.
Nov 24 16:55:04 localhost systemd[1]: Mounted /tmp.
Nov 24 16:55:04 localhost systemd[1]: Mounted Configuration File System.
Nov 24 16:55:04 localhost systemd[1]: Mounted FUSE Control File System.
Nov 24 16:55:04 localhost systemd[1]: Started udev Kernel Device Manager.
Nov 24 16:55:04 localhost systemd[1]: Started Apply Kernel Variables.
Nov 24 16:55:04 localhost systemd[1]: Found device /dev/ttyMFD2.
Nov 24 16:55:04 localhost systemd[1]: Starting system-systemd\x2drfkill.slice.
Nov 24 16:55:04 localhost kernel: [    0.000000] Initializing cgroup subs

daemon.log

Nov 24 08:55:44 localhost wpa_supplicant[283]: wlan0: WPA: Group rekeying completed with c0:56:27:75:11:f6 [GTK=TKIP]
Nov 24 09:55:47 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-DISCONNECTED bssid=c0:56:27:75:11:f6 reason=2
Nov 24 09:55:47 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Nov 24 09:55:51 localhost wpa_supplicant[283]: wlan0: Trying to associate with c0:56:27:75:11:f6 (SSID='DasDing' freq=2462 MHz)
Nov 24 09:55:52 localhost wpa_supplicant[283]: wlan0: Associated with c0:56:27:75:11:f6
Nov 24 09:55:53 localhost wpa_supplicant[283]: wlan0: WPA: Key negotiation completed with c0:56:27:75:11:f6 [PTK=CCMP GTK=TKIP]
Nov 24 09:55:53 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-CONNECTED - Connection to c0:56:27:75:11:f6 completed [id=0 id_str=]
Nov 24 09:55:55 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:56 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:56 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:55:57 localhost wpa_supplicant[283]: wlan0: WPA: Invalid EAPOL-Key MIC - dropping packet
Nov 24 09:57:06 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-DISCONNECTED bssid=c0:56:27:75:11:f6 reason=3 locally_generated=1
Nov 24 09:57:06 localhost dhclient: receive_packet failed on wlan0: Network is down
Nov 24 09:57:06 localhost avahi-daemon[351]: Interface wlan0.IPv4 no longer relevant for mDNS.
Nov 24 09:57:06 localhost avahi-daemon[351]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.114.
Nov 24 09:57:06 localhost wpa_supplicant[283]: wlan0: CTRL-EVENT-TERMINATING
Nov 24 09:57:06 localhost avahi-daemon[351]: Withdrawing address record for 192.168.1.114 on wlan0.
Nov 24 09:57:06 localhost dhclient: Killed old client process
Nov 24 09:57:07 localhost dhclient: Internet Systems Consortium DHCP Client 4.3.1
Nov 24 09:57:07 localhost dhclient: Copyright 2004-2014 Internet Systems Consortium.
Nov 24 09:57:07 localhost dhclient: All rights reserved.
Nov 24 09:57:07 localhost dhclient: For info, please visit httpsNov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'bcm_bt_lpm'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'bcm4334x'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'g_multi'
Nov 24 16:55:04 localhost systemd-modules-load[112]: Inserted module 'uvcvideo'

auth.log

Nov 24 09:56:49 localhost sudo:     root : TTY=unknown ; PWD=/root/myopenaps ; USER=root ; COMMAND=/usr/local/bin/bt-pan client XXXXXXXX
Nov 24 09:56:49 localhost sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 24 09:56:50 localhost sudo: pam_unix(sudo:session): session closed for user root
Nov 24 09:57:01 localhost CRON[17251]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17256]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17253]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17252]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17259]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17260]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17257]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17262]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17261]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17254]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17258]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost CRON[17251]: pam_unix(cron:session): session closed for user root
Nov 24 09:57:01 localhost CRON[17255]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 09:57:01 localhost sudo:     root : TTY=unknown ; PWD=/root ; USER=root ; COMMANNov 24 16:55:04 localhost sshd[375]: Server listening on 0.0.0.0 port 22.
Nov 24 16:55:04 localhost sshd[375]: Server listening on :: port 22.
Nov 24 16:55:04 localhost CRON[384]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 16:55:04 localhost CRON[385]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 24 16:55:04 localhost CRON[384]: pam_unix(cron:session): session closed for user root
Nov 24 16:55:05 localhost systemd-logind[412]: New seat seat0.

kern.log

Nov 24 09:55:52 localhost kernel: [49792.267919] 
Nov 24 09:55:52 localhost kernel: [49792.397455] wl_bss_connect_done succeeded with c0:56:27:75:11:f6
Nov 24 09:55:52 localhost kernel: [49792.497609] wl_bss_connect_done succeeded with c0:56:27:75:11:f6
Nov 24 09:57:06 localhost kernel: [49866.311698] CFG80211-ERROR) wl_cfg80211_disconnect : Reason 3
Nov 24 09:57:06 localhost kernel: [49866.335353] cfg80211: Calling CRDA to update world regulatory domain
Nov 24 09:57:06 localhost kernel: [49866.381270] cfg80211: World regulatory domain updated:
Nov 24 09:57:06 localhost kernel: [49866.381295] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Nov 24 09:57:06 localhost kernel: [49866.381315] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381331] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381347] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381362] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381377] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381393] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381408] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm)
Nov 24 09:57:06 localhost kernel: [49866.381425] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm)
Nov 24 09:57:06 localhost kernel: [49866.451150] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_LINK
Nov 24 09:57:06 localhost kernel: [49866.452593] link down if wlan0 may call cfg80211_disconnected. event : 16, reason=2 from c0:56:27:75:11:f6
Nov 24 09:57:06 localhost kernel: [49866.458173] CFG80211-ERROR) wl_escan_handler : escan is not ready ndev f580f000 drv_status 0x0 e_type 69 e_states 4
Nov 24 09:57:06 localhost kernel: [49866.466844] wl_android_wifi_off in
Nov 24 09:57:06 localhost kernel: [49866.468113] wifi_platform_set_power = 0
Nov 24 09:57:06 localhost kernel: [49866.481403] wpa_supplicant (283) used greatest stack depth: 4756 bytes left
Nov 24 16:55:04 localhost kernel: [    0.000000] Initializing cgroup subsys cpuset
Nov 24 16:55:04 localhost kernel: [    0.000000] Initializing cgroup subsys cpu
Nov 24 16:55:04 localhost kernel: [    0.000000] Initializing cgroup subsys cpuacct
Nov 24 16:55:04 localhost kernel: [    0.000000] Linux version 3.10.98-poky-edison (robin@robin-i7) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #5 SMP PREEMPT Tue Apr 4 22:42:44 EDT 2017
Nov 24 16:55:04 localhost kernel: [    0.000000] e820: BIOS-provided physical RAM map:
Nov 24 16:55:04 localhost kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000097fff] usable

user.log

Nov 24 09:55:45 localhost Pump XXXXXXXX is already responding. Not sending wakeup messages
Nov 24 09:56:19 localhost SubgRfspyLink opening serial port
Nov 24 09:56:19 localhost subg_rfspy status: OK
Nov 24 09:56:19 localhost Version: subg_rfspy 0.8
Nov 24 09:56:23 localhost SubgRfspyLink opening serial port
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:24 localhost SubgRfspyLink opening serial port
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:24 localhost SubgRfspyLink opening serial port
Nov 24 09:56:24 localhost subg_rfspy status: OK
Nov 24 09:56:24 localhost Version: subg_rfspy 0.8
Nov 24 09:56:31 localhost SubgRfspyLink opening serial port
Nov 24 09:56:31 localhost subg_rfspy status: OK
Nov 24 09:56:31 localhost Version: subg_rfspy 0.8
Nov 24 09:56:31 localhost SubgRfspyLink opening serial port
Nov 24 09:56:31 localhost subg_rfspy status: OK
Nov 24 09:56:31 localhost Version: subg_rfspy 0.8
Nov 24 09:56:32 localhost SubgRfspyLink opening serial port
Nov 24 09:56:32 localhost subg_rfspy status: OK
Nov 24 09:56:32 localhost Version: subg_rfspy 0.8
Nov 24 09:56:42 localhost SubgRfspyLink opening serial port
Nov 24 09:56:42 localhost subg_rfspy status: OK
Nov 24 09:56:42 localhost Version: subg_rfspy 0.8
Nov 24 09:56:46 localhost SubgRfspyLink opening serial port
Nov 24 09:56:46 localhost subg_rfspy status: OK
Nov 24 09:56:46 localhost Version: subg_rfspy 0.8
Nov 24 09:56:46 localhost SubgRfspyLink opening serial port
Nov 24 09:56:46 localhost subg_rfspy status: OK
Nov 24 09:56:46 localhost Version: subg_rfspy 0.8
Nov 24 09:56:55 localhost SubgRfspyLink opening serial port
Nov 24 09:56:55 localhost subg_rfspy status: OK
Nov 24 09:56:55 localhost Version: subg_rfspy 0.8
Nov 24 09:56:59 localhost SubgRfspyLink opening serial port
Nov 24 09:56:59 localhost subg_rfspy status: OK
Nov 24 09:56:59 localhost Version: subg_rfspy 0.8
Nov 24 09:57:04 localhost SubgRfspyLink opening serial port
Nov 24 09:57:05 localhost subg_rfspy status: OK
Nov 24 09:57:05 localhost Version: subg_rfspy 0.8
Nov 24 09:57:05 localhost SubgRfspyLink opening serial port
Nov 24 09:57:05 localhost subg_rfspy status: OK
Nov 24 09:57:05 localhost Version: subg_rfspy 0.8
Nov 24 09:57:05 localhost SubgRfspyLink opening serial port
Nov 24 09:57:05 localhost subg_rfspy status: OK
Nov 24 09:57:05 localhost Version: subg_rfspy 0.8
Nov 24 16:56:19 localhost SubgRfspyLink opening serial port
Nov 24 16:56:19 localhost subg_rfspy status: OK
Nov 24 16:56:19 localhost Version: subg_rfspy 0.8
Nov 24 16:56:20 localhost Timed out or other comms error - Received an error response Timeout - retrying: 1 of 3
Nov 24 16:56:20 localhost Timed out or other comms error - Received an error response Timeout - retrying: 2 of 3
Nov 24 16:56:22 localhost Timed out or other comms error - Received an error response Timeout - retrying: 3 of 3
Nov 24 16:56:24 localhost BEGIN POWER CONTROL XXXXXXXX
Nov 24 16:56:44 localhost 1511560604.42 - Response not received - retrying
Nov 24 16:56:47 localhost SubgRfspyLink opening serial port
Bender1061 commented 6 years ago

So started thinking about those logs and it all seams to be the wlan0 is killed. so I looked at the logs

Starting oref0-online at Fri Nov 24 09:56:01 EST 2017.
At Fri Nov 24 09:56:02 EST 2017 my wifi network name is DasDing
At Fri Nov 24 09:56:02 EST 2017 my local wifi IP is: 192.168.1.114
At Fri Nov 24 09:56:02 EST 2017 my Bluetooth PAN is not connected
At Fri Nov 24 09:56:02 EST 2017 my public IP is: not found

At Fri Nov 24 09:56:18 EST 2017 my wifi network name is DasDing

No Internet access detected, attempting to connect BT to 24:F0:94:19:D2:0D

Starting oref0-online at Fri Nov 24 09:57:02 EST 2017.
At Fri Nov 24 09:57:02 EST 2017 my wifi network name is DasDing
At Fri Nov 24 09:57:02 EST 2017 my local wifi IP is: 192.168.1.114
At Fri Nov 24 09:57:02 EST 2017 my Bluetooth PAN is not connected
At Fri Nov 24 09:57:02 EST 2017 my public IP is: 
Unable to connect via wifi or Bluetooth; activating local-only hotspot
Killing wpa_supplicant
Selected interface 'wlan0'
OK
Shutting down wlan0

Starting oref0-online at Fri Nov 24 16:56:01 EST 2017.
At Fri Nov 24 16:56:02 EST 2017 my wifi network name is DasDing
At Fri Nov 24 16:56:02 EST 2017 my local wifi IP is: 192.168.1.114
At Fri Nov 24 16:56:02 EST 2017 my Bluetooth PAN is not connected
At Fri Nov 24 16:56:02 EST 2017 my public IP is: 67.149.89.94
At Fri Nov 24 16:56:03 EST 2017 my local hotspot is not running
Finished oref0-online at Fri Nov 24 16:56:03 EST 2017.

Starting oref0-online at Fri Nov 24 16:57:02 EST 2017.
At Fri Nov 24 16:57:02 EST 2017 my wifi network name is DasDing
At Fri Nov 24 16:57:02 EST 2017 my local wifi IP is: 192.168.1.114
At Fri Nov 24 16:57:02 EST 2017 my Bluetooth PAN is not connected
At Fri Nov 24 16:57:02 EST 2017 my public IP is: 67.149.89.94
At Fri Nov 24 16:57:03 EST 2017 my local hotspot is not running
Finished oref0-online at Fri Nov 24 16:57:03 EST 2017.

now why it could not connect to my network, which it says it was connected too, I have no idea, but when it killed the wlan0 is where the edison started working.

Now I'm really having a hard time believing that no one else is having this issue, as I've documented 3 of my 4 portable rigs doing this. I'm at the point I can't trust leaving the house with these rigs, So I'm disabling my hotspot. so it does not kill the wlan0

diabeticdude2 commented 6 years ago

I am having the same problems on one of my rigs at this time. I can pull logs if they are needed. The problems got so bad over the Thanksgiving holiday that I had to turn off my rig and move to my backup solution. Essentially the problem is easy to fix by killing power to the system and letting it come back up.
However, losing an hour without my rig being active tends to mess up my blood sugars. Moreover without notifications that it is down can be very troublesome. Which is why I asked about having a Pushover set up for stale data or no communication with the rig a little bit ago. Last night, for whatever reason, my rig went out and stayed out all through my resting. When I woke up this morning I saw what had happened, and left it in its current state still on and still functioning the same way. Let me know and I can post whatever information you may need.

Bender1061 commented 6 years ago

anything that would notify would have to be from an outside source as, as the rig stops in it's track. from what I can tell it's how we are stopping the wlan0 basically when oref0-online runs ifdown wlan0 maybe we need to find a "softer" way to kill the wlan in order to change the network/interfaces for the Hotspot. I wonder if there was a way to put logic in the network/interfaces? Hmm... I'll have to work on that later.

Bender1061 commented 6 years ago

on the rig that seams to do it every couple days, I'm going to turn off the hotspot in preferences and keep running it and see if the problem comes back.

scottleibrand commented 6 years ago

We’ve turned off the offline hotspot on rigs that require Internet connectivity for similar issues, and only leave it on for the rig with a connected Dexcom receiver. Until we figure out a method that doesn’t trigger crashes, perhaps we could add some logic to check lsusb and only activate offline hotspot if it finds a receiver?

Bender1061 commented 6 years ago

I'm using MDT, so yes I don't need the hotspot, unless I'm trying to access the rig to make changes or check something when I don't have network. just to view it I still run pancredble on my second pebble. but the hotspot is still a great idea, just need to figure out how to stop the rig.

Anyone that wants to see if the network log is stopping on yours at the same time as mine did above would be helpful.

if we all do notice the same thing, we shall just have to figure out another way. There is always more than one way to skin a cat.

diabeticdude2 commented 6 years ago

I do not need to hotspot, however I do have it enabled. I will disable it when I get home

diabeticdude2 commented 6 years ago

I was hoping for communication from pushover if nightscout has not received communication from the rig in 30 plus minutes. If I am online and nightscout is online but for whatever reason the rig is off-line. I should be able to get a notification from Pushover. I know Nightscout will not do it unless it is open which would require me to constantly check the site.

Bender1061 commented 6 years ago

if you were on hotspot mode, it would not be uploading to your nightscout site or really it would not be communicating to anywhere on the net. So in most situations your phone does not have internet (reason it went to hotspot mode) so even if pushover was to try to send you something about it, you would not get it!

scottleibrand commented 6 years ago

Nightscout can send pushover alerts directly (including on no data from rigs).

diabeticdude2 commented 6 years ago

I was under the impression offline hotspot mode would give me a way to SSH to my rig in the event that it went down

scottleibrand commented 6 years ago

In the event what went down? If the rig crashes or powers down it can’t exactly respond to ssh or anything else.

diabeticdude2 commented 6 years ago

I would love to have that notification that way in the very least I know when to manually re-set my rig. With that being said I am also going to disable the hotspot mode because it is seeming very apparent that I have no need for it at this point anyways and if it will help maybe that’s all I need.

Bender1061 commented 6 years ago

@scottleibrand something to think about on the explorer hat would be to enable hotspot from the screen. and then manually turn it off when we want to try to connect to the world again.

diabeticdude2 commented 6 years ago

In the event that it lost communication with the Internet I’m sorry for not being more specific

scottleibrand commented 6 years ago

Yeah, that is the intended use case. And even more specifically, if you'll be going offline and have an offline CGM source (connected Dexcom receiver, MDT CGM, or xDripAPS). We find that for connected Dexcom receiver, it works best to have the offline rig be a different rig than the online one Dana carries around every day.

tepidjuice commented 6 years ago

I don't have hotspot enabled and this was happening with my rigs before hotspot was introduced.

tepidjuice commented 6 years ago

This happened again today. I could not connect to the rig via serial. After a hard reboot I got the below stack trace on start up, which I don't get after a normal reboot. Also according to pump-loop logs the rig was still looping while it wasn't connecting.

[ 9.028289] i2c-designware-pci 0000:00:08.0: ===== REGISTER DUMP (i2c) ===== [ 9.028366] i2c-designware-pci 0000:00:08.0: DW_IC_CON: 0x65 [ 9.028436] i2c-designware-pci 0000:00:08.0: DW_IC_TAR: 0x20 [ 9.028505] i2c-designware-pci 0000:00:08.0: DW_IC_SS_SCL_HCNT: 0x2f8 [ 9.028575] i2c-designware-pci 0000:00:08.0: DW_IC_SS_SCL_LCNT: 0x37b [ 9.028645] i2c-designware-pci 0000:00:08.0: DW_IC_FS_SCL_HCNT: 0x87 [ 9.028714] i2c-designware-pci 0000:00:08.0: DW_IC_FS_SCL_LCNT: 0x10a [ 9.028783] i2c-designware-pci 0000:00:08.0: DW_IC_INTR_STAT: 0x0 [ 9.028852] i2c-designware-pci 0000:00:08.0: DW_IC_INTR_MASK: 0x246 [ 9.028921] i2c-designware-pci 0000:00:08.0: DW_IC_RAW_INTR_STAT: 0x510 [ 9.028991] i2c-designware-pci 0000:00:08.0: DW_IC_RX_TL: 0x20 [ 9.029060] i2c-designware-pci 0000:00:08.0: DW_IC_TX_TL: 0x20 [ 9.029129] i2c-designware-pci 0000:00:08.0: DW_IC_ENABLE: 0x1 [ 9.029197] i2c-designware-pci 0000:00:08.0: DW_IC_STATUS: 0x23 [ 9.029266] i2c-designware-pci 0000:00:08.0: DW_IC_TXFLR: 0x2 [ 9.029334] i2c-designware-pci 0000:00:08.0: DW_IC_RXFLR: 0x0 [ 9.029402] i2c-designware-pci 0000:00:08.0: DW_IC_TX_ABRT_SOURCE: 0x0 [ 9.029471] i2c-designware-pci 0000:00:08.0: DW_IC_DATA_CMD: 0x0 [ 9.029538] i2c-designware-pci 0000:00:08.0: =============================== [ 9.029638] CPU: 0 PID: 149 Comm: systemd-udevd Tainted: G W O 3.10.98-jubilinux-edison #13 [ 9.029643] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 466 2014.06.23:19.20.05 [ 9.029650] task: f4c310e0 ti: f4c62000 task.ti: f4c62000 [ 9.029711] Stack: [ 9.029776] Call Trace: [ 9.030541] Code: b3 ff ff 89 f8 09 d0 80 ce 04 83 ff 02 0f 44 c2 8b 15 94 eb b0 c1 89 82 00 b3 ff ff f7 c6 00 02 00 00 74 14 e8 a7 24 0b 00 56 9d <83> c4 04 5b 5e 5f 5d c3 90 8d 74 26 00 56 9d e8 91 20 0b 00 83 [ 9.030565] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W O 3.10.98-jubilinux-edison #13 [ 9.030570] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 466 2014.06.23:19.20.05 [ 9.030578] task: f6883de0 ti: f6a1a000 task.ti: f6a1a000 [ 9.030640] Stack: [ 9.030706] Call Trace: [ 9.030902] Code: e2 81 e2 00 e0 ff ff 31 c9 8d 42 08 89 ca 0f 01 c8 0f ae f0 89 f6 89 e0 25 00 e0 ff ff 8b 40 08 a8 08 75 07 b1 01 89 f0 0f 01 c9 <85> 1d d8 33 b4 c1 75 0d 8d 55 f0 b8 05 00 00 00 e8 e5 8d dc ff [ 9.031551] i2c-1: try to abort xfer, scl_gpio 19, sda_gpio 20 [ 9.033501] i2c-1: scl_gpio val 0, sda_gpio val 0 [ 9.033568] i2c-1: toggle begin [ 9.033731] i2c-1: toggle SCL loop 0 [ 9.033887] i2c-1: toggle SCL loop 1 [ 9.034070] i2c-1: toggle SCL loop 2 [ 9.034262] i2c-1: toggle SCL loop 3 [ 9.034440] i2c-1: toggle SCL loop 4 [ 9.034611] i2c-1: toggle SCL loop 5 [ 9.034776] i2c-1: toggle SCL loop 6 [ 9.034945] i2c-1: toggle SCL loop 7 [ 9.035111] i2c-1: toggle SCL loop 8 [ 9.035156] i2c-1: toggle end [ 9.064661] pca953x 1-0020: failed reading register [ 9.084908] pca953x 1-0021: failed reading register [ 9.105091] pca953x 1-0022: failed reading register [ 9.125274] pca953x 1-0023: failed reading register

alimhassam commented 6 years ago

Similar(Same?) issue here, even with disabled offilne hotspot. the , not sure how to debug this. oref0-online (network.log) get blank and never reconnect to anything. I'm not sure what's causing it. I'll see if i can increase logging for next time it happens.

Bender1061 commented 6 years ago

Since I disabled the hotspot, I've not had any lock up issues. We may have a couple different issues happening. I have some ideas but I'm not going to have much time to dig in to anything myself till after the new year. Shut down season is coming!

diabeticdude2 commented 6 years ago

Same results here after disabling the hotspot not one single lock up. Not sure why, but, disabling the hotspot fixed my issues also.

Bender1061 commented 6 years ago

it has to do with how we are shutting down the wifi before changing the interfaces to turn it into a hotspot. we need to find a "softer" way of doing it. it's linux, so I'm sure there are many ways to do what we are doing. Just to find the best way on an ecosystem that is frankly not used very much. other than the memory issues I'm betting we have less strange issues like this with the Pi zero.

tepidjuice commented 6 years ago

I have wifi soft blocked using rfkill as I prefer it to always be on tether. Could that be causing the hang?

tepidjuice commented 6 years ago

I've checked my logs a few times and nothing seems to be wrong. So, failing identifying what's happening, could I suggest a hack fix for this issue?

If no bnep0/wlan0/usb connection is active for say 15-30 minutes then the rig reboots?

tepidjuice commented 6 years ago

After rebooting the rig my phone now says that it is tethered to my phone but there is no bluetooth connection e.g. I can't ssh into it over bluetooth. When connecting over serial the bnep0 device does not exist. The only fix seems to be repeated reboots until bnep0 appears and everything starts working as normal again. Below is the output of systemctl status bluetooth -l, when the bnep0 device isn't registered:

Bluetooth daemon 5.23 bluetoothd[6038]: D-Bus setup failed: Name already in use systemd[1]: bluetooth.service: main process exited, code=exited, status=1/FAILURE systemd[1]: Failed to start Bluetooth service. systemd[1]: Unit bluetooth.service entered failed state. `

tynbendad commented 6 years ago

i found a workaround to allow the rig to reboot after a system panic so the loop will resume after this instead of requiring a manual reboot (which may take a while to figure out & perform).

i've been using it for a week now and noticed 2 panics that resulted in reboots & we didn't notice skipping a beat in looping. it would be better of course to fix the root cause, but this appears a better alternative than hanging in the meantime.

add to /etc/sysctl.conf:
# When kernel panic’s, reboot after 10 second delay
kernel.panic = 10

from: https://www.cyberciti.biz/tips/reboot-linux-box-after-a-kernel-panic.html

also, I will note that bnep0 appears to cause the panics on my rig as well, but only when the phone is not in the vicinity of the rig. but maybe we are tearing down the connections then also due to a wifi glitch...

scottleibrand commented 6 years ago

Sweet, thanks! I've applied this to two of our rigs, and will report back as to whether it works for us as well. If so, I'll probably add it to oref0-setup.

PieterGit commented 6 years ago

@tynbendad also installed it on my rig. I was hoping it fixed the problem with a Explorer board and Zendure pass through charging. After inserting USB power cable into the Zendure, the rig (without lipo) hangs. Hard power off is needed to get the rig back online. I will monitor it the coming weeks to see if this helps.

tynbendad commented 6 years ago

@PieterGit it sounds like you're having more of a hardware issue - i'd guess the charger isn't fully powering the usb when it switches to charging itself... or something like that.

fwiw i seem to be getting about 1 kernel panic a day lately - probably due to the bt_is_wifi feature in oref0-online (and they seem to happen when the phone is not nearby the rig), and they have all rebooted automatically: root@benaps3:~# (cat /var/log/syslog /var/log/syslog.[0-9]; gunzip -c /var/log/syslog*gz) | grep -a "BUG: unable to handle kernel NULL pointer dereference" Nov 3 14:27:23 localhost kernel: [ 3403.770823] BUG: unable to handle kernel NULL pointer dereference at 00000010 Nov 2 05:42:19 localhost kernel: [ 1401.099263] BUG: unable to handle kernel NULL pointer dereference at 00000010 Nov 1 02:07:47 localhost kernel: [16766.982905] BUG: unable to handle kernel NULL pointer dereference at 00000010

edit: seems its only the last few days that i've got 1 per day... don't think i've changed anything though, may just be luck: root@benaps3:~# (gunzip -c /var/log/kern*.gz; cat /var/log/kern.log /var/log/kern.log.?) | grep BUG Oct 14 10:54:29 localhost kernel: [ 2804.299413] BUG: unable to handle kernel NULL pointer dereference at 00000010 Oct 15 04:40:18 localhost kernel: [23087.575678] BUG: unable to handle kernel NULL pointer dereference at 00000010 Oct 15 05:15:19 localhost kernel: [ 2048.140138] BUG: unable to handle kernel NULL pointer dereference at 00000010 Oct 31 03:45:19 localhost kernel: [82778.592054] BUG: unable to handle kernel NULL pointer dereference at 00000010 Oct 31 04:22:20 localhost kernel: [ 2172.006313] BUG: unable to handle kernel NULL pointer dereference at 00000010 Nov 1 02:07:47 localhost kernel: [16766.982905] BUG: unable to handle kernel NULL pointer dereference at 00000010 Nov 2 05:42:19 localhost kernel: [ 1401.099263] BUG: unable to handle kernel NULL pointer dereference at 00000010 Nov 3 14:27:23 localhost kernel: [ 3403.770823] BUG: unable to handle kernel NULL pointer dereference at 00000010

tynbendad commented 5 years ago

@PieterGit as an fyi i once tried using an external usb to power the rig and it seemed to drain a 10000mA charger as quickly as a 2500mA lipo. Probably something inefficient in the usb/charging circuit. I have since gone with 2600mA to 3400mA 18650's on our rigs instead.

RadEnder commented 5 years ago

Was this fix ever added to Oref0? I’m on the latest dev branch but seem to be experiencing a similar problem. A reboot gets me up and running again but before reboot sometimes I cant SSH although still looping and other times no looping and no SSH access. I haven’t been successful in setting up offline BT but have tried multiple times and hotspot is enabled.

sulkaharo commented 5 years ago

@RadEnder Do you what the cause of your hangs is? We're still on the 0.6 series on Edison and the rig hasn't required a reboot in a very long time. Looking at the earlier issues, it looks to me like most times when the rig became unavailable, this was caused by the networking equipment the rig was connected to - now that the rig is only connecting to devices that don't misbehave with the rig, it's rock solid.

danamlewis commented 5 years ago

Lots of bluetooth improvements added in 0.7.0 (now release), and haven't heard this issue reported in a long time, including in release testing, so closing. Please open a new issue for any bluetooth-related issues that occur in 0.7.0/beyond.

Drainer2002 commented 1 year ago

My rig was completely working fine for years already. today morning, I rebooted the rig and the light in-between the port turned on for couple of minutes and then turned off. every time I rebooted it , it did the same thing. I connected the rig to my computer and it hangs at starting kernel.....

Please help

`** PSH KERNEL VERSION: b0182929 WR: 20104000

SCU IPC: 0x800000d0 0xfffce92c

PSH miaHOB version: TNG.B0.VVBD.0000000c

microkernel built 10:14:01 Jan 13 2015

PSH loader PCM page cache size = 192 KB Cache Constraint = 0 Pages Arming IPC driver .. Adding page store pool .. PagestoreAddr(IMR Start Address) = 0x04899000 pageStoreSize(IMR Size) = 0x00080000

Ready to receive application * U-Boot 2014.04 (Feb 09 2015 - 15:40:31)

Watchdog enabled DRAM: 980.6 MiB MMC: tangier_sdhci: 0 In: serial Out: serial Err: serial Hit any key to stop autoboot: 0 Target:blank Partitioning already done... Flashing already done... GADGET DRIVER: usb_dnl_dfu reading vmlinuz 4841888 bytes read in 155 ms (29.8 MiB/s) Valid Boot Flag Setup Size = 0x00003c00 Magic signature found Using boot protocol version 2.0c Linux kernel version 3.10.98-jubilinux-edison (robin@robin-i7) https://github.com/openaps/oref0/pull/13 SMP PREEMPT Fri Jun 23 02:24:12 EDT 2017 Building boot_params at 0x00090000 Loading bzImage at address 00100000 (4826528 bytes) Magic signature found Kernel command line: "rootwait root=PARTUUID=012b3303-34ac-284d-99b4-34e03a2335f4 rootfstype=ext4 console=ttyMFD2 earlyprintk=ttyMFD2,keep loglevel=4 g_multi.ethernet_config=rndis systemd.unit=multi-user.target hardware_id=00 g_multi.iSerialNumber=7beb42dc3a904d201d984157e4c3e600 g_multi.dev_addr=02:00:86:c3:e6:00"

Starting kernel ...`

Drainer2002 commented 1 year ago

arget:blank Partitioning already done... Flashing already done... GADGET DRIVER: usb_dnl_dfu reading vmlinuz 4841888 bytes read in 159 ms (29 MiB/s) Valid Boot Flag Setup Size = 0x00003c00 Magic signature found Using boot protocol version 2.0c Linux kernel version 3.10.98-jubilinux-edison (robin@robin-i7) #13 SMP PREEMPT Fri Jun 23 02:24:12 EDT 2017 Building boot_params at 0x00090000 Loading bzImage at address 00100000 (4826528 bytes) Magic signature found Kernel command line: "rootwait root=PARTUUID=012b3303-34ac-284d-99b4-34e03a2335f4 rootfstype=ext4 console=ttyMFD2 earlyprintk=ttyMFD2,keep loglevel=4 g_multi.ethernet_config=rndis systemd.unit=multi-user.target hardware_id=00 g_multi.iSerialNumber=7beb42dc3a904d201d984157e4c3e600 g_multi.dev_addr=02:00:86:c3:e6:00"

Starting kernel ...