highvolt-dev / tmo-monitor

A lightweight, cross-platform Python 3 script that monitors the T-Mobile Home Internet Nokia, Arcadyan, and Sagemcom 5G Gateways for 4G/5G bands, cellular site (tower), and internet connectivity, and reboots as needed or on demand.
MIT License

could not query site and could not post login #41

Open danhausman opened 2 years ago

danhausman commented 2 years ago

I am running the latest code. This morning, my internet dropped out. When I looked at the logs, I saw the following:

2022/01/11 07:00:04 [INFO] 4G: B2 | 5G: n41 | eNB ID: 28646 | Avg Ping: 44 ms | Uptime: 158930 sec
2022/01/11 07:05:01 [CRITICAL] Could not post login request, exiting.
2022/01/11 07:05:01 [CRITICAL] Could not query site info, exiting.
2022/01/11 07:06:09 [ERROR] Could not ping google.com.
2022/01/11 07:06:09 [CRITICAL] Could not query modem uptime, exiting.
2022/01/11 07:06:17 [CRITICAL] Could not post login request, exiting.
2022/01/11 07:06:17 [CRITICAL] Could not query site info, exiting.
2022/01/11 07:06:43 [CRITICAL] Could not post login request, exiting.
2022/01/11 07:06:43 [CRITICAL] Could not query site info, exiting.

Any ideas why? I was able to log into my gateway from my laptop, so it was still on my local network. As soon as the reboot was complete, everything was fine and things were being logged like normal again.

2022/01/11 07:10:03 [INFO] 4G: B66 | 5G: n41 | eNB ID: 28646 | Avg Ping: 59 ms | Uptime: 94 sec
2022/01/11 07:12:46 [INFO] 4G: B66 | 5G: n41 | eNB ID: 28646 | Avg Ping: 84 ms | Uptime: 257 sec

highvolt-dev commented 2 years ago

@danhausman sounds like your Raspberry Pi didn't have network connectivity on the LAN with the gateway.

Those error messages are triggered when API calls to 192.168.12.1 fail (with the exception of the "Could not ping google.com" line, of course).

Sounds like the gateway was in fact still up and responsive, since you could reach it from your laptop, and once it rebooted, the Raspberry Pi was able to reestablish a connection.

Edit: Actually... it looks like you have your checks set to run every minute? My guess is that the gateway was not done rebooting when the script ran again.

danhausman commented 2 years ago

Not sure what was up. My laptop could see the gateway, but there was no internet on the laptop. The laptop could ssh to the Pi. I did not check whether the Pi could get to the gateway, though. Not sure why it would not be able to see the gateway.

highvolt-dev commented 2 years ago

@danhausman how is your network set up? Do you have another router set up in AP mode?

danhausman commented 2 years ago

@highvolt-dev The gateway has an Ethernet connection to an Eero. The Eero is doing DHCP behind the gateway. Everything else connects to Eero APs via wireless or through a switch that is connected to the main Eero gateway.

I know, not ideal, but if you don't let Eero do DHCP and DNS, then a lot of the kid-safe functionality does not work. :( I will do more digging if I see it happen again and check whether the Raspberry Pi can actually talk to the gateway.

highvolt-dev commented 2 years ago

@danhausman I suspect this is caused by your 1-minute check interval and the fact that it can take longer than 1 minute for the gateway to reboot, for its APIs to come back online, and for it to establish cellular connectivity.

highvolt-dev commented 2 years ago

The uptime check actually happens late in execution, so if the first part of the script can't get data because the gateway was just rebooted, you get the sort of behavior you described.
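One way to tolerate a gateway that is still booting is to retry a readiness probe a few times before giving up. The following is only a rough sketch of that idea, not how tmo-monitor is implemented: `wait_until_ready`, its parameters, and the `probe` callable are hypothetical names introduced here for illustration.

```python
import time


def wait_until_ready(probe, attempts=6, delay=30.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    probe is any zero-argument callable that returns True once the
    gateway API answers (hypothetical; supply your own check).
    Returns the 1-based number of the attempt that succeeded, or 0 if
    every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            # Give the gateway time to finish booting before retrying.
            sleep(delay)
    return 0
```

With the defaults above, a cron job would wait up to about two and a half minutes for the gateway to answer before logging a failure, which is longer than a 1-minute check interval. The numbers are arbitrary and would need tuning against how long a given gateway takes to come back.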

danhausman commented 2 years ago

I saw something similar happen again this morning. I have it set to run every 5 minutes as a cron job. This is how it showed up in the logs. The strange part was that the internet was definitely down; it could not ping. I checked, and it was not just the Raspberry Pi that could not talk outside the network; nothing else could either. Even though I could not ping out, I did not get a reboot, as you can see from the uptime.

The other strange part is that it is jumping from B2 to B66 during this. I had to disable reboot on switching to B66 yesterday because I was getting kicked off phone calls.

2022/01/14 06:00:04 [INFO] 4G: B2 | 5G: n41 | eNB ID: 28646 | Avg Ping: 65 ms | Uptime: 65673 sec
2022/01/14 06:05:03 [INFO] 4G: B2 | 5G: n41 | eNB ID: 28646 | Avg Ping: 76 ms | Uptime: 65973 sec
2022/01/14 06:10:03 [INFO] 4G: B2 | 5G: n41 | eNB ID: 28646 | Avg Ping: 74 ms | Uptime: 66272 sec
2022/01/14 06:15:04 [INFO] 4G: B2 | 5G: n41 | eNB ID: 28646 | Avg Ping: 51 ms | Uptime: 66573 sec
2022/01/14 06:20:02 [CRITICAL] Could not post login request, exiting.
2022/01/14 06:20:02 [CRITICAL] Could not query site info, exiting.
2022/01/14 06:25:24 [ERROR] Could not ping google.com.
2022/01/14 06:25:24 [CRITICAL] Could not query modem uptime, exiting.
2022/01/14 06:30:04 [INFO] 4G: B66 | 5G: n41 | eNB ID: 28646 | Avg Ping: 46 ms | Uptime: 67473 sec
2022/01/14 06:35:04 [INFO] 4G: B66 | 5G: n41 | eNB ID: 28646 | Avg Ping: 58 ms | Uptime: 67773 sec
2022/01/14 06:40:03 [INFO] 4G: B66 | 5G: n41 | eNB ID: 28646 | Avg Ping: 61 ms | Uptime: 68073 sec
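To line up outages with band changes over a longer log, something like the following could pull the 4G band transitions out of the log file. It is a minimal sketch that assumes one entry per line in the `[INFO]` format shown in this thread; `band_changes` and `INFO_RE` are hypothetical names, not part of tmo-monitor.

```python
import re

# Matches INFO lines of the form seen in this thread, e.g.
# "2022/01/14 06:15:04 [INFO] 4G: B2 | 5G: n41 | eNB ID: ..."
INFO_RE = re.compile(r"^(\S+ \S+) \[INFO\] 4G: (\S+) \| 5G: (\S+) \|")


def band_changes(lines):
    """Return (timestamp, old_band, new_band) for each 4G band change."""
    changes = []
    previous = None
    for line in lines:
        match = INFO_RE.match(line)
        if not match:
            continue  # ERROR/CRITICAL lines carry no band information
        timestamp, band_4g, _band_5g = match.groups()
        if previous is not None and band_4g != previous:
            changes.append((timestamp, previous, band_4g))
        previous = band_4g
    return changes
```

The timestamp reported is that of the first successful check on the new band, so the actual switch happened at some point between that entry and the previous INFO line.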

highvolt-dev commented 2 years ago

It looks like you had a similar band change in the logs from your initial report too. It seems the gateway is temporarily unreachable at that time, but the script is able to ping again the next time it runs, which avoids a reboot. This seems tricky to work around - it is not clear whether you would at least be able to ping the gateway during these periods, or whether the gateway is entirely unreachable over your network.
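To tell those two cases apart, a small check run from the Raspberry Pi during an outage could probe the gateway and the internet separately. This is only a sketch under some assumptions: that the gateway's web interface listens on TCP port 80 at 192.168.12.1, and that a TCP connection is an acceptable stand-in for a ping. `tcp_reachable` and `classify` are hypothetical helpers, not tmo-monitor functions.

```python
import socket

GATEWAY_IP = "192.168.12.1"  # gateway address mentioned earlier in this thread


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def classify(gateway_ok, internet_ok):
    """Turn the two probe results into a short diagnosis string."""
    if gateway_ok and internet_ok:
        return "ok"
    if gateway_ok:
        return "gateway reachable, internet down"
    if internet_ok:
        return "gateway web interface unreachable, internet up"
    return "gateway unreachable from this host"
```

For example, `classify(tcp_reachable(GATEWAY_IP, 80), tcp_reachable("google.com", 443))` run from the Pi while the errors are being logged would show whether the problem sits between the Pi and the gateway or beyond the gateway.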

There is a report of a new firmware version coming out - the firmware update may arrive before we can troubleshoot the underlying cause well enough to improve the handling.