info-beamer / os-config

info-beamer OS configuration tool
https://info-beamer.github.io/os-config/

Info-beamer hangs when network present, but broken #1

Open zontarian opened 6 years ago

zontarian commented 6 years ago

Hello, I've just come back from a customer's site where we have installed info-beamer hosted. The network has always been erratic at best, but for the last two days the Raspberry was stuck on the splash screen (the "branding.jpg" image in the /INFOBEAMER/config dir).

Upon closer inspection, a "waiting for content" message was displayed endlessly.

I connected my laptop to the ethernet cable that went to the Raspberry and discovered that the network is up, but there is a captive portal asking for username/pwd. I know that info-beamer cannot account for every captive portal, so there is no support for this. And this is fine. ( https://info-beamer.com/doc/device-configuration#wificonfiguration -> captive portals)

I ended up unplugging the ethernet from the Raspberry and rebooting, and the Raspberry correctly displayed the last setup it had on board. I know it's not a perfect solution, but you can see that it is better than having a monitor display a welcome screen forever.

What I am asking is whether there is a config option to put a timeout on the network requests. After a while the Raspberry should understand that, even if the link is up, it won't be able to communicate with the info-beamer server, so it should give up on the network and start up net-less. Maybe it should also poll the network every X minutes to see if something has changed.

If there is no such configuration (and I don't think there is) may I ask this as a feature request? Thank you Z

dividuum commented 6 years ago

How long has this captive portal environment been going on? Normally after starting, the syncing process initiates a verify_only sync: It uses the last stored sync file without probing the servers for a new state file and applies it. So even if the device is offline or unable to use the network, it will apply the last active setup. All with one limitation: The sync file expires after 7 days. So if the Pi boots, gets an NTP time and the sync file is expired according to this time, the last setup won't be installed and you're stuck in "waiting for content".

When you start completely without network, the Pi will be in 1970 and the sync file isn't expired, but the Pi will reboot every few hours trying to get back online.
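To illustrate the two cases described above, here is a minimal sketch of the expiry check in Python. It is not the actual info-beamer implementation: the function name, the timestamps and the way the creation time is stored are assumptions made for illustration. Only the 7 day limit comes from this thread.

```python
from datetime import datetime, timedelta, timezone

# From the thread: a stored sync file expires after 7 days.
GRACE_PERIOD = timedelta(days=7)


def sync_file_usable(created_at, now):
    """Hypothetical check: may the stored sync file still be applied?

    `created_at` is when the sync file was generated, `now` is whatever
    the device clock currently reports.
    """
    return now - created_at <= GRACE_PERIOD


# Made-up creation time, for illustration only.
created = datetime(2018, 11, 1, 4, 20, tzinfo=timezone.utc)

# Clock set via NTP, 8 days after creation: the file counts as expired.
ntp_clock = created + timedelta(days=8)

# No network at all: the clock starts at 1970, so the file is not expired.
unset_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)

print(sync_file_usable(created, ntp_clock))
print(sync_file_usable(created, unset_clock))
```

With a clock stuck in 1970 the age of the file is negative, which is why the check passes when the device boots without any network.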

Allowing a device to show the configured content forever by simply disconnecting the network connection isn't compatible with the pricing or service model: After all info-beamer hosted is a "mostly online" service and a 7 days grace period seems fair. The hope is that in those 7 days, someone takes a closer look at the device to see if it actually still exists.

It might make sense to have an easier way to be notified of offline devices that are supposed to be online though. I can see that being a feature in the near future. Would that help?

zontarian commented 6 years ago

Hello. As far as I know, the captive portal (or something with the same result) has been on for at least 4 days. And in this period, the Raspberry was always stuck on the branding screen.

As for the last setup, I think it was older than 7 days, so maybe this was the case. I don't know if the device rebooted, since neither I nor my customers are in front of it all the time, but it may be so.

As for the paid service, I am well aware, and this was not a way to avoid paying what is due. It's just that this device is two hours by train from my place, so going back and forth is not feasible for me. It was the only way for me to get it working without really knowing what to do. We can PM and find another solution for paying you wholesale. We have opened a ticket with their IT asking them to let the Raspberry (whose MAC I gave them) go online without blocks, and I hope they will soon reconnect the ethernet cable so that I can monitor it remotely.

As for the solution you describe, yes, it would help. With devices in faraway places, network problems happen unmonitored, so a way to know that the device is offline other than looking at the page would be great. Another feature could be a (custom) message instead of the "waiting for content", maybe a whole JPG or a full screen text "network problem. Please seek assistance and call xxx.."?

Thanks for your reply.

dividuum commented 6 years ago

Regardless of whether you change a setup or not, the hosted service will always create an updated version of the sync file every night (around 4:20 UTC at the moment). This file is either directly pushed to each device or, if the websocket isn't available, the device will poll the file eventually. If a device is online at any point, it will get the new file eventually and the 7 days start new. So that doesn't really explain what you've been seeing.

Another possibility would be that the device received a new sync file but not all content referenced in that sync file is cached locally yet. This might happen if the device rebooted, got power cycled or lost connectivity after a new sync file was sent and before all referenced content was downloaded. In that case, the device has no way to create a consistent state for info-beamer to play and it has to wait in the loading screen until this is fixed eventually by fetching the missing files.
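As a rough sketch of that consistency requirement: the device can only start playback once every file referenced by the sync file is in the local cache. The sync file layout used here (an "assets" list with "hash" and "name" fields) is purely hypothetical and not the real format.

```python
def missing_assets(sync_file, cached_hashes):
    """Return the referenced assets that are not in the local cache yet.

    `sync_file` is a dict with a hypothetical "assets" list, and
    `cached_hashes` is the set of content hashes already on disk.
    """
    return [a for a in sync_file["assets"] if a["hash"] not in cached_hashes]


def can_play(sync_file, cached_hashes):
    """A consistent state exists only if nothing referenced is missing."""
    return not missing_assets(sync_file, cached_hashes)


# Hypothetical sync file that references two files.
sync_file = {
    "assets": [
        {"hash": "aaa111", "name": "intro.mp4"},
        {"hash": "bbb222", "name": "logo.png"},
    ]
}

# Connectivity was lost after the first download: stay on the loading screen.
print(can_play(sync_file, {"aaa111"}))

# Everything is cached: the setup can be applied.
print(can_play(sync_file, {"aaa111", "bbb222"}))
```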

As for monitoring: If you want something right now, you can already use the device list API endpoint and check the is_online flag in each returned device object.
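A minimal sketch of such a check in Python, working on an already fetched and parsed response. Only the is_online flag is taken from this thread. The top-level "devices" key and the "id" and "description" fields are assumptions about the response shape, and fetching the list (URL, authentication) is left out on purpose, so check the API documentation before relying on this.

```python
def offline_devices(device_list_response):
    """Return the device objects whose is_online flag is not set.

    `device_list_response` is the parsed JSON of the device list endpoint.
    The "devices" key is an assumption about its layout.
    """
    devices = device_list_response.get("devices", [])
    return [d for d in devices if not d.get("is_online", False)]


# Hypothetical response for illustration, not real data.
example_response = {
    "devices": [
        {"id": 1, "description": "Shop entrance", "is_online": True},
        {"id": 2, "description": "Checkout area", "is_online": False},
    ]
}

for device in offline_devices(example_response):
    print("offline:", device["id"], device["description"])
```

Run from a cron job every few minutes, something like this could send a mail or chat message whenever the returned list is not empty.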

zontarian commented 6 years ago

What I've experienced is this: our customers installed the device on the 19th of October. As you could see from the billing on my account, it has not been online continuously since then. Since it's far away, we have monitored it constantly from the web console, and it went on and off the net randomly. Honestly I don't know why. The net is managed by the IT office of a big retail chain in Northern Italy, and sometimes I think someone also inadvertently disconnected the cable.. (regarding correct billing please PM me, I'd like a way to settle it). Maybe it was accessible for a day or two, then it went off the net, then it came up again.. until 6 days ago, when things changed. Maybe this can account for the bizarre behaviour?

The content was synced on the device correctly since the first day, so I don't think this is the case.

As for the API, yes we are aware now. Thanks

dividuum commented 6 years ago

Unfortunately there are no detailed logs going back that far, so I can't tell for sure what happened. The transactions page isn't helpful either because there is only a single device and as such there are no costs and no transactions have been generated. I can only tell you that in the last 7 days, the device was online only yesterday (Oct 9th). I'm not sure what happened before Oct 4th.

What might also have happened:

All while the websocket connection occasionally worked, due to being routed to a different, unblocked server. The websocket connection is optionally used to push a new sync file to a device directly, but if no setup change happened while the device was reachable using the websocket, the device relies on retrieving a new sync file through polling the sync servers. All that is speculation and I can't confirm it, as there are no logs going back that far.