openanalytics / shinyproxy

ShinyProxy - Open Source Enterprise Deployment for Shiny and data science apps
https://www.shinyproxy.io
Apache License 2.0

Uptime/Failure Monitoring #517

Open kramerrs opened 1 month ago

kramerrs commented 1 month ago

A consequence of preinitialized containers is that it is possible to get a sense of when an app starts to fail. Often this is data related: some entry in a database is outside the scope of the developer's expectations, and it causes some visualization to fail. With preinitialized containers it would be nice to get some sort of warning email when an application is repeatedly failing.

LEDfan commented 1 week ago

We currently don't have plans to implement the email warning, although I see a few alternatives. There is already a metric for app failures, but this does not include pre-initialized containers that failed to start. Therefore, we could create a new metric that exposes the number of failed pre-initialized containers. Using Prometheus and Alertmanager you can then alert on these metrics (https://shinyproxy.io/documentation/usage-statistics/).

For the next release, we plan to further integrate the pre-initialization feature into the admin dashboard. We could show there whether the containers are failing to start. This data would then also be exposed in the admin API (https://shinyproxy.io/downloads/swagger/?urls.primaryName=ShinyProxy%203.1.1#/ShinyProxy/adminData), which could again be used for reporting.

I'll keep this open as an enhancement request for the email report.

kramerrs commented 1 week ago

I was able to get some traction on this. It's possible to spin up a Docker container that monitors ShinyProxy from a lightweight Alpine image and sends email messages from it. I am thinking about how best to identify a repeated failure, as opposed to a one-off. For example, I could monitor the logs and use regular expressions to test for the delegate failure. I think I can put the whole thing in the ShinyProxy compose YAML file.
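One way to separate a repeated failure from a one-off is to count regex matches inside a sliding time window and only alert once a threshold is reached. The following is a rough sketch of that idea, not a tested monitor: the log pattern, the `RepeatedFailureDetector` class, the threshold, and the window length are all hypothetical choices and would need to be adjusted to the real log output.

```python
import re
from collections import deque

# Assumed log text for a delegate failure; adjust to what the logs really contain.
FAILURE_PATTERN = re.compile(r"delegate failed", re.IGNORECASE)


class RepeatedFailureDetector:
    """Flags a repeated failure when `threshold` matching log lines
    arrive within `window_seconds` of each other (hypothetical helper)."""

    def __init__(self, threshold=3, window_seconds=300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of recent matching lines

    def observe(self, line, timestamp):
        """Feed one log line with its timestamp (in seconds).
        Returns True when an alert should be sent."""
        if not FAILURE_PATTERN.search(line):
            return False
        self._hits.append(timestamp)
        # Forget matches that have fallen out of the time window.
        while self._hits and timestamp - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.threshold:
            self._hits.clear()  # reset so one burst produces one alert
            return True
        return False
```

The monitor container could feed each line of the followed ShinyProxy log into `observe` together with `time.time()`, and send the email only when it returns `True`.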

kramerrs commented 23 hours ago

I tried monitoring the logs for "Delegate Failed" messages. This works, but it wasn't a reliable metric for monitoring the app. I set up an app that failed during setup, and it didn't produce this message. It tried to connect and then sent a 410 response. However, when I tried to connect with a browser, it generated the "Delegate Failed" message. It seems I have seen the "Delegate Failed" response at other times as well. Is there any way to reliably detect when the app fails during load? Is this a Docker service thing? Should I look in the Docker logs to see when the service needs to be relaunched?