orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal) :star:
http://orcahello.ai4orcas.net/
MIT License
35 stars 21 forks

Add heartbeat/monitoring dashboard for inference system #88

Open micya opened 1 year ago

micya commented 1 year ago

Historically, troubleshooting inference system/notification system failures has involved manual steps to identify the failures. A past hackathon focused on using Azure Dashboards to surface some metrics from Log Analytics. However, Azure Dashboards is difficult for non-technical observers to use.

I'd like to look into setting up something separate from Azure for monitoring purposes. It can either be a self-developed application or an existing monitoring solution (prometheus?). It should show at minimum:

micya commented 1 year ago

Since we ultimately need to monitor across a range of different platforms, we will need a push-based system (as opposed to a pull-based, scraped system like raw Prometheus).
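To make the push-based idea concrete, here is a minimal sketch in plain Python. Each monitored component sends a small heartbeat message, and a central monitor flags any source that has gone quiet. The names `HeartbeatMonitor`, `make_heartbeat`, the JSON fields, and the timeout are all hypothetical, chosen for illustration only; nothing here reflects existing code in this repo, and transport (HTTP, queue, etc.) is left out entirely.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def make_heartbeat(source: str, sent_at: Optional[float] = None) -> str:
    """Build the JSON message a monitored component would push (hypothetical format)."""
    if sent_at is None:
        sent_at = time.time()
    return json.dumps({"source": source, "sent_at": sent_at})


@dataclass
class HeartbeatMonitor:
    """Collects pushed heartbeats and reports sources that have gone quiet."""

    timeout_s: float = 300.0  # assumed threshold, not a project setting
    last_seen: Dict[str, float] = field(default_factory=dict)

    def receive(self, payload: str) -> None:
        # Record the most recent timestamp pushed by each source.
        msg = json.loads(payload)
        self.last_seen[msg["source"]] = float(msg["sent_at"])

    def stale_sources(self, now: float) -> List[str]:
        # A source is stale when its last heartbeat is older than the timeout.
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Because the components push to the monitor, the monitor never needs network access into each platform, which is the main difference from a scraping setup.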

scottveirs commented 1 year ago

Hey @micya, I noticed that the Canadian Integrated Ocean Observing System has an uptime monitor based on open source code: https://github.com/upptime/upptime. It might not be able to help with the instances, but it could help ensure we know when any of these sites are not available:

scottveirs commented 1 year ago

Hey @micya -- Just noting a couple recent thoughts on possible tools, integrations, and/or data sources for an over-arching dashboard (i.e. maybe for not only the Azure-based realtime inference system, but the whole emerging ecosystem of Orcasound apps, APIs, and data layers):

scottveirs commented 1 year ago
  • Line chart for Cosmos DB read/write metrics

A sub-feature of a Cosmos DB read line chart that I would find interesting:

Number of API requests from "outsiders" -- a possible metric for measuring the value of our open labeled data to external collaborators, e.g. ML developers or bioacousticians.
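As a rough illustration of how such a metric could be computed, here is a small Python sketch that counts requests per day from clients not on an internal allow-list. The record fields (`client`, `date`), the `INTERNAL_CLIENTS` set, and the function name are all assumptions made up for this example; the real request logs may look quite different.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical identifiers for our own apps; anything else counts as an "outsider".
INTERNAL_CLIENTS = {"moderator-portal", "inference-system"}


def count_external_requests(records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Return the number of requests per day made by non-internal clients."""
    counts: Counter = Counter()
    for record in records:
        if record["client"] not in INTERNAL_CLIENTS:
            counts[record["date"]] += 1
    return dict(counts)
```

The per-day counts could then feed the same line chart as the read/write metrics.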

Rachel-Frazier commented 10 months ago

We (@xilin22 and I) looked into setting up Prometheus and Grafana for a health dashboard, but determined that Grafana doesn't allow individuals with personal accounts to access the Grafana dashboard; a work or school account is required. (The error was shown in an attached screenshot.) There's a feedback request for this feature, but it doesn't seem as though the Grafana team is looking to implement it any time soon.

We are now looking into using Azure Workbooks for data visualization instead, which is newer and may solve some of the pain points that were called out in 2022.

xilin22 commented 10 months ago

As for alerting, we can add more Azure Functions to monitor service and resource health, since Azure Managed Grafana does not allow personal accounts to log in to an Azure Managed Grafana instance.

xilin22 commented 10 months ago

@micya @scottveirs We may be able to get Azure Managed Grafana to work if we create our own organizational domain. It might be worth a shot if there is little to no cost in creating one. Maybe then Azure won't view it as a personal account.

micya commented 10 months ago

We already have an organization. If you create a user in our AAD tenant, that should work. Though we would then need to track the username/password for the new user.

xilin22 commented 10 months ago

That makes sense. I don't have permissions to create one. Maybe one of you, @micya or @scottveirs, can create one and send me the credentials?

micya commented 10 months ago

@xilin22 - granted "User Administrator" on AAD tenant. Let me know if that doesn't work.