valeriansaliou / vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).
https://crates.io/crates/vigil-server
Mozilla Public License 2.0
1.73k stars 128 forks

[POLL] What should I build next for Vigil? #36

Open valeriansaliou opened 5 years ago

valeriansaliou commented 5 years ago

Hello!

If anyone has quick feature suggestions or lacks a feature in Vigil, please use this thread to submit your ideas.

pgaskin commented 5 years ago

It would be nice if you could see the last time there was a problem for a check and how long it lasted.

aguilaair commented 5 years ago

It would be amazing if you could modernize the status page and add a GUI backend.

valeriansaliou commented 5 years ago

@aguilaair modernize? Can you explain what's wrong w/ the current one?

aguilaair commented 5 years ago

There's nothing wrong with it but an even more minimal design would be awesome.

FlorentinDUBOIS commented 5 years ago

Hello @valeriansaliou,

Is this thread still active?

I am currently using Vigil and I find it awesome. A few ideas for enhancing it came to me while I was deploying it.

I think we could add a "kind" field to a probe, typed as an enum. This would let us add new types of checks, such as a heartbeat, to monitor systems that cannot be handled by TCP or HTTP checks.

Besides, it would be great to be able to specify the expected status when setting up an HTTP check. For example, this would allow monitoring an API behind a load balancer.

By the way, I will be glad to help you with this.

valeriansaliou commented 5 years ago

Hey @FlorentinDUBOIS ; glad to meet you there :)

Great idea. I accept PRs for this 👍

On the expected HTTP status, this is already possible with poll_http_status_healthy_above and poll_http_status_healthy_below, though those are global settings.
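
As a rough illustration of how two global bounds like these could classify a response, here is a minimal shell sketch. The values 200 and 400, the inclusive lower bound, and the exclusive upper bound are assumptions made for the example, not details confirmed in this thread; `is_http_status_healthy` is a hypothetical helper, not part of Vigil.

```shell
#!/bin/sh
# Illustrative values only; they stand in for the two global settings.
poll_http_status_healthy_above=200
poll_http_status_healthy_below=400

# Hypothetical helper: succeeds (returns 0) when the status code falls
# in the assumed range [above, below), and fails (returns 1) otherwise.
is_http_status_healthy() {
  if [ "$1" -ge "$poll_http_status_healthy_above" ] &&
     [ "$1" -lt "$poll_http_status_healthy_below" ]; then
    return 0
  fi
  return 1
}
```

To try it against a live endpoint, the status code can be obtained with `curl -s -o /dev/null -w '%{http_code}' <url>` and passed as the first argument. Because the bounds are global, an API that answers with an expected 4xx status behind a load balancer would need the upper bound raised for every HTTP probe at once.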

FlorentinDUBOIS commented 5 years ago

I had not seen those settings; I will try them.

Thanks :D

jedineeper commented 4 years ago

~It would be great if it was possible to autodiscover/iterate over the available replicas for a particular service somehow?~

~For my specific usecase, I want to run Vigil inside a kubernetes cluster and monitor the available replicas. I think it could be generalised for a number of possible use cases where autodiscovery of the replicas based on the endpoint was possible. (Consul for another example, provides a service DNS name for several replicas.)~

~It's a bit outside Vigil's usecase but I love the style and interface you have for tracking replicas like this :)~

ignore me, it's already requested in #13 :)

SagnikSRHUSE commented 4 years ago

It would be great to have incident history reporting, unless I have missed it somewhere :stuck_out_tongue:

mzs114 commented 4 years ago

Many features have already been listed by other commenters. I can think of dynamic configuration and a Prometheus exporter. I really like that Vigil supports monitoring services behind a NAT/firewall.

denisle1981 commented 4 years ago

It would be great to be able to monitor DB health (MySQL, MariaDB, Postgres). I know it isn't that easy; I tried it. I created a Python script that runs a simple HTTP server and, depending on the URL requested (it looks like "http://ip/"), returns 200 if the DB is OK. I've tried different methods to check the DB health or status, and my script works when I run it and call the URL manually from a browser, but its status dies a few minutes after the app starts once I add it to the vigil-status probes file. So if you find a way to work this out, that would be an awesome feature!

valeriansaliou commented 4 years ago

@denisle1981 At Crisp we're using the new Vigil script probe type to monitor DB health, connecting to the DB from the network and monitoring replication status.

Script probes were designed for all those monitoring use cases that cannot be generalized because they are backend-specific. For instance, I'll never add a custom Redis monitoring probe type; all those use cases should fall back to the script probe type, with a simple shell script.

And, if you cannot connect to MySQL from your Vigil server, you could use vigil-local (running on the MySQL server itself), which would execute the script locally and report any result to Vigil: https://github.com/valeriansaliou/vigil-local

An example script from our Vigil configuration:

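[[probe.service.node]]

id = "mysql-replication"
label = "MySQL replication"
mode = "script"

scripts = [
  '''
  # Query the replica, giving up after 5 seconds
  status=$(timeout 5 mysql --host="<target_mysql_slave_host>" --user="<target_user>" --password="<target_password>" --execute="SHOW SLAVE STATUS\G;")

  # Extract the fields of interest (use a fixed printf format so that any '%' in the output is not interpreted)
  last_error=$(printf '%s\n' "$status" | grep "Last_Error" | cut -d':' -f2 | tr -dc '[:print:]' | sed 's/ //g')
  seconds_behind=$(printf '%s\n' "$status" | grep "Seconds_Behind_Master" | cut -d':' -f2 | tr -dc '[:print:]' | sed 's/ //g')

  # A replication error was reported: exit 2 (dead)
  if [ -n "$last_error" ]; then
    exit 2
  fi

  # No lag value at all (query failed or timed out): exit 2 (dead)
  if [ -z "$seconds_behind" ]; then
    exit 2
  fi

  # Lag under 10 minutes: exit 0 (healthy)
  if [ "$seconds_behind" -lt "600" ]; then
    exit 0
  fi

  # Replicating, but lagging too far behind: exit 1 (sick)
  exit 1
  '''
]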
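
The decision logic of the script above can also be pulled into a function and exercised without a MySQL server. The following is a minimal sketch, not part of the original configuration: `classify_replication` and its optional third argument are hypothetical names, and it assumes the script probe convention where exit code 0 means healthy, 1 means sick, and 2 or higher means dead. Unlike the script above, it also treats a non-numeric lag value such as "NULL" as dead, which is a choice made for this example.

```shell
#!/bin/sh
# classify_replication LAST_ERROR SECONDS_BEHIND [MAX_LAG]
# Hypothetical helper: returns the exit code a script probe would use.
# Assumed convention: 0 = healthy, 1 = sick, 2 = dead.
classify_replication() {
  last_error=$1
  seconds_behind=$2
  max_lag=${3:-600}

  # Any reported replication error counts as dead.
  if [ -n "$last_error" ]; then
    return 2
  fi

  # A missing lag value means the status query gave nothing usable.
  if [ -z "$seconds_behind" ]; then
    return 2
  fi

  # A non-numeric lag (for example "NULL") is treated as dead here.
  case $seconds_behind in
    *[!0-9]*) return 2 ;;
  esac

  # Lag under the threshold is healthy; anything else is sick.
  if [ "$seconds_behind" -lt "$max_lag" ]; then
    return 0
  fi
  return 1
}
```

Inside a script probe, the extracted values would be passed in and the result used directly, for instance `classify_replication "$last_error" "$seconds_behind"; exit $?`.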
denisle1981 commented 4 years ago

Thanks @valeriansaliou , I'll try it.

zllovesuki commented 3 years ago

Would it be too difficult to support Gotify for notifications?

valeriansaliou commented 3 years ago

I could accept a PR on that. The code of this notifier should be quite similar to Pushover.

zllovesuki commented 3 years ago

Alright. Time to learn Rust in 2 days 😂

zllovesuki commented 3 years ago

@valeriansaliou alright, #65 is up. So far, Rust seems pretty nice (coming from Go). I still need some time for a deep dive.

May I recommend switching to GitHub Actions? Travis CI seems pretty slow compared to GitHub Actions.

valeriansaliou commented 3 years ago

@zllovesuki thanks for the PR, the merge is all good for me now. I'll consider switching to GH Actions, yes, as Travis is not giving away free CI minutes for OSS projects anymore.

dpeterka commented 1 year ago

Not sure if this is still open, but @valeriansaliou I would love to see support for webhooks, e.g. we need to send an alert from another tool doing some monitoring but want to centralize via your tool.

valeriansaliou commented 1 year ago

If you're looking for a Web Hooks notifier, Vigil has one. You can also use the Reporter API to send your own statuses.

dpeterka commented 1 year ago

I didn't see that in the docs, that is awesome, ty @valeriansaliou