valeriansaliou / vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).
https://crates.io/crates/vigil-server
Mozilla Public License 2.0

Add ability to check probe output for arbitrary string value #1

Closed moritzheiber closed 6 years ago

moritzheiber commented 6 years ago

One of the greatest features I see implemented by comparable services is the ability to check the output of the probe sent to an HTTP endpoint for arbitrary strings instead of just HTTP response codes. I would love to have this feature implemented for Vigil.

valeriansaliou commented 6 years ago

Hi! Adding this now, this is definitely something I know to be useful (we just don't use it for Crisp).

The response check will be an exact content check, e.g. if you match "OK" and the server replies "200 OK", it won't match. Hope that will do for you.
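To make the difference concrete, here is a minimal Python sketch of the exact-content behaviour described above, next to a looser substring check. It is an illustration only, not Vigil's code; the function names are hypothetical.

```python
def exact_match(expected: str, body: str) -> bool:
    """Healthy only when the whole response body equals the expected string."""
    return body == expected


def substring_match(expected: str, body: str) -> bool:
    """Looser alternative: healthy when the expected string appears anywhere."""
    return expected in body


if __name__ == "__main__":
    # The case from the comment above: expecting "OK", server replies "200 OK".
    print(exact_match("OK", "200 OK"))      # False
    print(exact_match("OK", "OK"))          # True
    print(substring_match("OK", "200 OK"))  # True
```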

By the way, may I ask how you found out about Vigil? It's quite recent (it was fully released yesterday), so I'm quite surprised to see feedback this early 😄

valeriansaliou commented 6 years ago

Also, before I start coding this, do you see this as more useful as a global configuration common to all HTTP poll probes, or rather as a per-node configuration option?

moritzheiber commented 6 years ago

Regarding your first question (re: matching), I think it would be useful to support at least two different types of matches, or even regular expression matching. For example, I have a service returning a JSON response containing several pieces of service information. Now, I could match on all of them being "Okay", or I could match several different conditions. The latter appears more useful to me.

I love the way HashiCorp structures their health endpoints, and I'd be glad to be able to make use of the data it provides, i.e. differentiating between "the cluster isn't reachable" and "I'm sealed" or "I'm uninitialized" using the example in the Vault API docs.
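As a rough illustration of that kind of differentiation, the Python sketch below classifies a health response body. It assumes the body is a JSON object carrying boolean `initialized` and `sealed` fields, as in Vault's documented health response; the function name and the returned labels are hypothetical and not part of Vigil.

```python
import json
from typing import Optional


def classify_vault_health(body: Optional[str]) -> str:
    """Map a health response body to a coarse state label.

    `body` is None when the endpoint could not be reached at all.
    """
    if body is None:
        return "unreachable"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    # Assumed field names: `initialized` and `sealed` (booleans).
    if data.get("initialized") is False:
        return "uninitialized"
    if data.get("sealed") is True:
        return "sealed"
    if data.get("initialized") is True and data.get("sealed") is False:
        return "ok"
    return "unknown"


if __name__ == "__main__":
    print(classify_vault_health('{"initialized": true, "sealed": true}'))
    print(classify_vault_health(None))
```

A single regular expression can only answer "healthy or not", so telling these states apart would take one pattern per state or a structured check like this one.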

Second question (re: Vigil quite recent): I regularly scour GitHub for interesting repositories which could help me with my job, and I actually thought about writing something similar in Rust myself. I'm glad I found Vigil :smile:

Third question (re: all/node configuration): IMHO, it should be a per-node configuration option, as most likely, different HTTP nodes are going to respond to requests differently (at least that's the model I'm used to).

valeriansaliou commented 6 years ago

Got that. That makes sense, yes. I'm going for the Regex version as this one definitely provides more flexibility.

Didn't know you did Rust also, I'm glad I spared you a few days of work 😄

As this is a feature you need on your end, and thus you know the edge-case details, feel free to open a PR against Vigil for this. Otherwise I can do it, but it might not be 100% tuned to what you need / what other people need (I'm not familiar with e.g. HashiCorp, though I know them by name). Let me know what's best!

moritzheiber commented 6 years ago

I won't have the time for it right now, as I just started managing a new project internally. I'd be happy to send PRs later should I need any additional functionality.

valeriansaliou commented 6 years ago

Here you go: 65e1b1c794357d85f38b937c4832a1b965b7fa34

Let me know if that works for you. When the http_body_healthy_match option is configured for a poll node in HTTP mode, the regex is used to check health if and only if the status code check has passed.
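The ordering described above (status code first, body regex second) can be sketched in Python as follows. This is not the code from the commit: the function name is hypothetical, and both the 2xx healthy range and the unanchored search semantics are assumptions made for the example.

```python
import re
from typing import Optional


def replica_is_healthy(status: int, body: str,
                       body_healthy_match: Optional[str] = None) -> bool:
    """Return True when a polled HTTP replica should count as healthy."""
    # Step 1: the status code check. Assumption: 2xx means healthy.
    if not 200 <= status < 300:
        return False
    # Step 2: the body check only runs when a pattern is configured
    # and the status code check has already passed.
    if body_healthy_match is None:
        return True
    # Assumption: the pattern may match anywhere in the body.
    return re.search(body_healthy_match, body) is not None


if __name__ == "__main__":
    pattern = r'"status":"ok"'
    print(replica_is_healthy(200, '{"status":"ok"}', pattern))    # True
    print(replica_is_healthy(200, '{"status":"down"}', pattern))  # False
    print(replica_is_healthy(500, '{"status":"ok"}', pattern))    # False
```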

valeriansaliou commented 6 years ago

Also, I didn't release a version for this, waiting for feedback. So pull master and compile it on your side 🎉

moritzheiber commented 6 years ago

Sorry, I was too busy yesterday, I'll take a look at it today!

moritzheiber commented 6 years ago

Short feedback: Either I'm unable to correctly use the functionality as the documentation suggests or it doesn't work. Here's the configuration snippet I tried:

[...]
[[probe.service.node]]

id = "Test"
label = "Test"
mode = "poll"
replicas = ["https://www.ping.eu"]
http_body_healthy_match = ".*\"status\":\"blibber\".*"

# Trying to match some JSON here

$ curl -L https://www.ping.eu | grep blibber
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12295    0 12295    0     0  12295      0 --:--:-- --:--:-- --:--:--  141k
$
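For reference, the pattern from the configuration snippet can be tried in isolation with the Python sketch below. The two sample bodies are made up for the example, and unanchored search semantics are an assumption. Under those semantics the leading and trailing `.*` are redundant, and a TOML literal string (single quotes) would avoid the backslash escapes.

```python
import re

# The TOML basic string ".*\"status\":\"blibber\".*" decodes to this pattern.
PATTERN = r'.*"status":"blibber".*'
# Equivalent under unanchored search; in TOML it could be written as the
# literal string '"status":"blibber"' with no escaping.
SHORT_PATTERN = r'"status":"blibber"'

# Hypothetical response bodies, for illustration only.
html_body = "<html><body>Ping</body></html>"
json_body = '{"status":"blibber","uptime":42}'


def matches(pattern: str, body: str) -> bool:
    """True when the pattern is found anywhere in the body."""
    return re.search(pattern, body) is not None


if __name__ == "__main__":
    for body in (html_body, json_body):
        print(matches(PATTERN, body), matches(SHORT_PATTERN, body))
    # prints:
    # False False
    # True True
```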

The debug log:

$ vigil -c config.cfg 
(INFO) - starting up
(DEBUG) - prober store: got service web
(DEBUG) - prober store: got node web:Test
(DEBUG) - prober store: got replica web:Test:https://www.ping.eu
(INFO) - initialized prober store
(DEBUG) - spawn managed thread: responder
(DEBUG) - spawn managed thread: aggregator
(INFO) - 🔧  Configured for production.
(INFO) - address: 0.0.0.0
(INFO) - port: 8080
(INFO) - log: critical
(INFO) - workers: 4
(INFO) - secret key: generated
(INFO) - limits: forms = 32KiB
(INFO) - tls: disabled
(WARN) - environment is 'production', but no `secret_key` is configured
(DEBUG) - spawn managed thread: prober
(DEBUG) - running an aggregate operation...
(DEBUG) - aggregate probe: web
(INFO) - [extra] template_dir: "./res/assets/./templates"
(DEBUG) - running a probe operation...
(DEBUG) - aggregate node: web:Test
(DEBUG) - aggregated status for replica: web:Test:https://www.ping.eu => Healthy
(DEBUG) - aggregated status for node: web:Test => Healthy
(DEBUG) - aggregated status for probe: web => Healthy
(INFO) - ran aggregate operation (notified: false)
(INFO) - 🛰  Mounting '/':
(DEBUG) - will probe replica: HTTPS("https://www.ping.eu/") with retry count: 1
(INFO) - GET /
(INFO) - POST /reporter/<probe_id>/<node_id> application/json
(INFO) - GET /robots.txt
(INFO) - GET /badge/<kind>
(INFO) - GET /assets/fonts/<file..>
(INFO) - GET /assets/images/<file..>
(INFO) - GET /assets/stylesheets/<file..>
(ERROR) - 🚀  Rocket has launched from http://0.0.0.0:8080
(DEBUG) - threads = 4
(DEBUG) - prober poll will fire for http target: https://www.ping.eu/?1516825618
(DEBUG) - loop poll - Duration { secs: 0, nanos: 6948 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 288289089 }
(DEBUG) - consuming notification queue
(DEBUG) - loop process - 2 events, Duration { secs: 0, nanos: 74145 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1451 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 288389255 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 14223 }
(DEBUG) - resolving host="www.ping.eu", port=443
(DEBUG) - loop poll - Duration { secs: 0, nanos: 240185359 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 528595814 }
(DEBUG) - connecting to 88.198.46.60:443
(DEBUG) - adding a new I/O source
(DEBUG) - scheduling direction for: 0
(DEBUG) - blocking
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 175897 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1687 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 528781315 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 14774 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 12246940 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 541049805 }
(DEBUG) - notifying a task handle
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 49209 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 10510 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 541116542 }
(DEBUG) - scheduling direction for: 0
(DEBUG) - blocking
(DEBUG) - scheduling direction for: 0
(DEBUG) - blocking
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 337896 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1897 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 541467956 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 16506 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 17082591 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 558575835 }
(DEBUG) - notifying a task handle
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 117994 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 7420 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 558728341 }
(DEBUG) - scheduling direction for: 0
(DEBUG) - blocking
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 2141955 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 11151 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 561141937 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 385756 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 18178047 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 579822400 }
(DEBUG) - notifying a task handle
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 51545 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 6761 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 579890640 }
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 215515 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1738 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 580117793 }
(DEBUG) - flushed 111 bytes
(DEBUG) - scheduling direction for: 0
(DEBUG) - blocking
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 131926 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1631 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 580259435 }
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 15243 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1089 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 580282968 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 13774 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 14213772 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 594518178 }
(DEBUG) - notifying a task handle
(DEBUG) - loop process - 1 events, Duration { secs: 0, nanos: 58797 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 8769 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 594592973 }
(DEBUG) - read 418 bytes
(DEBUG) - parsed 8 headers (418 bytes)
(DEBUG) - incoming body is content-length (0 bytes)
(DEBUG) - loop process - 2 events, Duration { secs: 0, nanos: 365164 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 2834 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 594980020 }
(DEBUG) - Response: '200 OK' for https://www.ping.eu/?1516825618
(DEBUG) - consuming notification queue
(DEBUG) - dropping I/O source: 0
(DEBUG) - loop process - 2 events, Duration { secs: 0, nanos: 73139 }
(DEBUG) - loop poll - Duration { secs: 0, nanos: 1660 }
(DEBUG) - loop time - Instant { tv_sec: 66171, tv_nsec: 595062100 }
(DEBUG) - loop process - 0 events, Duration { secs: 0, nanos: 28137 }
(DEBUG) - prober poll result received for url: https://www.ping.eu/?1516825618 with status: 200
(DEBUG) - replica probe result: web:Test:https://www.ping.eu => Healthy
(INFO) - ran probe operation

It doesn't seem to me the regexp is used at all? Or is it a hidden error?

Note: I find it interesting that "The rocket has launched from" is classified as ERROR

PS: The configuration parser really doesn't like missing parts (like [notify] or [plugins]) in the configuration file

valeriansaliou commented 6 years ago

Mhh, strange. Worked fine for me. I guess the response cannot be unpacked, which is why it stays silent.

I've added more log points, can you pull master, compile and test again? (Added in 9c8aa828171c56b22252a32816372145601f0c8b.)

Also, I've addressed your 'PS' by making [notify] and [plugins] optional, as they are certainly not needed for some use cases (in 6ba5b57430c8a56b79f58d184cd6d4b4bae57fd9).

I'll release a new version (with Debian 8 builds) when everything is all good for your use case.

moritzheiber commented 6 years ago

I took a look at my checkout again and it appears I had the wrong tree checked out. With HEAD everything is working as expected, I thoroughly apologize. :+1:

Optional configuration parameters are working as well, I'd consider this solved! :tada:

valeriansaliou commented 6 years ago

All right, issuing the release 👍

valeriansaliou commented 6 years ago

Done, v1.2.0 is out!

valeriansaliou commented 6 years ago

FYI, regarding your remark about Rocket's startup log being visible in Vigil's logs, I've filed a report at SergioBenitez/Rocket/issues/553.