ipfs-inactive / jenkins

[ARCHIVED] Configuration for IPFS's build system

https://ci.ipfs.team/blue


Setting up monitoring/alerts #2

Closed victorb closed 6 years ago

victorb commented 7 years ago

Acceptance Criteria

[x] Can view jenkins monitoring dashboards
[x] Can deploy changes to monitoring by reading the documentation
[x] Will get alerts if certain metrics go below their threshold
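
As a rough illustration of the last criterion, a threshold check could look like the sketch below. This is only a sketch under assumptions: the metric name `example_executors_available`, the `Threshold` type, and the `firing_alerts` helper are hypothetical and not part of this repo. In practice the conditions would more likely live in Prometheus or Grafana alert rules.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical alert condition: fire when `metric` drops below `minimum`."""
    metric: str
    minimum: float


def firing_alerts(samples, thresholds):
    """Return one message per threshold that is violated or has no data.

    `samples` maps metric names to their latest value.
    """
    alerts = []
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            # A missing metric is treated as an alert too, since the
            # exporter or the scrape itself may be down.
            alerts.append(f"{t.metric}: no data")
        elif value < t.minimum:
            alerts.append(f"{t.metric}: {value} is below {t.minimum}")
    return alerts


if __name__ == "__main__":
    # The metric name is made up for illustration.
    rules = [Threshold("example_executors_available", 1.0)]
    print(firing_alerts({"example_executors_available": 0.0}, rules))
```

Treating a missing metric as an alert is a design choice here, so that a dead scrape target does not pass silently.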

Tasks

[x] Check out the Prometheus integration for jenkins and apply it if possible
[x] Connect to existing Prometheus deployment
[x] Set up dashboard
[x] Set up alerts

Dependencies

Depends on #1

ghost commented 7 years ago

Remember what this means?

"Can check out the monitoring dashboards"

Can view the monitoring dashboard? Or: can run the dashboard locally, in an automated way?

victorb commented 7 years ago

@lgierth my intention was "can view the monitoring dashboard".

Thinking about it now, we should probably have two tasks: one for being able to run monitoring locally and one for having it working in production. What do you think?

ghost commented 7 years ago

Yeah, let's keep running it locally as a separate task: ipfs/infrastructure#52

victorb commented 7 years ago

Ok, in that case, this task would depend on us having jenkins deployed somewhere before you can start working on it. Correct?

ghost commented 7 years ago

Nah I can get started with the local jenkins

ghost commented 7 years ago

#11 enables the scraping endpoint for prometheus, and the dashboard is here: http://metrics.ipfs.team/dashboard/db/jenkins?from=now-1h&to=now -- I simply imported the one from grafana.net and I'm not very convinced by it.

The dashboard will need tuning when there are actual jobs to monitor :) And that's also when we can start setting alert conditions.

Prometheus is currently set to scrape [fce3:5702:8051:3e65:3a36:1299:c458:1470]:8090/prometheus and we can change that to what comes out of #8.
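
For anyone wanting to poke at the endpoint by hand, here is a minimal sketch of how the scrape URL is put together and how the plain-text output could be read. Assumptions: the `http` scheme (Prometheus' default, not confirmed for this target), the helper names `scrape_url` and `parse_samples`, and the sample metric name are all made up for illustration. The parser is simplified and assumes lines carry no trailing timestamp.

```python
def scrape_url(target, metrics_path="/metrics", scheme="http"):
    """Build the URL Prometheus would scrape for a host:port target.

    IPv6 targets must already be bracketed, as in the address above.
    """
    return f"{scheme}://{target}{metrics_path}"


def parse_samples(text):
    """Parse Prometheus text exposition output into {series: value}.

    Simplified: skips comment lines (# HELP / # TYPE) and assumes each
    sample line is "<series> <value>" without a timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last run of whitespace so label values that
        # contain spaces stay part of the series name.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    target = "[fce3:5702:8051:3e65:3a36:1299:c458:1470]:8090"
    print(scrape_url(target, metrics_path="/prometheus"))
    # Hypothetical payload; real metric names depend on the jenkins plugin.
    print(parse_samples("# TYPE example_queue_size gauge\nexample_queue_size 3\n"))
```

Fetching the URL itself (for example with `urllib.request.urlopen`) is left out, since the address is only reachable from inside the network.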

ghost commented 7 years ago

I'm wrapping this up with ipfs/infrastructure#235 which makes all provsn units systemd-compatible, so that we now also get host metrics (cpu, ram, io).

The dashboard should start showing numbers tomorrow when @VictorBjelkholm brings jenkins back up (I broke it). Over the rest of the sprint we'll tune the dashboard and add alerts as we see fit.

ghost commented 7 years ago

Splitting off tuning and alerting to #31.

victorb commented 6 years ago

We moved the infrastructure for jenkins, and monitoring is not set up yet. Reopening this in the meantime.

victorb commented 6 years ago

Yay, jenkins monitoring dashboard is back online!