harness / harness

Harness Open Source is an end-to-end developer platform with Source Control Management, CI/CD Pipelines, Hosted Developer Environments, and Artifact Registries.
https://www.harness.io/open-source
Apache License 2.0

Expose metrics #478

Closed aybabtme closed 7 years ago

aybabtme commented 10 years ago

It would be nice if droned exposed a couple of metrics. I'm working on a patch for our own installation of drone and would like to know whether you are interested in a contribution that would expose:

aybabtme commented 10 years ago

This is a draft I made:

https://github.com/aybabtme/drone/commit/695d8660c279cd7ebe42442a3d2de83caa435f0e

Produces things like:

$ curl -s http://127.0.0.1:8080/debug/vars | jq '.memstats = null'
{
  "cmdline": [
    "./droned"
  ],
  "memstats": null,
  "queue": {
    "queueLen": 0,
    "worker0": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker1": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker2": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker3": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker4": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker5": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker6": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "worker7": {
      "branch": "",
      "commit": "",
      "repo": ""
    },
    "workerCount": 8,
    "workerWaiting": 8,
    "workerWorking": 0
  }
}
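For readers unfamiliar with expvar, here is a minimal sketch of how output shaped like the above could be produced with Go's standard library. It is not the code from the linked commit: the `queueStats` type and its fields are hypothetical, and only the aggregate counters are published, not the per-worker entries.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http/httptest"
	"sync"
)

// queueStats is a hypothetical stand-in for the state of a build queue.
type queueStats struct {
	mu      sync.Mutex
	pending int // builds waiting for a worker
	working int // workers currently running a build
	workers int // total number of workers
}

// snapshot returns the counters to publish; expvar JSON-encodes the map.
func (q *queueStats) snapshot() map[string]int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return map[string]int{
		"queueLen":      q.pending,
		"workerCount":   q.workers,
		"workerWorking": q.working,
		"workerWaiting": q.workers - q.working,
	}
}

func main() {
	q := &queueStats{workers: 8}

	// expvar.Func is re-evaluated each time the variables are rendered.
	expvar.Publish("queue", expvar.Func(func() interface{} { return q.snapshot() }))

	// Render the handler that normally serves /debug/vars, without
	// opening a real listener.
	rec := httptest.NewRecorder()
	expvar.Handler().ServeHTTP(rec, httptest.NewRequest("GET", "/debug/vars", nil))
	fmt.Println(rec.Body.String())
}
```

In a long-running server, importing expvar is enough to register /debug/vars on http.DefaultServeMux; the recorder is used here only so the sketch terminates.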

Let me know if that's something of interest.

bradrydzewski commented 10 years ago

very cool, I wasn't even aware of expvar!

I do have a new queue / worker implementation that will provide endpoints for workers, pending jobs, started jobs, etc. It is being used in a custom version of Drone we're building for Google: https://github.com/drone/drone-dart/blob/master/handler/worker.go

I'm definitely up for exposing debug data. I'm thinking it should be optional based on a set of command line flags (--debug for example).

Since you are digging around in this layer, I would be interested to get your thoughts on ways to cancel jobs in a queue ...

brettlangdon commented 10 years ago

My $0.02: an endpoint exposing stats such as memory usage, current builds, queue length, and maybe per-repo stats like the number of failed and successful builds would be great for integration with monitoring tools like http://datadoghq.com.

bradrydzewski commented 10 years ago

You can get the current and pending builds (queue length), and list of registered workers here: https://github.com/drone/drone/blob/exp/server/handler/worker.go

The endpoints are:

# list started builds
curl http://localhost:8000/api/work/started

# list pending builds
curl http://localhost:8000/api/work/pending

# list started builds + their assigned worker nodes
curl http://localhost:8000/api/work/assignments

# list all workers
curl http://localhost:8000/api/workers

You can get non-aggregated repository stats using the /api/repo/:host/:owner/:name/* endpoints. Hopefully you can do quite a bit with the API today, however, we'll definitely continue to improve.
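As a rough illustration of deriving a queue-length number from these endpoints, the sketch below fetches one of them and counts the entries. It assumes the endpoint responds with a top-level JSON array, which the thread does not confirm; the fake server and its payload exist only so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// countJSONArray returns the number of elements in a JSON array.
// Assumption: the work endpoints respond with a top-level JSON array.
func countJSONArray(body []byte) (int, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(body, &items); err != nil {
		return 0, err
	}
	return len(items), nil
}

// fetchCount GETs url and counts the elements of the returned array.
func fetchCount(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return countJSONArray(body)
}

func main() {
	// A fake server stands in for Drone; the payload is invented.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `[{"id": 1}, {"id": 2}]`)
	}))
	defer srv.Close()

	n, err := fetchCount(srv.URL + "/api/work/pending")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("pending builds:", n)
}
```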

mattbostock commented 9 years ago

I really like expvar, though it does have a downside in that I think you'd have to be careful in exposing it publicly, partly because it stops the world by invoking runtime.ReadMemStats().

I think it'd be great to see support for Statsd and Prometheus. Prometheus is supported by cAdvisor for example, which I think would be a good complement to Drone.

I was thinking we could model a new metrics package based on the plugin architecture for remotes. The metrics interface would be standardised but you could choose which plugin you wanted to use, e.g. Statsd.

The metrics package from GoKit might be useful in this: https://github.com/go-kit/kit/tree/master/metrics

Let me know what you think.

bradrydzewski commented 9 years ago

@mattbostock I love the idea of a metrics API using Drone's 0.4 plugin pattern. The default Drone binary would ship with a Noop implementation, but could be configured to use an external Prometheus plugin, for example.

Would this only report system metrics (CPU, RAM, etc) or could it be used to report broader statistics related to builds? For example, people want to track build duration over time for a repository. I would like to consider this use case as well.

mattbostock commented 9 years ago

By 'Drone's 0.4 plugin pattern', do you mean what's currently in master (e.g. 7d017545)?

I'd like to focus on build statistics (like build duration) rather than system metrics; I don't think that system metrics should be a concern of Drone (there are better ways of doing that independently).

mattbostock commented 9 years ago

Great to hear you'd be interested. I'll see what I can whip up.

bradrydzewski commented 9 years ago

@mattbostock this would need to be based on the 0.4 branch. I haven't fully fleshed out the design yet, but the idea is that most layers of Drone can be completely swapped out.

Some example use cases:

We need a plugin system that lets people alter how Drone behaves without having to include all of this functionality in the core Drone repository.

In 0.4, Drone has the concept of builtin and plugin implementations. The builtin implementation is compiled directly into the Drone binary (i.e. sqlite database, github, bitbucket). The plugin implementation is a standalone server / daemon that Drone communicates with over some sort of RPC. You can see some of the concepts here: https://github.com/drone/drone/tree/0.4.0/pkg/queue

My thought is the default Drone binary would define a package for metrics. The builtin implementation would do nothing with those metrics. It would effectively be a no-op. The plugin implementation would define the protocol for sending metric data to a separate running service.
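The builtin no-op idea might look roughly like the sketch below. The `Metrics` interface, its single method, and the in-memory `recorder` are all hypothetical names invented for illustration; this is not Drone's actual API, and the RPC-backed plugin variant is left out.

```go
package main

import "fmt"

// Metrics is a hypothetical interface; it is not Drone's actual API.
type Metrics interface {
	// BuildFinished records the outcome and duration of one build.
	BuildFinished(repo, status string, seconds float64)
}

// noop is the default implementation: it discards everything.
type noop struct{}

func (noop) BuildFinished(repo, status string, seconds float64) {}

// recorder keeps per-repository totals in memory, for illustration.
// It ignores status and is not safe for concurrent use.
type recorder struct {
	builds  map[string]int
	seconds map[string]float64
}

func newRecorder() *recorder {
	return &recorder{builds: map[string]int{}, seconds: map[string]float64{}}
}

func (r *recorder) BuildFinished(repo, status string, seconds float64) {
	r.builds[repo]++
	r.seconds[repo] += seconds
}

// meanDuration returns the average build time for repo, or 0 if none.
func (r *recorder) meanDuration(repo string) float64 {
	if r.builds[repo] == 0 {
		return 0
	}
	return r.seconds[repo] / float64(r.builds[repo])
}

func main() {
	var m Metrics = noop{} // default: nothing is recorded
	m.BuildFinished("octocat/hello", "success", 42)

	rec := newRecorder()
	m = rec // a real setup would choose the implementation from configuration
	m.BuildFinished("octocat/hello", "success", 40)
	m.BuildFinished("octocat/hello", "failure", 20)
	fmt.Println(rec.meanDuration("octocat/hello"))
}
```

Because callers depend only on the interface, swapping the no-op for another backend does not touch the code that reports builds.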

So if you wanted to send Drone metrics to Prometheus, for example, you would essentially run two servers. You would run the Drone server and an intermediate server that accepts RPC calls from Drone, and formats that data for use in Prometheus.

I probably need to write a blog post on the topic, but that is the overall direction we are going.

mattbostock commented 9 years ago

What if the metrics packages were to use the builtin plugin implementation?

I think using RPC for metrics would introduce a lot of unnecessary complexity. For example, when sending metrics to statsd, you use UDP so as to avoid blocking synchronous calls. In the case of Prometheus, metrics are pulled from an endpoint (polled), so it makes more sense IMHO to include them in the Drone binary and configure the metrics backend using the configuration file.
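To make the UDP point concrete, here is a small sketch of a fire-and-forget statsd send using only the Go standard library. The metric name is made up, and a local UDP socket stands in for a statsd daemon so the example is self-contained.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// statsdTiming formats a timer sample in the statsd line protocol.
func statsdTiming(name string, ms int64) string {
	return fmt.Sprintf("%s:%d|ms", name, ms)
}

// send writes one datagram to addr. UDP has no handshake and no
// acknowledgement, so the caller does not wait on the collector.
func send(addr, line string) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(line))
	return err
}

func main() {
	// A local UDP socket stands in for a statsd daemon.
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer pc.Close()

	// The metric name is a made-up example.
	line := statsdTiming("drone.build.duration", 1500)
	if err := send(pc.LocalAddr().String(), line); err != nil {
		fmt.Println("send:", err)
		return
	}

	// Read the datagram back to show what a collector would receive.
	buf := make([]byte, 512)
	pc.SetReadDeadline(time.Now().Add(time.Second))
	n, _, err := pc.ReadFrom(buf)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Println(string(buf[:n]))
}
```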

bradrydzewski commented 9 years ago

I could see it working like this:

drone --> daemon collecting metrics <-- polled by prometheus
drone --> daemon collecting metrics --> pushed to statsd

The builtin packages are reserved for functionality that is self-contained inside the Drone binary. The plugin packages are meant for third party services. Statsd and Prometheus would be implemented as plugins.

mattbostock commented 9 years ago

One of the most attractive things about Drone as it is currently is how simple it is to deploy. I can understand why you might want to keep certain functionality out of the Drone core binary, though I don't think that would make sense for metrics, which should be as lightweight as possible.

I'm not sure what would be gained by having Drone first send the metrics to an intermediate binary before they are forwarded on.

aybabtme commented 9 years ago

You could use the work done in gokit and offer prometheus/expvars on an http endpoint.

https://github.com/go-kit/kit/tree/master/metrics

From there, if someone uses statsd or push metrics, it's fairly easy to write a bridge that polls the http endpoint and pushes to statsd. Expvars are supported by things like Datadog and I suspect other metric services. Prometheus is very simple to run and is awesome.
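The conversion half of such a bridge could be sketched as follows: take an expvar JSON document and turn the numeric fields under a "queue" key into statsd gauge lines. The "queue" key matches the draft output earlier in the thread, while the `drone.queue` prefix is an assumption; fetching the document and sending the lines over UDP are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// gaugesFromExpvar converts the numeric fields of the "queue" object in
// an expvar document into statsd gauge lines. Non-numeric fields (such
// as per-worker objects) are skipped.
func gaugesFromExpvar(body []byte, prefix string) ([]string, error) {
	var vars struct {
		Queue map[string]interface{} `json:"queue"`
	}
	if err := json.Unmarshal(body, &vars); err != nil {
		return nil, err
	}
	var lines []string
	for key, v := range vars.Queue {
		f, ok := v.(float64) // encoding/json decodes JSON numbers as float64
		if !ok {
			continue
		}
		val := strconv.FormatFloat(f, 'f', -1, 64)
		lines = append(lines, prefix+"."+key+":"+val+"|g")
	}
	sort.Strings(lines) // map iteration order is random; sort for stable output
	return lines, nil
}

func main() {
	// Sample input shaped like the /debug/vars output shown in the thread.
	body := []byte(`{"queue": {"queueLen": 0, "worker0": {"repo": ""}, "workerCount": 8}}`)
	lines, err := gaugesFromExpvar(body, "drone.queue")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```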

metalmatze commented 8 years ago

Any news on this? Just thought about how cool it would be to have some metrics about all the drone builds etc.

bradrydzewski commented 8 years ago

My recommendation at this time is to use the API or directly access the database to fetch the metrics that you need. Many dashboard and metrics tools support raw sql queries either natively or through plugins [1][2]

If the community wants an integrated option ...

It should also be pretty straightforward to create a small service that polls drone and either exposes or sends metrics to {insert name of metrics system here} similar to what I suggested in https://github.com/drone/drone/issues/478#issuecomment-109486622. This sort of feature can be easily piloted external to drone, and if it becomes popular enough, it always has the potential to be included in the main codebase.

I therefore suggest creating a simple prototype system that uses the API or hits the database to provide metrics to third party tools such as prometheus or gokit or whatever. This allows the community to iterate without impacting the stability of drone core.
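One possible shape for the exposing side of such a prototype is sketched below: an HTTP handler that renders gauges in the Prometheus text exposition format. The metric names and the `collect` callback are hypothetical; in a real prototype `collect` would query the Drone API or the database.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// renderGauge formats one gauge in the Prometheus text exposition format.
func renderGauge(name, help string, value float64) string {
	return "# HELP " + name + " " + help + "\n" +
		"# TYPE " + name + " gauge\n" +
		name + " " + strconv.FormatFloat(value, 'g', -1, 64) + "\n"
}

// metricsHandler serves gauges built from whatever collect returns.
// collect is a placeholder for a call to the Drone API or database.
func metricsHandler(collect func() (pending, started int)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		pending, started := collect()
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		// Metric names below are invented for this sketch.
		fmt.Fprint(w, renderGauge("drone_pending_builds", "Builds waiting in the queue.", float64(pending)))
		fmt.Fprint(w, renderGauge("drone_started_builds", "Builds currently running.", float64(started)))
	})
}

func main() {
	h := metricsHandler(func() (int, int) { return 3, 1 }) // fixed sample values

	// Exercise the handler without opening a real listener.
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

Keeping this in a separate process matches the suggestion above: it can be iterated on without changing drone core.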

From my perspective, if there is a lack of motivation to create an external utility, it signals that perhaps this feature is just not important enough to include in drone core in the first place. So just to avoid any ambiguity, the only path to get something like this included in drone core is to create an external proof of concept and gather enough community feedback to ensure it meets broad usage and tooling requirements.

I suggest interested parties message me directly on gitter if they are interested in implementing a proof of concept.

[1] https://github.com/Shopify/dashing/wiki/How-to:-send-mysql-data-to-your-widgets
[2] https://github.com/chop-dbhi/prometheus-sql

bradrydzewski commented 7 years ago

closing now that we have http://docs.drone.io/configure-prometheus/