Crate Addition Request: logging & metrics

michael2012z commented 4 years ago

Crate Name

vmm-logger

Short Description

A crate that provides logging and metrics functionality. It is modeled on Firecracker's logger crate, with some extensions:

Logging

5 log levels: Error, Warning, Info, Debug, Trace.
Output options: File, FIFO, Network (UDP).
Tag: A filter option for cases where a log level produces too many messages (quite likely at the Trace and Debug levels).
- A tag could be: Memory, CPU, Device, IO, ... A VMM developer can use it to filter logs on a certain topic.
- "Level + Tag" forms a matrix for customizing log content. A developer can get precise logs by combining the two options.
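
To illustrate the "Level + Tag" matrix, here is a minimal sketch of such a filter. The `Level`, `Tag`, and `Filter` names and the `allows` method are assumptions made for this example; the proposal does not define an API.

```rust
use std::collections::HashSet;

/// The five log levels from the proposal, ordered from most to least severe.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warning,
    Info,
    Debug,
    Trace,
}

/// Example tags (assumed set; the proposal leaves the list open).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Tag {
    Memory,
    Cpu,
    Device,
    Io,
}

/// Hypothetical filter: a record passes only if its level is at least as
/// severe as `max_level` AND its tag is in the enabled set.
pub struct Filter {
    max_level: Level,
    tags: HashSet<Tag>,
}

impl Filter {
    pub fn new(max_level: Level, tags: &[Tag]) -> Self {
        Filter {
            max_level,
            tags: tags.iter().copied().collect(),
        }
    }

    pub fn allows(&self, level: Level, tag: Tag) -> bool {
        // Derived `Ord` follows declaration order, so Error < ... < Trace.
        level <= self.max_level && self.tags.contains(&tag)
    }
}
```

With `Filter::new(Level::Debug, &[Tag::Memory, Tag::Io])`, Debug-level memory logs pass, while Trace-level logs and any CPU-tagged logs are dropped.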

Metrics

Provide measurement data for various aspects of a VMM.
Data is in JSON format.
Output options: File, FIFO, Network (UDP).
Output interval can be customized, 60 seconds by default.
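
As a rough sketch of the metrics side, the snippet below serializes a set of counters to one JSON line and writes it to any `Write` sink (a file, a FIFO, or a socket wrapper). `MetricsSnapshot`, its methods, and `DEFAULT_FLUSH_INTERVAL` are hypothetical names; the JSON is built by hand only to keep the example free of dependencies.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};
use std::time::Duration;

/// Default output interval from the proposal (60 seconds).
pub const DEFAULT_FLUSH_INTERVAL: Duration = Duration::from_secs(60);

/// Hypothetical container for counters that will be written out together.
pub struct MetricsSnapshot {
    counters: BTreeMap<&'static str, u64>,
}

impl MetricsSnapshot {
    pub fn new() -> Self {
        MetricsSnapshot {
            counters: BTreeMap::new(),
        }
    }

    pub fn set(&mut self, name: &'static str, value: u64) {
        self.counters.insert(name, value);
    }

    /// Serialize as a flat JSON object. Assumes metric names contain no
    /// characters that need JSON escaping.
    pub fn to_json(&self) -> String {
        let fields: Vec<String> = self
            .counters
            .iter()
            .map(|(name, value)| format!("\"{}\":{}", name, value))
            .collect();
        format!("{{{}}}", fields.join(","))
    }

    /// Write one JSON line to the chosen output.
    pub fn flush<W: Write>(&self, out: &mut W) -> io::Result<()> {
        writeln!(out, "{}", self.to_json())
    }
}
```

A VMM would call `flush` from a timer that fires every `DEFAULT_FLUSH_INTERVAL`, or at whatever interval the user configures.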

Why is this crate relevant to the rust-vmm project?

A VMM needs a logging tool to record what happened and to help with debugging; it also needs a metrics tool to observe performance and health.

jiangliu commented 4 years ago

This is really needed. How about slog?

andreeaflorescu commented 4 years ago

The configuration of the logger seems more of a VMM-related activity, and I don't see how this could be unified to serve multiple VMMs. I think we should stick with using the log crate where logs are needed. This way, each VMM can build its own logger on top of that crate. Loggers are not necessarily VMM-specific.

As for the metrics, we are talking about a redesign of the metric system we currently have in Firecracker to support metrics that are defined per crate, by the crate, and that can be easily enabled/disabled. Another thing that I would like to see regarding metrics is to have them decentralized (i.e. not use lazy static).

In any case, metrics and logs should not be defined in the same crate.

jiangliu commented 4 years ago

Yes, we are trying to decouple metrics and logs too. Metrics should be per-component/per-subsystem instead of a globally defined instance :)
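
A minimal sketch of what "per-component instead of a globally defined instance" could look like: each component owns its metrics struct, and no global static is involved. `BlockMetrics` and `BlockDevice` are hypothetical names used only for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Metrics defined by the component that produces them (hypothetical).
#[derive(Default)]
pub struct BlockMetrics {
    pub reads: AtomicU64,
    pub writes: AtomicU64,
}

/// Each device instance owns its metrics; there is no shared global.
pub struct BlockDevice {
    metrics: Arc<BlockMetrics>,
}

impl BlockDevice {
    pub fn new() -> Self {
        BlockDevice {
            metrics: Arc::new(BlockMetrics::default()),
        }
    }

    pub fn read(&self) {
        // Real I/O is omitted; only the accounting is shown.
        self.metrics.reads.fetch_add(1, Ordering::Relaxed);
    }

    pub fn write(&self) {
        self.metrics.writes.fetch_add(1, Ordering::Relaxed);
    }

    /// Hand out a shared handle so the owner (for example the VMM) can
    /// read the counters when it decides to report them.
    pub fn metrics(&self) -> Arc<BlockMetrics> {
        Arc::clone(&self.metrics)
    }
}
```

Because the counters live inside each instance, two devices never share state, and a VMM that does not care about block metrics simply never asks for the handle.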

michael2012z commented 4 years ago

Regarding metrics: defining metrics per crate sounds like a good design. So what we need is a basic metrics crate/framework that lets each consuming crate define its own metrics conveniently, right?

I am quite interested in how to enable/disable metrics from different crates at run time. Was it discussed in a public issue? Can you share a link so I can learn more?

andreeaflorescu commented 4 years ago

> I am quite interested in how to enable/disable metrics from different crates in run time. Was it discussed in a public issue? Can you share a link to learn?

We just discussed this internally in the team; we don't have anything written yet. I might be able to write a short doc with my ideas some time this week. Would that be okay with you?

michael2012z commented 4 years ago

That's great! Thank you very much.

andreeaflorescu commented 4 years ago

Hey everyone, I've written a design doc for a decentralized metric system. You can find the design in the README.md file and a dummy example in the repository.

I don't see the metrics being something so big that it needs its own crate, so one proposal would be to add it to vmm-sys-util instead. WDYT?

michael2012z commented 4 years ago

Hi @andreeaflorescu, thank you very much for sharing the design. The idea is clearly described in the document and example code.

I have a concern regarding the flushing of metrics data. In this decentralized system, each component involving metrics has to flush its own data. As in the example code (https://github.com/andreeaflorescu/metrics-proposal/blob/master/vmm/src/lib.rs#L35), the VMM needs to go through all the components with metrics. If there are a lot of such components, the VMM needs to take care of them somehow (for example, maintain a list).

andreeaflorescu commented 4 years ago

> the VMM needs to go through all the components with metrics. If there are a lot of such components, the VMM needs to take care of them somehow (for example, maintain a list).

The VMM only needs to flush the metrics corresponding to its direct dependencies. This is exemplified in the design doc here. The photo was previously missing as I forgot to add it to the commit. I am expecting each component to have a manageable list of sub-components, but if that's not the case, it should be the responsibility of the VMM to create a manageable wrapper over its own components. Does that make sense?
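
The hierarchy described above can be sketched as follows. This is not the code from the design doc; the `FlushMetrics` trait and the component names are assumptions. The point is that `Vmm` only calls its direct dependency (`DeviceManager`), which in turn forwards to its own sub-components.

```rust
/// Hypothetical trait: a component appends its own metrics, then asks
/// its direct sub-components to do the same.
pub trait FlushMetrics {
    fn flush_metrics(&self, out: &mut Vec<String>);
}

pub struct Serial {
    pub bytes_written: u64,
}

pub struct Block {
    pub reads: u64,
}

/// Wrapper that keeps the VMM's list of direct dependencies short.
pub struct DeviceManager {
    pub serial: Serial,
    pub block: Block,
}

pub struct Vmm {
    pub vcpu_exits: u64,
    pub devices: DeviceManager,
}

impl FlushMetrics for Serial {
    fn flush_metrics(&self, out: &mut Vec<String>) {
        out.push(format!("serial.bytes_written={}", self.bytes_written));
    }
}

impl FlushMetrics for Block {
    fn flush_metrics(&self, out: &mut Vec<String>) {
        out.push(format!("block.reads={}", self.reads));
    }
}

impl FlushMetrics for DeviceManager {
    fn flush_metrics(&self, out: &mut Vec<String>) {
        // The manager knows its devices; the VMM does not have to.
        self.serial.flush_metrics(out);
        self.block.flush_metrics(out);
    }
}

impl FlushMetrics for Vmm {
    fn flush_metrics(&self, out: &mut Vec<String>) {
        out.push(format!("vmm.vcpu_exits={}", self.vcpu_exits));
        // Only the direct dependency is flushed from here.
        self.devices.flush_metrics(out);
    }
}
```

Adding a new device then touches only `DeviceManager`; the VMM's flush code stays the same.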

michael2012z commented 4 years ago

I am so sorry, I missed that part. Yes, you have already considered it.

The hierarchical design looks good.

jiangliu commented 4 years ago

Do we want to build a new logger system from the ground up or make use of some existing log framework, such as slog? I feel slog is flexible enough. For metrics, is it OK to support Prometheus?

andreeaflorescu commented 4 years ago

> Do we want to build a new logger system from ground or make use of some existing log framework, such as slog? I feel slog is flexible enough.

I would prefer to use log and then allow the VMMs to select their own backend. Do you have something else in mind? I took a glance at slog and it doesn't look like it's built on top of log, hence we would need to

> For metrics, is it ok to support Prometheus?

Can you expand on what is needed to support Prometheus? Is it just a metrics aggregator?

jiangliu commented 4 years ago

> > Do we want to build a new logger system from ground or make use of some existing log framework, such as slog? I feel slog is flexible enough.
>
> I would prefer to use log and then allow the VMMs to select their own backend. Do you have something else in mind? I took a glance at slog and it doesn't look like it's built on top of log, hence we would need to

Yes, slog and log have different designs, though slog has adaptors to work with the log crate. The most valuable feature of slog is that we can attach key/value pairs to each log message, which helps log analyzers. And it's true that we should focus on the logger frontend for rust-vmm crates and let VMMs choose the backend.

> > For metrics, is it ok to support Prometheus?
>
> Can you expand on what is needed to support Prometheus? Is it just a metrics aggregator?

In addition to Prometheus servers, the project also provides several client-side libraries that applications can use to add metrics counters. I think we could use a Prometheus client library (such as https://crates.io/crates/prometheus) to implement metrics counters. It would also be preferable for each component to implement its own metrics counters instead of relying on a centralized metrics counter set.

andreeaflorescu commented 4 years ago

@jiangliu I've created a PR with the metric interface that I was talking about before: https://github.com/rust-vmm/vmm-sys-util/pull/94

I looked at Prometheus as well, and it is pretty cool! It does add a few dependencies though, so I would not include it in default builds of rust-vmm. I looked at the examples that they have on GitHub, and it looks like the Metric interface that I'm proposing in https://github.com/rust-vmm/vmm-sys-util/pull/94 can be implemented with Prometheus metrics as well.

Would you mind taking a look? The PR is still not ready to merge as I would need to add more documentation, but I would first like to know if you're on board with the general idea before investing more time into it.
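
To make the "one interface, several backends" idea concrete, here is a sketch under stated assumptions. The `Metric` trait below is a guess at the general shape of such an interface, not a copy of what the PR defines, and `LockedCounter` is a hypothetical stand-in for an external backend such as a Prometheus counter wrapper.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Assumed shape of a backend-agnostic counter interface.
pub trait Metric {
    fn add(&self, value: u64);
    fn inc(&self) {
        self.add(1);
    }
    fn count(&self) -> u64;
    fn reset(&self);
}

/// Dependency-free default backend: a plain atomic.
impl Metric for AtomicU64 {
    fn add(&self, value: u64) {
        self.fetch_add(value, Ordering::Relaxed);
    }
    fn count(&self) -> u64 {
        self.load(Ordering::Relaxed)
    }
    fn reset(&self) {
        self.store(0, Ordering::Relaxed);
    }
}

/// Hypothetical second backend, standing in for a wrapper around a
/// third-party metrics library.
pub struct LockedCounter(Mutex<u64>);

impl LockedCounter {
    pub fn new() -> Self {
        LockedCounter(Mutex::new(0))
    }
}

impl Metric for LockedCounter {
    fn add(&self, value: u64) {
        *self.0.lock().unwrap() += value;
    }
    fn count(&self) -> u64 {
        *self.0.lock().unwrap()
    }
    fn reset(&self) {
        *self.0.lock().unwrap() = 0;
    }
}

/// Component code depends only on the trait, never on a concrete backend.
pub fn record_vcpu_exits<M: Metric>(metric: &M, exits: u64) {
    for _ in 0..exits {
        metric.inc();
    }
}
```

Component crates written against the trait would not need to change if a VMM swaps the atomic backend for a heavier one, which keeps the extra dependencies out of default builds.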

jiangliu commented 3 years ago

@andreeaflorescu thanks for the great proposal. The overall design is ok, and should be easy to adopt.
