tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.45k stars 718 forks source link

Translate fields emitted with dot notation into nested JSON objects #663

Open NAlexPear opened 4 years ago

NAlexPear commented 4 years ago

Feature Request

tracing should be able to emit events as nested JSON when fields use dot notation, e.g:

this event:

trace!(foo.id = 123, foo.bar.baz = "luhrmann");

should translate to a fields JSON object of:

{
  "foo": {
    "id": 123,
    "bar": {
      "baz": "luhrmann"
    }
  }
}

Crates

tracing-subscriber (specifically, the Json marker for Format, probably)

Motivation

It's often important to include further structure to an event beyond a simple key-value mapping, particularly when working with JSON. Nested structure is also enforced by many third-party services, including Stackdriver (especially important for including information about an httpRequest in a LogEntry)

Proposal

I'd propose either providing this field-splitting behavior as part of the implementation of Json(probably enabled through a builder method a la flatten_event) or as a separate, specialized Visitor available through the tracing-subscriber crate.

Alternatives

  1. this is currently possible as long as users implement their own custom visitor implementation, but the use-case is common enough that it would be nice not to force users to abandon Json to implement for themselves
  2. even easier than dot-notation (from the user's perspective) would be implementing tracing::Value for serde_json::Map or any other type serializable to JSON through serde, but that requires unsealing tracing::Value. And even once tracing::Value is unsealed, dot-notated field expansion would still be nice to have for simple cases in addition to custom Value impls for more complex ones.
hawkw commented 4 years ago

This makes sense to me, and I'd love to see an implementation. I agree that unsealing tracing::Value is in many ways an ideal solution, but I think that will take more work than implementing this behavior. Therefore, we should probably go ahead and add this, since it will probably be done much sooner than unsealing Value.

Some notes:

I'd propose either providing this field-splitting behavior as part of the implementation of Json(probably enabled through a builder method a la flatten_event) or as a separate, specialized Visitor available through the tracing-subscriber crate.

I think that ideally, we would implement this in the tracing-serde crate, as an extension or wrapper for the SerdeMapVisitor type and/or the SerializeFieldMap type. Then, tracing-subscriber's Json could have an option in the builder to use that.

As for how this would work...I'm not totally sure yet. I think that, because serde's JSON serializer needs an &mut borrow, and Serializer::serialize_map moves the serializer by value, I'm not sure if we can traverse the value set once with multiple serializers. Instead, I think we may need to do it multiple times (perhaps by recursing every time we encounter a new dotted prefix?). There is some performance penalty for doing this, but I think it's unavoidable.

bcmyers commented 4 years ago

I was interested in structured json logs as well; so I coded up a Subscriber / Layer that takes this...

use serde::{Serialize, Deserialize};
use serde_json::json;

#[derive(Debug, Serialize, Deserialize)]
struct MyType {
    key: String,
}

let my_thing = MyType {
    key: "value".into(),
};

tracing::error!(
    foo = %serde_json::to_string(&my_thing).unwrap(),
    bar = %43.3,
    baz = %json!({"hello": "world"}),
    "Hello world"
);

and outputs this...

{
  "service": "my-service-name",
  "timestamp": "2020-04-18T16:45:03.976333+00:00",
  "level": "error",
  "module": "logger::tests",
  "file": "logger/src/lib.rs",
  "line": 53,
  "data": {
    "message": "Hello world",
    "foo": {
      "key": "value"
    },
    "bar": 43.3,
    "baz": {
      "hello": "world"
    }
  }
}

The key was this...

struct Visitor(HashMap<&'static str, serde_json::Value>);

impl Default for Visitor {
    fn default() -> Self {
        Visitor(HashMap::new())
    }
}

impl Visit for Visitor {
    fn record_bool(&mut self, field: &Field, value: bool) {
        self.0.insert(field.name(), value.into());
    }

    fn record_debug(&mut self, field: &Field, value: &dyn fmt::Debug) {
        let s = format!("{:?}", value);
        match serde_json::from_str::<serde_json::Value>(&s) {
            Ok(value) => {
                self.0.insert(field.name(), value.into());
            }
            Err(_) => {
                self.0.insert(field.name(), s.into());
            }
        }
    }

    fn record_error(&mut self, field: &Field, value: &(dyn std::error::Error + 'static)) {
        self.0.insert(field.name(), value.to_string().into());
    }

    fn record_i64(&mut self, field: &Field, value: i64) {
        self.0.insert(field.name(), value.into());
    }

    fn record_str(&mut self, field: &Field, value: &str) {
        match serde_json::from_str::<serde_json::Value>(value) {
            Ok(value) => {
                self.0.insert(field.name(), value.into());
            }
            Err(_) => {
                self.0.insert(field.name(), value.to_string().into());
            }
        }
    }

    fn record_u64(&mut self, field: &Field, value: u64) {
        self.0.insert(field.name(), value.into());
    }
}

It's not the most efficient thing in the world, but it's a workaround to your issue, at least. Any pointers on how to do this better would of course be much appreciated.

schell commented 4 years ago

Any movement on this? It would be great to be able to print better structured logs. This would help monitoring rust lambdas on cloudwatch, for example.

Just to brainstorm other solutions:

hawkw commented 4 years ago

@bcmyers the visitor you've written seems like the correct approach for now — we'd be happy to merge a PR adding something similar to the Json formatter. We can figure out how to make it more efficient later, especially if the performance characteristics are documented.

@schell re:

  • add fn record_value(&mut self, field: &Field, value: &Value) to Visitor

Expanding the Value system to support more advanced serialization (such as recording types implementing serde::Serialize, or serializing nested values or collections of values with tracing's Value trait, are definitely goals that we'd like to pursue in the future. The reason this isn't implemented yet is because it requires some design work that, so far, nobody has had the time to work on. However, I don't think the ultimate strategy for this will be special-casing serde_json Values — we'd like a solution that works for more serialization formats than just JSON.

davidbarsky commented 4 years ago

@schell No, not beyond what you saw here in this issue. We would absolutely love a PR to this repository to support this change, but if this is blocking you, that formatter config doesn't need to live in this repository. If you or @bcmyers decided to to implement the visitor in a third-party crate, I'd be happy to review it and provide feedback—that visitor only needs to take a dependency on tracing and tracing-subscriber.

This would help monitoring rust lambdas on cloudwatch, for example.

I agree :D! It'd be useful for anything that is hooked up to CloudWatch Logs/Insights.

add an "unescaped" sigil that tells the info,debug, etc macros not to quote a string, enabling callsites to log a json string.

The fmt::Display implementation for "abc" is the unquoted string abc, which you can get via the % shorthand or tracing::field::display.

schell commented 4 years ago

Good to know there is a solution here, and thanks for the quick discussion!

The fmt::Display implementation for "abc" is the unquoted string abc, which you can get via the % shorthand or tracing::field::display.

We still need to remove a level of quoting, as this call:

    let key = json!({"outer": {"inner":true}});
    debug!(msg = "run", %key);

will produce this output message when using a JSON subscriber:

{
    "timestamp": "May 01 18:42:34.645",
    "level": "DEBUG",
    "target": "my_lambda",
    "msg": "run",
    "key": "{\"outer\":{\"inner\":true}}"
}

which is treating key's value as a string and quoting it (I'm guessing via a format!("{}", key)) which is escaping any inner quotation marks. Maybe there's another easy way around this?

@davidbarsky thoughts?

hawkw commented 4 years ago

which is treating key's value as a string and quoting it (I'm guessing via a format!("{}", key)) which is escaping any inner quotation marks. Maybe there's another easy way around this?

The decision to treat the value as a string is internal to the Json formatter. We might want to change that behavior for handling Display, but my guess is that it was intended to prevent accidentally creating invalid JSON?

dcormier commented 3 years ago

A more complete example of how to use @bcmyers' impl Visit example with a Subscriber/Layer would be really helpful.

GeeWee commented 2 years ago

I've spent a little time debugging this, and I think it's currently not possible to do in any easy way.

However here's my initial code that sort-of makes it work. It's inspired by this blog post and @bcmyers example above. It's not a full-fledged working example, but it might give someone some pointers on how to move forward.

The issue with the double quotes using the default json formatter is (I think) because of the line here https://github.com/tokio-rs/tracing/blob/937c5d7cf08f5829651e3e9f99227c92e159fa82/tracing-subscriber/src/fmt/format/json.rs#L471-L474 Where the JSON ends up being serialized twice.

So either that line needs to change, or we need to provide a totatlly custom subscriber. Something like the below

use std::collections::HashMap;
use std::fmt;
use tracing::field::Field;

// This is a custom layer that records stuff as JSON
pub struct CustomJsonLayer;

impl<S> tracing_subscriber::Layer<S> for CustomJsonLayer
where
    S: tracing::Subscriber,
{
    fn on_event(
        &self,
        event: &tracing::Event<'_>,
        _ctx: tracing_subscriber::layer::Context<'_, S>,
    ) {
        // from https://burgers.io/custom-logging-in-rust-using-tracing
        // Covert the values into a JSON object
        let mut visitor = JsonVisitor::default();
        event.record(&mut visitor);

        // Output the event in JSON
        let output = serde_json::json!({
            "target": event.metadata().target(),
            "name": event.metadata().name(),
            "level": format!("{:?}", event.metadata().level()),
            "fields": visitor.0,
        });

        // TODO this isn't right. This should be written to the log output somehow. Perhaps in a separate subscriber
        println!("{}", serde_json::to_string_pretty(&output).unwrap());
    }
}

// The visitor implementation that tries to serialize json as a string
struct JsonVisitor(HashMap<&'static str, serde_json::Value>);

impl Default for JsonVisitor {
    fn default() -> Self {
        JsonVisitor(HashMap::new())
    }
}

impl tracing::field::Visit for JsonVisitor {
    fn record_i64(&mut self, field: &Field, value: i64) {
        self.0.insert(field.name(), value.into());
    }

    fn record_u64(&mut self, field: &Field, value: u64) {
        self.0.insert(field.name(), value.into());
    }

    fn record_bool(&mut self, field: &Field, value: bool) {
        self.0.insert(field.name(), value.into());
    }

    fn record_str(&mut self, field: &Field, value: &str) {
        match serde_json::from_str::<serde_json::Value>(value) {
            Ok(value) => {
                self.0.insert(field.name(), value.into());
            }
            Err(_) => {
                self.0.insert(field.name(), value.to_string().into());
            }
        }
    }

    fn record_error(&mut self, field: &Field, value: &(dyn std::error::Error + 'static)) {
        self.0.insert(field.name(), value.to_string().into());
    }

    fn record_debug(&mut self, field: &Field, value: &dyn fmt::Debug) {
        let s = format!("{:?}", value);
        match serde_json::from_str::<serde_json::Value>(&s) {
            Ok(value) => {
                self.0.insert(field.name(), value.into());
            }
            Err(_) => {
                self.0.insert(field.name(), s.into());
            }
        }
    }
}

And an example usage is then something like this:

let collector = tracing_subscriber::fmt()
            // filter spans/events with level TRACE or higher.
            .with_max_level(Level::TRACE)
            // .json() - NOTE NO JSON
            // build but do not install the subscriber.
            .finish();

        let collector = collector.with(CustomJsonLayer);
        let _guard = tracing::subscriber::set_default(collector);

        let data = serde_json::to_string(my_object).unwrap();
        tracing::info!(%data);

That prints out nested json fine, but I'm not sure how to integrate it into the bigger subscriber eco-system. I imagine tracing-serde would have some pointers, but I'm afraid I don't have time to research any further. I hope this was useful to someone.

jakajancar commented 2 years ago

For anyone else running into this thread: https://github.com/tokio-rs/tracing/discussions/1906