uber-go / zap

Blazing fast, structured, leveled logging in Go.
https://pkg.go.dev/go.uber.org/zap
MIT License
22.12k stars 1.44k forks

.With() function repeating keys/values #622

Open henriquechehad opened 6 years ago

henriquechehad commented 6 years ago

The .With() function repeats the same key/value for every With call. Should it overwrite the previous values, or should there be a function to remove specific keys/values that were previously set?


package main

import "go.uber.org/zap"

func main() {
    logger, _ := zap.NewProduction()
    defer logger.Sync()
    sugar := logger.Sugar()

    sugar = sugar.With("test", "value")
    sugar.Info("message 01")

    // prints:
    // {"level":"info","ts":1534362222.643982,"caller":"tmp/main.go:11","msg":"message 01","test":"value"}

    sugar = sugar.With("test", "value")
    sugar.Info("message 02")

    // prints duplicated "test" "value"
    // {"level":"info","ts":1534362222.644039,"caller":"tmp/main.go:14","msg":"message 02","test":"value","test":"value"}
}
akshayjshah commented 6 years ago

This is as designed, and is documented in NewJSONEncoder. It's technically allowable by the JSON specification, and all deserialization code that I'm aware of (including Go's standard library) preserves only the last value.

Supporting a last-writer-wins policy as you suggest is remarkably difficult in zap (and similar projects): zap is fast because it encodes fields as they're added, without maintaining an intermediate representation (commonly map[string]interface{}). It's technically possible, and could even be made reasonably fast in the case where there are no duplicates, but we haven't run into many situations where it's valuable. Usually we hit this when two developers are mistakenly stomping on each other's log data, so keeping both values in the output makes debugging easier.

In short, this is functioning as designed. If you're interested in a fairly complex PR, I can guide you through how we might implement this feature and benchmark the performance impact.
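To make the trade-off concrete, here is a stdlib-only toy (not zap's actual encoder; `streamEncode` and `mapEncode` are hypothetical names) contrasting the two strategies: appending each field to the buffer as it arrives, versus building an intermediate map that gives last-writer-wins at the cost of allocations:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// streamEncode mimics the streaming approach: each field is written to the
// output buffer as soon as it is added, so duplicate keys survive.
func streamEncode(pairs [][2]string) string {
	var b strings.Builder
	b.WriteByte('{')
	for i, p := range pairs {
		if i > 0 {
			b.WriteByte(',')
		}
		fmt.Fprintf(&b, "%q:%q", p[0], p[1])
	}
	b.WriteByte('}')
	return b.String()
}

// mapEncode builds a map first, which gives last-writer-wins, but pays for
// an allocation-heavy intermediate representation on every log line.
func mapEncode(pairs [][2]string) string {
	m := make(map[string]any, len(pairs))
	for _, p := range pairs {
		m[p[0]] = p[1]
	}
	out, _ := json.Marshal(m)
	return string(out)
}

func main() {
	pairs := [][2]string{{"test", "value"}, {"test", "value"}}
	fmt.Println(streamEncode(pairs)) // {"test":"value","test":"value"}
	fmt.Println(mapEncode(pairs))    // {"test":"value"}
}
```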

geoah commented 3 years ago

Just a note:

This seems to be an issue when sending such logs to GCP's Stackdriver, which takes the duplicated fields and concatenates them.

{
    "trace": "foo",
    "trace": "foo"
}

will be presented in Stackdriver as

{
    "trace": "foofoo"
}
akshayjshah commented 3 years ago

Huh! I'm a little surprised by this, but perhaps Google's also trying to avoid dropping the duplicate data. I'm no longer at Uber, so the current maintainers have the final say on whether to make any changes to the current duplicate-handling code.

If this is particularly inconvenient for you, you can wrap zap's JSON encoder and fix it yourself. To keep performance reasonably good, you could use a Bloom filter:

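As an illustration only, here is a stdlib-only sketch of that idea; the `bloom` type, `encodeDedup`, and the pair-slice input are hypothetical stand-ins, not zap's encoder API:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// bloom is a tiny Bloom filter over a single 64-bit word with two hash
// probes — enough to sketch the idea; a real filter would be wider.
type bloom uint64

func probes(key string) (uint, uint) {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	return uint(sum % 64), uint((sum >> 32) % 64)
}

func (b bloom) maybeContains(key string) bool {
	p1, p2 := probes(key)
	return b&(1<<p1) != 0 && b&(1<<p2) != 0
}

func (b *bloom) add(key string) {
	p1, p2 := probes(key)
	*b |= 1<<p1 | 1<<p2
}

// encodeDedup writes fields in order, keeping the first occurrence of each
// key. The filter gives a fast negative check; the map is the exact answer
// on a filter hit. (This sketch always populates the map; a zero-allocation
// version would defer it until the filter first reports a possible hit.)
func encodeDedup(pairs [][2]string) string {
	var filter bloom
	seen := map[string]bool{}
	var out strings.Builder
	out.WriteByte('{')
	first := true
	for _, p := range pairs {
		if filter.maybeContains(p[0]) && seen[p[0]] {
			continue // true duplicate: drop it
		}
		filter.add(p[0])
		seen[p[0]] = true
		if !first {
			out.WriteByte(',')
		}
		first = false
		fmt.Fprintf(&out, "%q:%q", p[0], p[1])
	}
	out.WriteByte('}')
	return out.String()
}

func main() {
	fmt.Println(encodeDedup([][2]string{{"test", "1"}, {"test", "2"}}))
	fmt.Println(encodeDedup([][2]string{{"a", "1"}, {"b", "2"}}))
}
```

Note this gives first-writer-wins, the natural policy for a streaming encoder; last-writer-wins would require buffering fields before writing.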
That would keep serialization reasonably fast and zero-allocation when there are no duplicates. I haven't thought through how you'd handle duplicates in nested objects (created by zap.Namespace).

pohly commented 1 week ago

This is as designed, and is documented in NewJSONEncoder. It's technically allowable by the JSON specification,

According to the RFC:

The names within an object SHOULD be unique. ... When the names within an object are not unique, the behavior of software that receives such an object is unpredictable.

zap is relying on readers implementing a "last one wins" approach, but that is not required by the spec.

You are right that SHOULD is not MUST, so it's not wrong to write JSON with duplicates - it's just not guaranteed to be interoperable.
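For what it's worth, Go's standard decoder does behave as zap assumes; `decodeField` below is just an illustrative helper, not part of any library:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeField unmarshals data into a map and returns the value decoded for
// key. Go's encoding/json overwrites on each repeated name, so the last
// value in the input wins.
func decodeField(data []byte, key string) string {
	var m map[string]string
	if err := json.Unmarshal(data, &m); err != nil {
		panic(err)
	}
	return m[key]
}

func main() {
	fmt.Println(decodeField([]byte(`{"trace":"foo","trace":"bar"}`), "trace")) // prints "bar"
}
```

Other readers are free to behave differently — as the Stackdriver example above shows.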