sampling: Support message-level sampling

abhinav commented 10 months ago

zerolog's Sampler interface is currently limited to sampling on the log level. This makes it difficult to build a smarter sampler that suppresses repetitive log messages while still surfacing unique messages. For comparison, Zap uses (level, msg) as the sampling key^1, which allows it to do the above.

This change proposes a new interface, MessageSampler:

type MessageSampler {
    SampleMessage(Level, string) bool
}

There are two ways for us to incorporate this interface: optional upgrade or independent concept.

For convenience, I've included both versions as separate commits.

Optional upgrade

If the sampler passed to Logger.Sample implements this interface, the SampleMessage method will be used by *Event when it finally has access to the message content.

This turns the sampler interface into a two-stage thing. We log only if Sample(level) && SampleMessage(level, msg) == true. There's risk of confusion where a SampleMessage implementation calls Sample again--double counting that message.

Independent concept

We treat MessageSampler as an independent concept. You can register a Sampler, a MessageSampler, neither, or both.

We still log only if Sample(level) && SampleMessage(level, msg) == true but this time there's lower risk that a SampleMessage implementation will call Sample because there's no expectation to satisfy both interfaces.

In exchange, this needs a new API:

func (Logger) SampleMessages(MessageSamler) Logger

As I said, this PR contains both versions as separate commits. I think Independent concept might make more sense, but I'm interested in hearing which approach the maintainers prefer.

Resolves #543

abhinav commented 9 months ago

Gentle bump, @rs. Are y'all willing to accept something in this direction?

rs commented 9 months ago

Suppressing duplicates messages is not the same as sampling, it is a different concept. I wonder if we could add a a method to tell an event not to be sent from a Hook in order to handle such case instead.

Something like:

type RepeatedSuppressorHook struct{
      lastMsg string
}

func (h *RepeatedSuppressorHook) Run(e *zerolog.Event, level zerolog.Level, msg string) {
    if msg == h.lastMsg {
        e.Drop()
    }
    h.lastMsg = msg
}

abhinav commented 9 months ago

Suppressing duplicates messages is not the same as sampling, it is a different concept.

The idea was not to just suppress all duplicate messages, but to still let-through a subset of the duplicates—sampled using the (level, message) tuple to create buckets. Although, I agree, practically, this would suppress a number of duplicate messages.

I wonder if we could add a a method to tell an event not to be sent from a Hook in order to handle such case instead.

That's a valid approach too! Would the existing (*Event).Discard method work here?

rs commented 9 months ago

Oh I forgot about the Discard method, so yes it is already possible with no change.

abhinav commented 9 months ago

Ack. We can build some sampling logic around that. Closing the PR—unless you're interested in message-granularity sampling. Thanks for your response!

rs / zerolog

sampling: Support message-level sampling #618

Optional upgrade

Independent concept