grpc-ecosystem / go-grpc-prometheus

Prometheus monitoring for your gRPC Go servers.
Apache License 2.0
1.34k stars 164 forks source link

Add Summary metrics #57

Open markeijsermans opened 6 years ago

markeijsermans commented 6 years ago

I've POC'd adding summary metrics in a similar way as histograms. It's enabled via a similar method EnableHandlingTimeSummary(). Is there interest in me submitting a PR?

Motivation: Our monitoring situation is in transition as we move to prometheus/grafana. We currently export metrics to a hosted solution via a prometheus remote adapter. In that hosted solution calculating a p99 from histogram buckets is not possible (normally you would use histogram_quantile())

brancz commented 6 years ago

Thanks for opening this issue to start the dicussion! :slightly_smiling_face:

My feeling is that we would probably do more harm if we add it, as most people don't understand that summaries are meant for cases where you can't control latencies at all, and would just enable it without understanding what they're doing.

I understand your case and I think what you're doing makes sense in your situation, and you understand that the histogram is actually the appropriate use, but I think I'd prefer people do it as explicitly as you and really understand the implications that has.

I'd still like to hear @Bplotka's thoughts as well.

bwplotka commented 6 years ago

I have mixed feelings, I agree with both @brancz arguments and @markeijsermans motivation. What about doing.. both?

Instead, of adding explicit Method for EnableHandlingTimeSummary because users may shoot themselves in the foot if they are not careful, maybe we should just accept custom serverReporters and clientReporters so advanced users like @markeijsermans can write down summary metric if needed? That would allow for more flexibility maybe.

On the other hand, Prometheus has summaries for some reason, so maybe we should actually trust people to understand the differences between this and histograms? Or do you @brancz think that users will use it extensively and blame Prometheus for not being efficient?

What do you think about adding just extension like custom reporters? That would unblock you @markeijsermans?

brancz commented 6 years ago

I think extensibility would be a good compromise.

markeijsermans commented 6 years ago

My understanding of the hesitance to directly support summary metrics is that it is less flexible and performant. The quantiles are fixed and need to be computed for each observation. I agree with your reasoning. An extension point works for us.

@Bplotka if I understand you correctly, it would look something like this

bwplotka commented 6 years ago

Yup, something like this.

pt., 27 lip 2018, 20:24 użytkownik Mark Eijsermans notifications@github.com napisał:

My understanding of the hesitance to directly support summary metrics is that it is less flexible and performant. The quantiles are fixed and need to be computed for each observation. I understand and agree with your reasoning. An extension point works for us.

@Bplotka https://github.com/Bplotka if I understand you correctly, it would look something like this

  • create ServerReporter interface

type ServerReporter interface { ReceivedMessage() SentMessage() Handled(code codes.Code) }

  • add an extension function WithCustomServerReporter(s ServerReporter)
  • update interceptors to handle the customServerReporter in the same fashion as the default
  • same for the client side

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/grpc-ecosystem/go-grpc-prometheus/issues/57#issuecomment-408501004, or mute the thread https://github.com/notifications/unsubscribe-auth/AGoNu5WuYOFDzvxPO9TEdY0GhnlgGA7Dks5uK1rrgaJpZM4VibNN .