line / armeria

Your go-to microservice framework for any situation, from the creator of Netty et al. You can build any type of microservice leveraging your favorite technologies, including gRPC, Thrift, Kotlin, Retrofit, Reactive Streams, Spring Boot and Dropwizard.
https://armeria.dev
Apache License 2.0
4.73k stars 899 forks source link

Add timing information for acquiring an existing connection #4886

Open jrhee17 opened 1 year ago

jrhee17 commented 1 year ago

Currently, client-side provides some timing information when a new connection is created. https://armeria.dev/docs/advanced-structured-logging#client-connection-timing-properties

This information is exposed with the naming connectionAcquisitionDurationNanos. However, these metrics are recorded only for cases where a new connection is created.

It has been pointed out that the naming is confusing since this metric isn't recorded when an existing connection is used. We may do one of the following:

  1. Add a new timing to ClientConnectionTimings

  2. Reuse pendingAcquisitionDurationNanos

    • We don't have to worry about naming, but the logic to whether start/end the metric may be complex.
  3. Don't define a separate timing and just end the timings

I propose we go with option 1 might be simpest/least prone to error. I think we also want to expose this metric to RequestMetricSupport by default.

Note: As pointed out by @ikhoon, doing this would also fix a bug where dns timing information wasn't collected if there is a healthy connection already available.

ref: https://line-armeria.slack.com/archives/C1NGPBUH2/p1684469502484689

Lincong commented 1 year ago

Thanks @jrhee17 for creating this issue!

I agree that option 1 is the most reasonable option. Here are some of my suggestions on option (and please FFTI):

jrhee17 commented 1 year ago

I'd also suggest renaming connectionAcquisitionDurationNanos to newConnectionAcquisitionDurationNanos. This way, the name is the most clear.

Rather, I was thinking that existingConnectionAcquisitionDurationNanos would be included in connectionAcquisitionDurationNanos.

  1. If a readily existing connection exists, existingConnectionAcquisitionDurationNanos > 0
  2. If a connection is in the progress of creation, pendingAcquisitionDurationNanos > 0
  3. If a connection is created, socketConnectDurationNanos > 0

Essentially, connectionAcquisitionDurationNanos = existingConnectionAcquisitionDurationNanos + pendingAcquisitionDurationNanos + socketConnectDurationNanos so semantically I was thinking the naming is fine as-is.

jrhee17 commented 1 year ago

Note: @ikhoon also suggested that we change the naming of the metric collected in RequestMetricSupport: https://github.com/line/armeria/blob/f2116739018c15f230dd8f0e5a4dd5db1fefc888/core/src/main/java/com/linecorp/armeria/internal/common/metric/RequestMetricSupport.java#L289-L296

He suggested that we use tags to differentiate connection acquisition e.g. connection.acquisition.duration

To avoid breaking changes, we may just add another metric at the cost of extra metric exported. (and deprecate the old metric. Deprecation may be propagated via javadocs and documentation to be added)

We should also update documentation since the relationship between values in ConnectionTimings and the metric names may not be clear.

ikhoon commented 1 year ago

If the default metrics are more accurately expressed/exposed by RequestMetricsSupport, I think it will be possible to reduce the hassle of creating custom metrics by accessing ClientConnectionTimings.

The new metric will provide more accurate information about connection acquisition.