open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/
Apache License 2.0
4.99k stars 1.01k forks source link

Move time.Now call into exemplar reservoir to improve performance #5545

Closed dashpole closed 1 day ago

dashpole commented 1 week ago

Part of addressing https://github.com/open-telemetry/opentelemetry-go/issues/5542.

Motivation

This removes the time.Now() call from filtered-out Exemplars by only invoking time.Now() after the filtering decision is made. This improvement is especially noticeable for measurements without any attributes.

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric
cpu: AMD EPYC 7B12
                                                                 │   old.txt    │                new.txt                │
                                                                 │    sec/op    │   sec/op     vs base                  │
SyncMeasure/NoView/Int64Counter/Attributes/0-24                    158.20n ± 4%   99.83n ± 1%  -36.90% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/1-24                     333.3n ± 4%   274.8n ± 1%  -17.55% (p=0.000 n=10)
SyncMeasure/NoView/Int64Counter/Attributes/10-24                    1.640µ ± 1%   1.600µ ± 1%   -2.41% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/0-24                   159.0n ± 3%   101.3n ± 0%  -36.27% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/1-24                   340.0n ± 2%   272.0n ± 1%  -20.00% (p=0.000 n=10)
SyncMeasure/NoView/Float64Counter/Attributes/10-24                  1.661µ ± 1%   1.597µ ± 0%   -3.85% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/0-24               159.8n ± 1%   103.1n ± 0%  -35.50% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/1-24               339.5n ± 1%   273.1n ± 0%  -19.57% (p=0.000 n=10)
SyncMeasure/NoView/Int64UpDownCounter/Attributes/10-24              1.656µ ± 0%   1.589µ ± 0%   -4.05% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/0-24             159.3n ± 2%   100.8n ± 0%  -36.74% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/1-24             337.9n ± 2%   271.8n ± 1%  -19.55% (p=0.000 n=10)
SyncMeasure/NoView/Float64UpDownCounter/Attributes/10-24            1.657µ ± 0%   1.593µ ± 1%   -3.83% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/0-24                  144.65n ± 4%   89.38n ± 0%  -38.21% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/1-24                   235.7n ± 2%   183.5n ± 0%  -22.15% (p=0.000 n=10)
SyncMeasure/NoView/Int64Histogram/Attributes/10-24                  900.8n ± 1%   836.8n ± 0%   -7.10% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/0-24                145.60n ± 5%   93.48n ± 1%  -35.80% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/1-24                 240.9n ± 1%   183.0n ± 0%  -24.06% (p=0.000 n=10)
SyncMeasure/NoView/Float64Histogram/Attributes/10-24                905.6n ± 1%   826.3n ± 0%   -8.76% (p=0.000 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/0-24                   20.33n ± 0%   20.35n ± 0%        ~ (p=0.302 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/1-24                   26.46n ± 0%   26.45n ± 1%        ~ (p=0.868 n=10)
SyncMeasure/DropView/Int64Counter/Attributes/10-24                  26.50n ± 0%   26.47n ± 0%        ~ (p=0.208 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/0-24                 20.34n ± 1%   20.27n ± 0%   -0.34% (p=0.009 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/1-24                 26.55n ± 0%   26.60n ± 1%        ~ (p=0.109 n=10)
SyncMeasure/DropView/Float64Counter/Attributes/10-24                26.59n ± 1%   26.57n ± 1%        ~ (p=0.926 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/0-24             20.38n ± 1%   20.38n ± 0%        ~ (p=0.725 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/1-24             26.39n ± 0%   26.44n ± 0%        ~ (p=0.238 n=10)
SyncMeasure/DropView/Int64UpDownCounter/Attributes/10-24            26.52n ± 0%   26.42n ± 0%   -0.36% (p=0.049 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/0-24           20.30n ± 0%   20.25n ± 0%        ~ (p=0.196 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/1-24           26.57n ± 0%   26.54n ± 1%        ~ (p=0.540 n=10)
SyncMeasure/DropView/Float64UpDownCounter/Attributes/10-24          26.57n ± 0%   26.51n ± 1%        ~ (p=0.643 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/0-24                 20.37n ± 0%   20.36n ± 1%        ~ (p=1.000 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/1-24                 26.41n ± 0%   26.50n ± 0%   +0.32% (p=0.007 n=10)
SyncMeasure/DropView/Int64Histogram/Attributes/10-24                26.44n ± 0%   26.55n ± 1%   +0.42% (p=0.012 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/0-24               20.30n ± 0%   20.45n ± 0%   +0.74% (p=0.000 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/1-24               26.52n ± 0%   26.48n ± 0%        ~ (p=0.127 n=10)
SyncMeasure/DropView/Float64Histogram/Attributes/10-24              26.55n ± 0%   26.48n ± 0%   -0.26% (p=0.002 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/0-24             170.5n ± 2%   110.8n ± 0%  -35.03% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/1-24             402.5n ± 1%   331.5n ± 1%  -17.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Counter/Attributes/10-24            1.363µ ± 1%   1.281µ ± 1%   -6.02% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/0-24           170.6n ± 1%   111.5n ± 1%  -34.64% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/1-24           397.1n ± 1%   335.9n ± 0%  -15.41% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64Counter/Attributes/10-24          1.371µ ± 1%   1.279µ ± 1%   -6.71% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/0-24       170.1n ± 1%   112.2n ± 0%  -34.09% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/1-24       397.5n ± 1%   330.2n ± 0%  -16.93% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64UpDownCounter/Attributes/10-24      1.371µ ± 1%   1.289µ ± 1%   -5.95% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/0-24     171.4n ± 2%   112.9n ± 0%  -34.13% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/1-24     397.0n ± 3%   336.4n ± 0%  -15.24% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Float64UpDownCounter/Attributes/10-24    1.383µ ± 1%   1.305µ ± 1%   -5.61% (p=0.000 n=10)
SyncMeasure/AttrFilterView/Int64Histogram/Attributes/0-24          157.30n ± 2%   98.58n ± 1%  -37.33% (p=0.000 n=6+10)

Changes

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.4%. Comparing base (1cefb17) to head (93ec4fa).

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/open-telemetry/opentelemetry-go/pull/5545/graphs/tree.svg?width=650&height=150&src=pr&token=8efTmh4kvf&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)](https://app.codecov.io/gh/open-telemetry/opentelemetry-go/pull/5545?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) ```diff @@ Coverage Diff @@ ## main #5545 +/- ## ======================================= - Coverage 84.4% 84.4% -0.1% ======================================= Files 270 271 +1 Lines 22336 22333 -3 ======================================= - Hits 18871 18868 -3 Misses 3134 3134 Partials 331 331 ``` [see 8 files with indirect coverage changes](https://app.codecov.io/gh/open-telemetry/opentelemetry-go/pull/5545/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)
MrAlias commented 1 week ago

The timestamp arg is something recommended by the specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplarreservoir

This may be necessary now, but when we accept a user defined reservoir this seems like an important parameter.

Would passing a zero value time from the SDK and check for this zero value in the reservoir to then call now be a better approach?

dashpole commented 1 week ago

I changed it to check for time.IsZero()

dashpole commented 1 week ago

Java doesn't appear to accept a timestamp (although it looks like it is internal): https://github.com/open-telemetry/opentelemetry-java/blob/0aacc55d1e3f5cc6dbb4f8fa26bcb657b01a7bc9/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ExemplarReservoir.java#L95, but JS does: https://github.com/open-telemetry/opentelemetry-js/blob/f99e7d9c2c37376e752f746308affa74fc7c30d2/packages/sdk-metrics/src/exemplar/ExemplarReservoir.ts#L33.

dashpole commented 1 week ago

@MrAlias I was re-reading the specification, and I think the intent is that Offer is only called when the exemplar is definitely going to be recorded (i.e. after the filter). We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.

MrAlias commented 1 week ago

We should probably have one Offer to the (non-public) FilteredReservoir which does not have a timestamp argument, and one Offer to the (to-be-public) Reservoir which does have the timestamp argument.

I'm not sure I fully understand this implementation, but in general this sounds correct to me.

Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?

dashpole commented 1 week ago

Does that mean we would keep the existing Offer method and just change the FilteredReservoir in this PR?

Yes.

dashpole commented 5 days ago

Please let me know if this is too large to review or too large for a last-minute change before a release. I can try to make a version which just fixes the issue with as small of a diff as possible.