open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/docs/languages/go
Apache License 2.0

Verify compliant metric SDK specification implementation: MeterProvider/Observations inside asynchronous callbacks #3652

Closed MrAlias closed 1 year ago

MrAlias commented 1 year ago

Callback functions MUST be invoked for the specific MetricReader performing collection, such that observations made or produced by executing callbacks only apply to the intended MetricReader during collection.

The SDK does not look compliant with this:

func TestMeterProviderMixingOnRegisterErrors(t *testing.T) {
    otel.SetLogger(testr.New(t))

    rdr0 := NewManualReader()
    mp0 := NewMeterProvider(WithReader(rdr0))

    rdr1 := NewManualReader()
    mp1 := NewMeterProvider(WithReader(rdr1))

    // Meters with the same scope but different MeterProviders.
    m0 := mp0.Meter("TestMeterProviderMixingOnRegisterErrors")
    m0Ctr, err := m0.Float64ObservableCounter("float64 ctr")
    require.NoError(t, err)

    m1 := mp1.Meter("TestMeterProviderMixingOnRegisterErrors")
    m1Ctr, err := m1.Int64ObservableCounter("int64 ctr")
    require.NoError(t, err)

    _, err = m0.RegisterCallback(
        func(_ context.Context, o metric.Observer) error {
            o.ObserveFloat64(m0Ctr, 2)
            // Observe an instrument from a different MeterProvider.
            o.ObserveInt64(m1Ctr, 1)

            return nil
        },
        m0Ctr, m1Ctr,
    )
    assert.Error(
        t,
        err,
        "Instrument registered with Meter from different MeterProvider",
    )

    var data metricdata.ResourceMetrics
    _ = rdr0.Collect(context.Background(), &data)
    // Only the metrics from mp0 should be produced.
    assert.Len(t, data.ScopeMetrics, 1)

    err = rdr0.Collect(context.Background(), &data)
    assert.NoError(t, err, "Errored when collect should be a noop")
    assert.Len(
        t, data.ScopeMetrics, 0,
        "Metrics produced for instrument collected by different MeterProvider",
    )
}
go test ./...
?       go.opentelemetry.io/otel/sdk/metric/metricdata  [no test files]
--- FAIL: TestMeterProviderMixingOnRegisterErrors (0.00s)
    provider_test.go:111:
            Error Trace:    /home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/provider_test.go:111
            Error:          An error is expected but got nil.
            Test:           TestMeterProviderMixingOnRegisterErrors
            Messages:       Instrument registered with Meter from different MeterProvider
    provider_test.go:124:
            Error Trace:    /home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/provider_test.go:124
            Error:          "[{{TestMeterProviderMixingOnRegisterErrors  } [{float64 ctr   {[{{{[]}} 2023-06-01 13:29:57.873249248 -0700 PDT m=+0.021975681 2023-06-01 13:29:57.873306387 -0700 PDT m=+0.022032821 %!s(float64=2) []}] CumulativeTemporality %!s(bool=true)}}]}]" should have 0 item(s), but has 1
            Test:           TestMeterProviderMixingOnRegisterErrors
            Messages:       Metrics produced for instrument collected by different MeterProvider
FAIL
FAIL    go.opentelemetry.io/otel/sdk/metric 0.026s
ok      go.opentelemetry.io/otel/sdk/metric/aggregation (cached)
ok      go.opentelemetry.io/otel/sdk/metric/internal    (cached)
ok      go.opentelemetry.io/otel/sdk/metric/metricdata/metricdatatest   (cached)
FAIL
MrAlias commented 1 year ago

The SDK does not look compliant with this

Tracking with https://github.com/open-telemetry/opentelemetry-go/issues/4164
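
For illustration, the registration check the test above expects could be modeled roughly as follows. This is a stdlib-only sketch, not the SDK's code: `provider`, `meter`, `instrument`, `newMeter`, `observable`, `registerCallback`, and `errForeignInstrument` are all hypothetical names, and the model assumes each instrument simply remembers which provider created it.

```go
package main

import (
	"errors"
	"fmt"
)

// provider stands in for a MeterProvider (hypothetical type).
// Only its pointer identity matters in this sketch.
type provider struct{ name string }

// meter remembers the provider that created it.
type meter struct{ owner *provider }

// instrument remembers the provider whose meter created it.
type instrument struct {
	name  string
	owner *provider
}

// errForeignInstrument is a hypothetical sentinel error for this sketch.
var errForeignInstrument = errors.New("instrument registered with Meter from different MeterProvider")

func (p *provider) newMeter() *meter { return &meter{owner: p} }

func (m *meter) observable(name string) *instrument {
	return &instrument{name: name, owner: m.owner}
}

// registerCallback rejects the registration if any instrument was created
// by a meter that belongs to a different provider.
func (m *meter) registerCallback(cb func() error, insts ...*instrument) error {
	for _, inst := range insts {
		if inst.owner != m.owner {
			return fmt.Errorf("%w: %q", errForeignInstrument, inst.name)
		}
	}
	_ = cb // a fuller model would store cb and run it during collection
	return nil
}

func main() {
	mp0, mp1 := &provider{name: "mp0"}, &provider{name: "mp1"}
	m0, m1 := mp0.newMeter(), mp1.newMeter()
	m0Ctr := m0.observable("float64 ctr")
	m1Ctr := m1.observable("int64 ctr")

	noop := func() error { return nil }
	fmt.Println("same provider:", m0.registerCallback(noop, m0Ctr))
	fmt.Println("mixed providers:", m0.registerCallback(noop, m0Ctr, m1Ctr))
}
```

Rejecting at registration time keeps the failure close to the mistake. Whether the SDK should return an error or only drop the foreign observation is a design question this sketch does not settle.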

MrAlias commented 1 year ago

The implementation SHOULD disregard the accidental use of APIs appurtenant to asynchronous instruments outside of registered callbacks in the context of a single MetricReader collection.

This sounds like a complex way of saying that calls to observable instruments outside of callbacks need to be ignored. Given that the observable instruments here do not have any methods, there is nothing to call outside of a callback, so we comply with this implicitly.

MrAlias commented 1 year ago

The implementation SHOULD use a timeout to prevent indefinite callback execution.

The implementation does not explicitly use a timeout for callback execution. However, it forwards the context given to any Collect call to the callbacks, and that context may carry a timeout.

I do not think the appropriate, or idiomatic, behavior here is to run callbacks in a goroutine and abandon them if the timeout expires. Instead, the readers should document that Collect will honor any timeout in the context passed to it, and the callbacks need to be documented as having to honor timeouts in the passed context.

For the periodic reader, there is a timeout used for an export:

https://github.com/open-telemetry/opentelemetry-go/blob/b9079960aed5cf98a8c80e832cfa9e065bcd3fd4/sdk/metric/periodic_reader.go#L279

It probably makes sense to include this timeout in the collection process as well.

https://github.com/open-telemetry/opentelemetry-go/issues/4166
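
To make the "callbacks honor the passed context" approach concrete, here is a minimal stdlib-only sketch. It is a model of the idea, not the SDK's implementation: `callback`, `collect`, `slowObservation`, and `runDemo` are hypothetical names, and the 100ms timeout is an arbitrary value chosen for the demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callback models an asynchronous-instrument callback: it receives the
// collection context and is expected to return once that context is done.
type callback func(ctx context.Context) error

// collect runs each callback in the caller's goroutine with the caller's
// context. It spawns no goroutines and abandons no work; it relies on the
// callbacks to respect ctx. It stops at the first error for simplicity.
func collect(ctx context.Context, cbs ...callback) error {
	for _, cb := range cbs {
		if err := cb(ctx); err != nil {
			return err
		}
	}
	return nil
}

// slowObservation simulates a callback whose measurement takes d to produce
// and that gives up as soon as the context is cancelled or times out.
func slowObservation(d time.Duration) callback {
	return func(ctx context.Context) error {
		timer := time.NewTimer(d)
		defer timer.Stop()
		select {
		case <-timer.C:
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// runDemo collects once with a quick callback and once with a callback that
// would run for an hour, both under the same 100ms deadline.
func runDemo() (fastErr, slowErr error) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	fastErr = collect(ctx, slowObservation(time.Millisecond))
	slowErr = collect(ctx, slowObservation(time.Hour))
	return fastErr, slowErr
}

func main() {
	fastErr, slowErr := runDemo()
	fmt.Println("fast callback error:", fastErr)
	fmt.Println("slow callback hit deadline:", errors.Is(slowErr, context.DeadlineExceeded))
}
```

The trade-off is that a callback which ignores its context can still block collection indefinitely, which is why the documentation for both readers and callbacks matters in this design.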

MrAlias commented 1 year ago

The implementation MUST complete the execution of all callbacks for a given instrument before starting a subsequent round of collection.

The collection process is guarded by a lock that is unique to a pipeline (reader/views/exporter):

https://github.com/open-telemetry/opentelemetry-go/blob/b9079960aed5cf98a8c80e832cfa9e065bcd3fd4/sdk/metric/pipeline.go#L126-L127

That ensures that all the callbacks will be completed before a "subsequent round of collection" for the pipeline is started.
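
As a rough illustration of that guarantee, the stdlib-only sketch below models a pipeline whose collection holds a mutex while callbacks run. The names `pipeline`, `collect`, and `maxOverlap` are hypothetical and the structure is simplified, so treat it as a model of the locking idea rather than the code linked above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// pipeline is a simplified stand-in for a reader's pipeline: one lock
// guards the whole collection, including callback execution.
type pipeline struct {
	mu        sync.Mutex
	callbacks []func()
}

// collect holds the lock while every callback runs, so a second collection
// on the same pipeline cannot start until the first has finished.
func (p *pipeline) collect() {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, cb := range p.callbacks {
		cb()
	}
}

// maxOverlap starts n concurrent collections and reports the largest number
// of callback invocations observed running at the same time.
func maxOverlap(n int) int32 {
	var running, peak int32
	p := &pipeline{}
	p.callbacks = append(p.callbacks, func() {
		cur := atomic.AddInt32(&running, 1)
		// Raise peak to cur if cur is a new maximum.
		for {
			old := atomic.LoadInt32(&peak)
			if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
				break
			}
		}
		runtime.Gosched() // invite other goroutines to interleave
		atomic.AddInt32(&running, -1)
	})

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.collect()
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	// With the lock in place, callbacks from different rounds never overlap.
	fmt.Println("max concurrent callbacks:", maxOverlap(8))
}
```

Because the lock belongs to the pipeline, this only serializes rounds for a single reader; collections on different pipelines can still proceed in parallel.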

MrAlias commented 1 year ago

Outstanding:

MrAlias commented 1 year ago

Done.