googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.77k stars 1.29k forks source link

spanner: panic in recordGFELatencyMetricsOT if EnableOpenTelemetryMetrics called after NewClient #9519

Open henripqt opened 8 months ago

henripqt commented 8 months ago

Client

Spanner

Environment

e.g. Alpine Docker on GKE

Go Environment

go1.22.0

Code

When enabling open telemetry after creating the client the configuration set on the client is invalid resulting in potential panic.

e.g.

package main

func main() {
   // ... 
    client, err := gcspanner.NewClient(
        ctx,
        db,
        opts...,
    )

    if err != nil {
        panic(err)
    }

    initOnce.Do(func() {
        if os.Getenv("SPANNER_EMULATOR_HOST") == "" {
            spanner.EnableOpenTelemetryMetrics()
        }
    })

    // ...
}

In the above example the new client created would have a default openTelemetryConfig with all instruments being nil possibly leading to a panic during a call to recordGFELatencyMetricsOT (see bellow with added comments)


func recordGFELatencyMetricsOT(ctx context.Context, md metadata.MD, keyMethod string, otConfig *openTelemetryConfig) error {
    if !IsOpenTelemetryMetricsEnabled() || md == nil && otConfig == nil { // the global flag is enable thus we don't return nil
        return nil
    }
    attr := otConfig.attributeMap
    if len(md.Get("server-timing")) == 0 && otConfig.gfeHeaderMissingCount != nil { // at this point otConfig.gfeHeaderMissingCount is nil
        otConfig.gfeHeaderMissingCount.Add(ctx, 1, metric.WithAttributes(attr...))
        return nil
    }
    serverTiming := md.Get("server-timing")[0] // This will panic !
    gfeLatency, err := strconv.Atoi(strings.TrimPrefix(serverTiming, "gfet4t7; dur="))
    if !strings.HasPrefix(serverTiming, "gfet4t7; dur=") || err != nil {
        return err
    }
    attr = append(attr, attributeKeyMethod.String(keyMethod))
    if otConfig.gfeLatency != nil {
        otConfig.gfeLatency.Record(ctx, int64(gfeLatency), metric.WithAttributes(attr...))
    }
    return nil
}

Expected behavior

Actual behavior

Additional context

e.g. Started after upgrading to v1.57.0 and using the EnableOpenTelemetryMetrics function as shown above

rahul2393 commented 7 months ago

Thanks @henripqt Its known and expected that sometimes response metadata header is missing, and below lines will fail if that's the case

if len(md.Get("server-timing")) == 0 && otConfig.gfeHeaderMissingCount != nil { // at this point otConfig.gfeHeaderMissingCount is nil
        otConfig.gfeHeaderMissingCount.Add(ctx, 1, metric.WithAttributes(attr...))
        return nil
    }

I have raised a PR which will initialize all the configs related to open telemetry by default irrespective of whether the Opentelemetry is enabled or not that will take care of this scenario

Thanks