open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0
518 stars 79 forks source link

Reading the HTTP status code in server requests does not work with custom http.ResponseWriter's #472

Closed grcevski closed 10 months ago

grcevski commented 11 months ago

Describe the bug

A Go developer can write a custom http.ResponseWriter which does not have to have a status field and especially may also have a status field at a different offset.

Reading the non-existent status field will cause the HTTP status code to not be read or be read as random bytes.

Environment

To Reproduce

Here's a simplified example that shows a custom ResponseWriter:

type responseWriterWrapper struct {
    rw         http.ResponseWriter
    statusCode int
    ...
}

func newResponseWriterWrapper(rw http.ResponseWriter) *responseWriterWrapper {
    return &responseWriterWrapper{
        rw:         rw,
        statusCode: http.StatusOK,
    }
}

func (b *responseWriterWrapper) Header() http.Header {
    return b.rw.Header()
}

func (b *responseWriterWrapper) Write(data []byte) (int, error) {
    if b.statusCode == 0 {
        b.WriteHeader(http.StatusOK)
    }
    n, err := b.rw.Write(data)
        ...
    return n, err
}

func (b *responseWriterWrapper) WriteHeader(statusCode int) {
    b.statusCode = statusCode
    b.rw.WriteHeader(statusCode)
}

Expected behavior

Status code should be correctly read from this example, however there is no status field in the ResponseWriter.

Additional context

N/A

MrAlias commented 11 months ago

Plan is to look at replacing the ServeHTTP return uprobe with a uprobe for WriteHeader.

MrAlias commented 11 months ago

Given the ResponseWriter is an interface and we cannot instrument its methods directly, we will need to instrument its implementations. I think we are on the correct track currently instrumenting the response struct:

https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/998fabad1086c9df9149a41d8c4baf659711f7bc/internal/pkg/instrumentation/bpf/net/http/server/probe.go#L72

Though, instead of using the status field, we should create a uprobe for the WriteHeader method of that type. Given any wrapping of a ResponseWriter will ultimately need to call that method, I think that should solve this issue.

MrAlias commented 11 months ago

I'm not able to reproduce the failure here. Running the following as in the end-2-end fixture test still passes:

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

type statusRecorder struct {
    rw     http.ResponseWriter
    status int
}

func (r *statusRecorder) Header() http.Header {
    return r.rw.Header()
}

func (r *statusRecorder) Write(data []byte) (int, error) {
    if r.status == 0 {
        r.WriteHeader(http.StatusOK)
    }
    return r.rw.Write(data)
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.rw.WriteHeader(code)
}

func logStatus(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        rec := &statusRecorder{rw: w}

        next.ServeHTTP(rec, r)

        log.Printf("response status: %d\n", rec.status)
    })
}

func hello(w http.ResponseWriter, _ *http.Request) {
    fmt.Fprintf(w, "hello\n")
}

func main() {
    go func() {
        _ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello)))
    }()

    // give time for auto-instrumentation to start up
    time.Sleep(5 * time.Second)

    resp, err := http.Get("http://localhost:8080/hello")
    if err != nil {
        log.Fatal(err)
    }
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Body: %s\n", string(body))
    _ = resp.Body.Close()

    // give time for auto-instrumentation to report signal
    time.Sleep(5 * time.Second)
}

Output

$ make fixture-nethttp
make docker-build
make[1]: Entering directory '/home/tyler/go/src/go.opentelemetry.io/auto'
docker buildx build -t otel-go-instrumentation .
[+] Building 40.1s (15/15) FINISHED                                                                                                                            docker:default
 => [internal] load .dockerignore                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                          0.0s
 => [internal] load build definition from Dockerfile                                                                                                                     0.0s
 => => transferring dockerfile: 572B                                                                                                                                     0.0s
 => [internal] load metadata for gcr.io/distroless/base-debian12@sha256:1dfdb5ed7d9a66dcfc90135b25a46c25a85cf719b619b40c249a2445b9d055f5                                 0.0s
 => [internal] load metadata for docker.io/library/golang:1.21.4-bullseye                                                                                                0.4s
 => CACHED [stage-1 1/2] FROM gcr.io/distroless/base-debian12@sha256:1dfdb5ed7d9a66dcfc90135b25a46c25a85cf719b619b40c249a2445b9d055f5                                    0.0s
 => [builder 1/7] FROM docker.io/library/golang:1.21.4-bullseye@sha256:31848c4f02b08469e159ea1ee664a3f29602418b13e7d67dfd4560d169e14d55                                  0.0s
 => [internal] load build context                                                                                                                                        0.2s
 => => transferring context: 438.13kB                                                                                                                                    0.1s
 => CACHED [builder 2/7] WORKDIR /app                                                                                                                                    0.0s
 => CACHED [builder 3/7] RUN apt-get update && apt-get install -y curl clang gcc llvm make libbpf-dev                                                                    0.0s
 => CACHED [builder 4/7] COPY go.mod go.sum ./                                                                                                                           0.0s
 => CACHED [builder 5/7] RUN go mod download && go mod verify                                                                                                            0.0s
 => [builder 6/7] COPY . .                                                                                                                                               0.8s
 => [builder 7/7] RUN make build                                                                                                                                        37.9s
 => [stage-1 2/2] COPY --from=builder /app/otel-go-instrumentation /                                                                                                     0.1s
 => exporting to image                                                                                                                                                   0.4s
 => => exporting layers                                                                                                                                                  0.4s
 => => writing image sha256:269d25c50dc9009e3e1f731a08b578b35162c248a094bda674cb2c78e06f348a                                                                             0.0s
 => => naming to docker.io/library/otel-go-instrumentation                                                                                                               0.0s
make[1]: Leaving directory '/home/tyler/go/src/go.opentelemetry.io/auto'
cd internal/test/e2e/nethttp && docker build -t sample-app .
[+] Building 11.0s (9/9) FINISHED                                                                                                                              docker:default
 => [internal] load build definition from Dockerfile                                                                                                                     0.0s
 => => transferring dockerfile: 189B                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/library/golang:1.21.4                                                                                                         0.0s
 => [1/4] FROM docker.io/library/golang:1.21.4                                                                                                                           0.0s
 => [internal] load build context                                                                                                                                        0.0s
 => => transferring context: 6.23kB                                                                                                                                      0.0s
 => CACHED [2/4] WORKDIR /sample-app                                                                                                                                     0.0s
 => [3/4] COPY . .                                                                                                                                                       0.0s
 => [4/4] RUN go mod init go.opentelemetry.io/auto/internal/test/e2e/nethttp && go mod tidy && go build -o main                                                          9.0s
 => exporting to image                                                                                                                                                   1.8s
 => => exporting layers                                                                                                                                                  1.8s
 => => writing image sha256:8e048aeba23d232195379424ed3d36f9f42fb1c65fa135f0d5f474b2a58c7046                                                                             0.0s
 => => naming to docker.io/library/sample-app                                                                                                                            0.0s
kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) đŸ–ŧ
 ✓ Preparing nodes đŸ“Ļ
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹ī¸
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
kind load docker-image otel-go-instrumentation sample-app
Image: "otel-go-instrumentation" with ID "sha256:269d25c50dc9009e3e1f731a08b578b35162c248a094bda674cb2c78e06f348a" not yet present on node "kind-control-plane", loading...
Image: "sample-app" with ID "sha256:8e048aeba23d232195379424ed3d36f9f42fb1c65fa135f0d5f474b2a58c7046" not yet present on node "kind-control-plane", loading...
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
"open-telemetry" already exists with the same configuration, skipping
if [ ! -d "opentelemetry-helm-charts" ]; then \
    git clone https://github.com/open-telemetry/opentelemetry-helm-charts.git; \
fi
helm install test -f .github/workflows/e2e/k8s/collector-helm-values.yml opentelemetry-helm-charts/opentelemetry-collector
NAME: test
LAST DEPLOYED: Wed Nov 15 12:57:50 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
[WARNING] No resource limits or requests were set. Consider setter resource requests and limits for your collector(s) via the `resources` field.
kubectl wait --for=condition=Ready --timeout=60s pod/test-opentelemetry-collector-0
pod/test-opentelemetry-collector-0 condition met
kubectl -n default create -f .github/workflows/e2e/k8s/sample-job.yml
job.batch/sample-job created
kubectl wait --for=condition=Complete --timeout=60s job/sample-job
job.batch/sample-job condition met
kubectl cp -c filecp default/test-opentelemetry-collector-0:tmp/trace.json ./internal/test/e2e/nethttp/traces-orig.json
rm -f ./internal/test/e2e/nethttp/traces.json
bats ./internal/test/e2e/nethttp/verify.bats
verify.bats
 ✓ go-auto :: includes service.name in resource attributes
 ✓ net/http :: emits a span name '{http.method}' (per semconv)
 ✓ net/http :: includes http.method attribute
 ✓ net/http :: includes http.target attribute
 ✓ net/http :: includes http.status_code attribute
 ✓ net/http :: trace ID present and valid in all spans
 ✓ net/http :: span ID present and valid in all spans
 ✓ net/http :: parent span ID present and valid in all spans
 ✓ net/http/client, net/http :: spans have same trace ID
 ✓ net/http/client, net/http :: server span has client span as parent
 ✓ net/http :: expected (redacted) trace output

11 tests, 0 failures

kind delete cluster
Deleting cluster "kind" ...
Deleted nodes: ["kind-control-plane"]
grcevski commented 11 months ago

Hm, let me check, maybe I have a mistake in my example.

grcevski commented 11 months ago

Sorry I had made an error while adapting the code for the testcase. I've modified your code as this:

package main

import (
    "fmt"
    "log"
    "net/http"
)

type statusRecorder struct {
    rw     http.ResponseWriter
    status int
    data   []byte
}

func (r *statusRecorder) Header() http.Header {
    return r.rw.Header()
}

func (r *statusRecorder) Write(data []byte) (int, error) {
    r.data = data
    return len(data), nil
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
}

func logStatus(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        rec := &statusRecorder{rw: w}

        next.ServeHTTP(rec, r)

        rec.rw.WriteHeader(rec.status)
        rec.rw.Write(rec.data)

        log.Printf("response status: %d\n", rec.status)
    })
}

func hello(w http.ResponseWriter, _ *http.Request) {
    w.WriteHeader(http.StatusOK)
    fmt.Fprintf(w, "hello\n")
}

func main() {
    go func() {
        _ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello)))
    }()

    // give time for auto-instrumentation to start up
    time.Sleep(5 * time.Second)

    resp, err := http.Get("http://localhost:8080/hello")
    if err != nil {
        log.Fatal(err)
    }
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Body: %s\n", string(body))
    _ = resp.Body.Close()

    // give time for auto-instrumentation to report signal
    time.Sleep(5 * time.Second)
}

I haven't had the time to learn how to make and run the tests yet, but I ran the otel-auto manually like this, after changing the main to simply do _ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello))) and running curl http://localhost:8080/hello:

sudo OTEL_GO_AUTO_TARGET_EXE=/home/nino/work/examples/go_hide_status/hide-status-server OTEL_SERVICE_NAME=aaaa ./otel-go-instrumentation 

And this is what I get:

  1. Target /hello is missing
  2. I don't see status at all in attributes
  3. You can see two requests like I mentioned in the other issue.
{"level":"info","ts":1700094316.3522575,"caller":"opentelemetry/controller.go:65","msg":"got event","attrs":[{"Key":"http.method","Value":{"Type":"STRING","Value":"GET"}},{"Key":"http.target","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1700094316.3524444,"caller":"opentelemetry/controller.go:65","msg":"got event","attrs":[{"Key":"http.method","Value":{"Type":"STRING","Value":""}},{"Key":"http.target","Value":{"Type":"STRING","Value":""}}]}
grcevski commented 11 months ago

I actually believe the fix for both these issues might be quite straighforward, all you need to do is put the uprobes on net/http.serverHandler.ServeHTTP instead of net/http.HandlerFunc.ServeHTTP. Unless the aim is to expose the inner handlers, but then I think we need to make sure they are reported nested and as 'internal' spans.

pellared commented 11 months ago

SIG meeting: @grcevski will work on this one.

grcevski commented 10 months ago

This was resolved by https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/505