Closed grcevski closed 10 months ago
Plan is to look at replacing the ServeHTTP return uprobe with a uprobe for WriteHeader.
Given the ResponseWriter
is an interface and we cannot instrument its methods directly, we will need to instrument its implementations. I think we are on the correct track currently instrumenting the response
struct:
Though, instead of using the status
field, we should create a uprobe for the WriteHeader
method of that type. Given any wrapping of a ResponseWriter
will ultimately need to call that method, I think that should solve this issue.
I'm not able to reproduce the failure here. Running the following as in the end-2-end fixture test still passes:
package main
import (
"fmt"
"io"
"log"
"net/http"
"time"
)
type statusRecorder struct {
rw http.ResponseWriter
status int
}
func (r *statusRecorder) Header() http.Header {
return r.rw.Header()
}
func (r *statusRecorder) Write(data []byte) (int, error) {
if r.status == 0 {
r.WriteHeader(http.StatusOK)
}
return r.rw.Write(data)
}
func (r *statusRecorder) WriteHeader(code int) {
r.status = code
r.rw.WriteHeader(code)
}
func logStatus(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rec := &statusRecorder{rw: w}
next.ServeHTTP(rec, r)
log.Printf("response status: %d\n", rec.status)
})
}
func hello(w http.ResponseWriter, _ *http.Request) {
fmt.Fprintf(w, "hello\n")
}
func main() {
go func() {
_ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello)))
}()
// give time for auto-instrumentation to start up
time.Sleep(5 * time.Second)
resp, err := http.Get("http://localhost:8080/hello")
if err != nil {
log.Fatal(err)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
log.Printf("Body: %s\n", string(body))
_ = resp.Body.Close()
// give time for auto-instrumentation to report signal
time.Sleep(5 * time.Second)
}
Output
$ make fixture-nethttp
make docker-build
make[1]: Entering directory '/home/tyler/go/src/go.opentelemetry.io/auto'
docker buildx build -t otel-go-instrumentation .
[+] Building 40.1s (15/15) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 572B 0.0s
=> [internal] load metadata for gcr.io/distroless/base-debian12@sha256:1dfdb5ed7d9a66dcfc90135b25a46c25a85cf719b619b40c249a2445b9d055f5 0.0s
=> [internal] load metadata for docker.io/library/golang:1.21.4-bullseye 0.4s
=> CACHED [stage-1 1/2] FROM gcr.io/distroless/base-debian12@sha256:1dfdb5ed7d9a66dcfc90135b25a46c25a85cf719b619b40c249a2445b9d055f5 0.0s
=> [builder 1/7] FROM docker.io/library/golang:1.21.4-bullseye@sha256:31848c4f02b08469e159ea1ee664a3f29602418b13e7d67dfd4560d169e14d55 0.0s
=> [internal] load build context 0.2s
=> => transferring context: 438.13kB 0.1s
=> CACHED [builder 2/7] WORKDIR /app 0.0s
=> CACHED [builder 3/7] RUN apt-get update && apt-get install -y curl clang gcc llvm make libbpf-dev 0.0s
=> CACHED [builder 4/7] COPY go.mod go.sum ./ 0.0s
=> CACHED [builder 5/7] RUN go mod download && go mod verify 0.0s
=> [builder 6/7] COPY . . 0.8s
=> [builder 7/7] RUN make build 37.9s
=> [stage-1 2/2] COPY --from=builder /app/otel-go-instrumentation / 0.1s
=> exporting to image 0.4s
=> => exporting layers 0.4s
=> => writing image sha256:269d25c50dc9009e3e1f731a08b578b35162c248a094bda674cb2c78e06f348a 0.0s
=> => naming to docker.io/library/otel-go-instrumentation 0.0s
make[1]: Leaving directory '/home/tyler/go/src/go.opentelemetry.io/auto'
cd internal/test/e2e/nethttp && docker build -t sample-app .
[+] Building 11.0s (9/9) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 189B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.21.4 0.0s
=> [1/4] FROM docker.io/library/golang:1.21.4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 6.23kB 0.0s
=> CACHED [2/4] WORKDIR /sample-app 0.0s
=> [3/4] COPY . . 0.0s
=> [4/4] RUN go mod init go.opentelemetry.io/auto/internal/test/e2e/nethttp && go mod tidy && go build -o main 9.0s
=> exporting to image 1.8s
=> => exporting layers 1.8s
=> => writing image sha256:8e048aeba23d232195379424ed3d36f9f42fb1c65fa135f0d5f474b2a58c7046 0.0s
=> => naming to docker.io/library/sample-app 0.0s
kind create cluster
Creating cluster "kind" ...
â Ensuring node image (kindest/node:v1.27.3) đŧ
â Preparing nodes đĻ
â Writing configuration đ
â Starting control-plane đšī¸
â Installing CNI đ
â Installing StorageClass đž
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a nice day! đ
kind load docker-image otel-go-instrumentation sample-app
Image: "otel-go-instrumentation" with ID "sha256:269d25c50dc9009e3e1f731a08b578b35162c248a094bda674cb2c78e06f348a" not yet present on node "kind-control-plane", loading...
Image: "sample-app" with ID "sha256:8e048aeba23d232195379424ed3d36f9f42fb1c65fa135f0d5f474b2a58c7046" not yet present on node "kind-control-plane", loading...
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
"open-telemetry" already exists with the same configuration, skipping
if [ ! -d "opentelemetry-helm-charts" ]; then \
git clone https://github.com/open-telemetry/opentelemetry-helm-charts.git; \
fi
helm install test -f .github/workflows/e2e/k8s/collector-helm-values.yml opentelemetry-helm-charts/opentelemetry-collector
NAME: test
LAST DEPLOYED: Wed Nov 15 12:57:50 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
[WARNING] No resource limits or requests were set. Consider setter resource requests and limits for your collector(s) via the `resources` field.
kubectl wait --for=condition=Ready --timeout=60s pod/test-opentelemetry-collector-0
pod/test-opentelemetry-collector-0 condition met
kubectl -n default create -f .github/workflows/e2e/k8s/sample-job.yml
job.batch/sample-job created
kubectl wait --for=condition=Complete --timeout=60s job/sample-job
job.batch/sample-job condition met
kubectl cp -c filecp default/test-opentelemetry-collector-0:tmp/trace.json ./internal/test/e2e/nethttp/traces-orig.json
rm -f ./internal/test/e2e/nethttp/traces.json
bats ./internal/test/e2e/nethttp/verify.bats
verify.bats
â go-auto :: includes service.name in resource attributes
â net/http :: emits a span name '{http.method}' (per semconv)
â net/http :: includes http.method attribute
â net/http :: includes http.target attribute
â net/http :: includes http.status_code attribute
â net/http :: trace ID present and valid in all spans
â net/http :: span ID present and valid in all spans
â net/http :: parent span ID present and valid in all spans
â net/http/client, net/http :: spans have same trace ID
â net/http/client, net/http :: server span has client span as parent
â net/http :: expected (redacted) trace output
11 tests, 0 failures
kind delete cluster
Deleting cluster "kind" ...
Deleted nodes: ["kind-control-plane"]
Hm, let me check, maybe I have a mistake in my example.
Sorry I had made an error while adapting the code for the testcase. I've modified your code as this:
package main
import (
"fmt"
"log"
"net/http"
)
type statusRecorder struct {
rw http.ResponseWriter
status int
data []byte
}
func (r *statusRecorder) Header() http.Header {
return r.rw.Header()
}
func (r *statusRecorder) Write(data []byte) (int, error) {
r.data = data
return len(data), nil
}
func (r *statusRecorder) WriteHeader(code int) {
r.status = code
}
func logStatus(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rec := &statusRecorder{rw: w}
next.ServeHTTP(rec, r)
rec.rw.WriteHeader(rec.status)
rec.rw.Write(rec.data)
log.Printf("response status: %d\n", rec.status)
})
}
func hello(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusOK)
fmt.Fprintf(w, "hello\n")
}
func main() {
go func() {
_ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello)))
}()
// give time for auto-instrumentation to start up
time.Sleep(5 * time.Second)
resp, err := http.Get("http://localhost:8080/hello")
if err != nil {
log.Fatal(err)
}
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
log.Printf("Body: %s\n", string(body))
_ = resp.Body.Close()
// give time for auto-instrumentation to report signal
time.Sleep(5 * time.Second)
}
I haven't had the time to learn how to make and run the tests yet, but I ran the otel-auto manually like this, after changing the main to simply do _ = http.ListenAndServe(":8080", logStatus(http.HandlerFunc(hello)))
and running curl http://localhost:8080/hello
:
sudo OTEL_GO_AUTO_TARGET_EXE=/home/nino/work/examples/go_hide_status/hide-status-server OTEL_SERVICE_NAME=aaaa ./otel-go-instrumentation
And this is what I get:
/hello
is missing {"level":"info","ts":1700094316.3522575,"caller":"opentelemetry/controller.go:65","msg":"got event","attrs":[{"Key":"http.method","Value":{"Type":"STRING","Value":"GET"}},{"Key":"http.target","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1700094316.3524444,"caller":"opentelemetry/controller.go:65","msg":"got event","attrs":[{"Key":"http.method","Value":{"Type":"STRING","Value":""}},{"Key":"http.target","Value":{"Type":"STRING","Value":""}}]}
I actually believe the fix for both these issues might be quite straighforward, all you need to do is put the uprobes on net/http.serverHandler.ServeHTTP
instead of net/http.HandlerFunc.ServeHTTP
. Unless the aim is to expose the inner handlers, but then I think we need to make sure they are reported nested and as 'internal' spans.
SIG meeting: @grcevski will work on this one.
This was resolved by https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/505
Describe the bug
A Go developer can write a custom http.ResponseWriter which does not have to have a status field and especially may also have a status field at a different offset.
Reading the non-existent status field will cause the HTTP status code to not be read or be read as random bytes.
Environment
To Reproduce
Here's a simplified example that shows a custom ResponseWriter:
Expected behavior
Status code should be correctly read from this example, however there is no
status
field in the ResponseWriter.Additional context
N/A