jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.58k stars 2.45k forks source link

[Bug]: query: TLS handshake error #6230

Closed crystalgreen closed 3 days ago

crystalgreen commented 5 days ago

What happened?

Using v1.63.0 on Windows. When loading https://localhost:16686/ in the browser I get the error on the console where I run jaeger-all-in-one.exe

http: TLS handshake error from [::1]:55388: tls: first record does not look like a TLS handshake

I run with args:

--query.additional-headers="Access-Control-Allow-Origin: " --collector.otlp.enabled=true --collector.otlp.http.cors.allowed-origins="" --collector.otlp.http.cors.allowed-headers="*" --collector.otlp.http.host-port=4318 --collector.otlp.http.tls.key=%TLS%\key.pem --collector.otlp.http.tls.cert=%TLS%\cert.pem --collector.otlp.http.tls.enabled=true --query.http.tls.enabled=true --query.http.tls.key=%TLS%\key.pem --query.http.tls.cert=%TLS%\cert.pem

This works with v1.62.0.

Maybe this is related to PR #6023 I don't see an update to the CLI doc.

Steps to reproduce

  1. Run jaeger-all-in-one.exe on windows with args similar to above
  2. Open UI at default port in browser
  3. See described error in jaeger console

Expected behavior

It should work as before - the UI should load but it remains empty.

Relevant log output

http: TLS handshake error from [::1]:55388: tls: first record does not look like a TLS handshake

Screenshot

No response

Additional context

No response

Jaeger backend version

v1.63.0

SDK

No response

Pipeline

No response

Stogage backend

none (all-in-one)

Operating system

Windows

Deployment model

No response

Deployment configs

yurishkuro commented 5 days ago

@mahadzaryab1 can you take a look?

yurishkuro commented 4 days ago

@mahadzaryab1 I am able to reproduce, so it looks like it might be related to introducing OTEL helpers:

go run ./cmd/all-in-one \
    --query.http.tls.enabled=true \
    --query.http.tls.cert=./pkg/config/tlscfg/testdata/example-server-cert.pem \
    --query.http.tls.key=./pkg/config/tlscfg/testdata/example-server-key.pem

after the request the server logs a bunch of errors:

2024/11/21 23:27:00 http: TLS handshake error from [::1]:62373: client sent an HTTP request to an HTTPS server
2024/11/21 23:27:04 http: TLS handshake error from [::1]:62374: EOF
2024/11/21 23:27:05 http: TLS handshake error from [::1]:62375: remote error: tls: unknown certificate
2024/11/21 23:27:05 http: TLS handshake error from [::1]:62376: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62377: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62378: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62379: tls: first record does not look like a TLS handshake
$ curl -v -k https://localhost:16686/
...
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://localhost:16686/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: localhost:16686]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: localhost:16686
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
* Closing connection
* Recv failure: Connection reset by peer
* Send failure: Broken pipe
curl: (16) Recv failure: Connection reset by peer
yurishkuro commented 4 days ago

@mahadzaryab1 I think this is the mistake: https://github.com/jaegertracing/jaeger/blob/adbdf267a584e243dbcf5006844532eacc9bba00/cmd/query/app/server.go#L331-L335

The listener is already upgraded to TLS by ToListener() function. If I replace it with only s.httpServer.Serve(s.httpConn), then the connection works, but ONLY with http/1.1:

$ curl -k --http1.1 https://localhost:16686/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4454    0  4454    0     0   893k      0 --:--:-- --:--:-- --:--:-- 1087k

So there's still something funky going on, as by default curl tries to use http/2 and the server doesn't like it. Meanwhile, OTEL's test do appear to be testing both 2 and 1 https://github.com/open-telemetry/opentelemetry-collector/blob/00ad49af88794a8ad03d9994d290d1dea410ede2/config/confighttp/confighttp_test.go#L681

mahadzaryab1 commented 3 days ago

@yurishkuro Thanks for the reproduction steps. I've got a patch at https://github.com/jaegertracing/jaeger/pull/6239. I'm still unsure as to why the unit tests weren't able to catch this and will see if I can come up with a regression test for this.

yurishkuro commented 3 days ago

Let's keep this open for a bit until we find out why the tests did not catch this.

One suspicious thing I see is here https://github.com/jaegertracing/jaeger/blob/edb896e9fed3a578045a8e1a5da968336f6b6f48/cmd/query/app/server_test.go#L452

The HTTP/TLS tests are only run when ClientCA is present. If I remove that 2nd clause, the tests still pass - but both before and after your fix, so they are still not catching the issue. But maybe we have other similar filters.

It's also not clear to me why we are doing manual dial outs in this block https://github.com/jaegertracing/jaeger/blob/edb896e9fed3a578045a8e1a5da968336f6b6f48/cmd/query/app/server_test.go#L422