open-telemetry / opentelemetry-cpp-contrib


Nginx instrumented with CPP otel contrib rejects requests without User-Agent field. #474

Open skowront opened 2 months ago

skowront commented 2 months ago

Situation A: NGINX 1.26.x or 1.25.x with NO OTEL cpp-contrib module added. Requests made with curl or the Python HTTP client are accepted. Requests made with the dotnet 8.0 HttpClient are accepted.

Situation B: NGINX 1.26.x or 1.25.x with the OTEL cpp-contrib module added. Requests made with curl or the Python HTTP client are accepted. Requests made with the dotnet 8.0 HttpClient are REJECTED.

The following log is produced by .net:

An error occurred while sending the request.

The response ended prematurely. (ResponseEnded)

at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.d57.MoveNext() in /_/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs:line 862
at System.Net.Http.HttpConnectionPool.d_89.MoveNext() in /_/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs:line 1116
at System.Threading.Tasks.ValueTask`1.get_Result() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ValueTask.cs:line 812
at System.Net.Http.RedirectHandler.d4.MoveNext() in /_/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/RedirectHandler.cs:line 30
at System.Net.Http.HttpClient.d_41.MoveNext() in /_/src/libraries/System.Net.Http/src/System/Net/Http/HttpClient.cs:line 188
at CSOTel.Traffic.CLI.Program.d__1.MoveNext() in C:\Users\tomek\source\repos\CSOTel\CSOTel.Traffic.CLI\Program.cs:line 35

The following is produced by nginx with otel cpp contrib:

2024/08/26 19:25:39 [error] 49#49: *10 mod_opentelemetry: startMonitoringRequest: Starting Request Monitoring for: / HTTP/1.1 Host, client: 10.0.2.2, server: www.cso.lab, request: "GET / HTTP/1.1", host: "cso.lab"
2024/08/26 19:25:39 [error] 49#49: *10 mod_opentelemetry: startMonitoringRequest: WebServer Context: NginxWebServerNetworkCSOTel.NginxWebServerNginxId, client: 10.0.2.2, server: www.cso.lab, request: "GET / HTTP/1.1", host: "cso.lab"
2024/08/26 19:25:39 [alert] 1#1: worker process 49 exited on signal 11 (core dumped)

Meanwhile, curl works perfectly fine and nginx serves the request. The difference is that curl automatically adds a User-Agent header, but the dotnet HttpClient doesn't (and why should it?).

curl -vk https://cso.lab/

  • Trying 192.168.56.1:443...
  • Connected to cso.lab (192.168.56.1) port 443
  • ALPN: curl offers h2,http/1.1
  • TLSv1.3 (OUT), TLS handshake, Client hello (1):
  • TLSv1.3 (IN), TLS handshake, Server hello (2):
  • TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
  • TLSv1.3 (IN), TLS handshake, Certificate (11):
  • TLSv1.3 (IN), TLS handshake, CERT verify (15):
  • TLSv1.3 (IN), TLS handshake, Finished (20):
  • TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
  • TLSv1.3 (OUT), TLS handshake, Finished (20):
  • SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
  • ALPN: server accepted http/1.1
  • Server certificate:
  • subject: CN=cso.lab
  • start date: Aug 17 11:53:50 2024 GMT
  • expire date: Aug 15 11:53:50 2034 GMT
  • issuer: CN=cso.lab
  • SSL certificate verify result: self-signed certificate (18), continuing anyway.
  • using HTTP/1.1

    > GET / HTTP/1.1
    > Host: cso.lab
    > User-Agent: curl/8.4.0
    > Accept: */*
    >

  • TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
  • TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
  • old SSL session ID is stale, removing

    < HTTP/1.1 200 OK
    < Server: nginx/1.26.0
    < Date: Mon, 26 Aug 2024 19:26:36 GMT
    < Content-Type: text/html
    < Content-Length: 2408
    < Connection: keep-alive
    < Last-Modified: Sun, 25 Aug 2024 15:13:43 GMT
    < ETag: "66cb4a27-968"
    < Accept-Ranges: bytes
    <
    <!DOCTYPE html>

FIX/SOLUTION/WORKAROUND: The workaround is to add a User-Agent header to the dotnet HttpClient. Any value works, but the header must be present; otherwise nginx will reject the request.

NOTE: This happens ONLY when nginx is instrumented with this cpp-contrib library, so it's clearly an issue with this module. The worker exits on signal 11 (segmentation fault), so most likely a null pointer is dereferenced when the header is missing. Not even a TRACE is sent to the OTEL collector, because the worker process is killed immediately.
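For anyone who wants to reproduce this without dotnet, here is a minimal Python sketch that sends the same GET request over TLS with or without a User-Agent header. It builds the request by hand over a raw socket so that no client library can add the header behind the scenes. The function names (`build_request`, `send_request`) and the `repro/1.0` agent string are made up for illustration; the host `cso.lab` and the self-signed certificate come from the report above.

```python
import socket
import ssl


def build_request(host, path="/", user_agent=None):
    """Build a raw HTTP/1.1 GET request; omit User-Agent when it is None."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if user_agent is not None:
        lines.append(f"User-Agent: {user_agent}")
    # The two trailing empty strings produce the blank line that ends the headers.
    lines += ["Accept: */*", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def send_request(host, port=443, user_agent=None, timeout=5.0):
    """Send the request over TLS and return the raw response bytes.

    Certificate verification is disabled because the server in the report
    uses a self-signed certificate (the equivalent of `curl -k`).
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, user_agent=user_agent))
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `send_request("cso.lab", user_agent="repro/1.0")` should return a normal response, while `send_request("cso.lab", user_agent=None)` against an affected instance would be expected to return no data or raise an `OSError` when the worker dies, matching the "response ended prematurely" error seen from .NET.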

fede843 commented 1 month ago

We have experienced the same. Any request without the User-Agent header set is rejected. Even worse, it kills the worker process. It is easy enough to cause a DoS with this, even just through a misconfigured client.
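To check whether workers are being killed this way on a given instance, one option is to scan the nginx error log for the master-process alert shown in the original report ("worker process 49 exited on signal 11"). The sketch below is only an illustration: the function name `find_worker_crashes` is hypothetical, and the pattern is based solely on the log line quoted in this issue.

```python
import re

# Matches nginx master-process alerts such as:
#   2024/08/26 19:25:39 [alert] 1#1: worker process 49 exited on signal 11 (core dumped)
CRASH_RE = re.compile(
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[alert\] \d+#\d+: "
    r"worker process (?P<pid>\d+) exited on signal (?P<signal>\d+)"
)


def find_worker_crashes(lines):
    """Return (timestamp, worker_pid, signal) for each crash alert found."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append(
                (match["timestamp"], int(match["pid"]), int(match["signal"]))
            )
    return crashes
```

It can be fed an open log file directly, for example `find_worker_crashes(open(path))`, where `path` is wherever the `error_log` directive points on your system. Signal 11 entries that line up with requests lacking a User-Agent header would point to this bug.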