slashmo / swift-otel

OpenTelemetry client built for server-side Swift
https://swiftpackageindex.com/slashmo/swift-otel
Apache License 2.0
71 stars 25 forks

HTTP Support? GRPC fails to write to a remote endpoint #129

Open Andrewangeta opened 3 months ago

Andrewangeta commented 3 months ago

I'm curious as to what work would need to be done to support HTTP as an additional transport mechanism to GRPC.

Currently I'm setting up otel on iOS, and using GRPC fails when sending to a remote otel-collector. Funnily enough, localhost always seems to resolve and push telemetry data, but GRPC times out constantly against the remote endpoint on macOS, iOS, and in a Swift container image running in Docker locally on macOS.

I've isolated the issue to something specific to GRPC in Swift. To verify whether the issue was my remote endpoint, I tested with a tool called otel-cli.

Running the following command with the CLI on macOS:

otel-cli exec --service my-service --name "curl google" curl https://google.com --endpoint otel.example.com:443

results in the telemetry data being pushed to the otel-collector, which then forwards it to Grafana Cloud via OTLP/HTTP, and I can see the data in my dashboard.


Debugging

I've enabled some logging for GRPC in iOS and Hummingbird and see the following:

Hummingbird in Xcode Logs

2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] creating a ClientBootstrap
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=connecting grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] activating connection
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=active grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] deactivating connection
2024-07-30T08:03:44-0400 debug otel.test : delay_secs=60.74236138655257 grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] scheduling connection attempt
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=transientFailure grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 [GRPC] vending multiplexer future
2024-07-30T08:03:51-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 new_state=transientFailure old_state=connecting [GRPC] connectivity state change
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (0 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=metadata [GRPC] buffering request part
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (1 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=message [GRPC] buffering request part
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (2 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=end [GRPC] buffering request part
2024-07-30T08:03:59-0400 trace otel.test : call_state=closing grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 grpc_request_id=7861235F-17F2-49D9-8BEE-CC4B86510449 [GRPC] failing buffered writes

iOS Logs

2024-07-30T08:15:08-0400 debug telemetry : grpc_connection_id=2FF99C64-1D91-41EA-A7FD-CCB950DAA7C2/8 [GRPC] creating a ClientBootstrap
2024-07-30T08:15:08-0400 error telemetry : error=read(descriptor:pointer:size:): Connection reset by peer (errno: 54) grpc.conn.addr_local=192.168.1.124 grpc.conn.addr_remote=***.***.***.** grpc_connection_id=2FF99C64-1D91-41EA-A7FD-CCB950DAA7C2/8 [GRPC] grpc client error

Hummingbird Container Logs macOS

2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 new_state=connecting old_state=idle [GRPC] connectivity state change
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] vending multiplexer future
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] creating a ClientBootstrap
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (0 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=metadata [GRPC] buffering request part
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (1 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=message [GRPC] buffering request part
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (2 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=end [GRPC] buffering request part
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] activating connection
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=active grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] deactivating connection
2024-08-01T00:04:33+0000 debug otel.test : delay_secs=1.0 grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] scheduling connection attempt
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 new_state=transientFailure old_state=connecting [GRPC] connectivity state change
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 new_state=connecting old_state=transientFailure [GRPC] connectivity state change
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] creating a ClientBootstrap
2024-08-01T00:04:34+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] activating connection
2024-08-01T00:04:34+0000 debug otel.test : connectivity_state=active grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] deactivating connection

Leveraging HTTP support would help alleviate these issues since it's a valid transport mechanism. Not sure if anyone else has experienced these issues when sending OTLP data over GRPC to a non-localhost endpoint.

Andrewangeta commented 3 months ago

Could this be related to #26 ?

slashmo commented 2 months ago

Thanks for bringing this up @Andrewangeta 🙏 The issues you're facing should indeed be caused by our current gRPC support being limited to insecure channels. Have you already tried using the fixes made in #130? Would you still want/need HTTP support in swift-otel even if a secure gRPC connection does the trick for you?

Andrewangeta commented 2 months ago

@slashmo

Yeah, the ClientConfiguration never took a secure endpoint into account. I went to the exporters (OTLPGRPCLogEntryExporter, OTLPGRPCMetricExporter, OTLPGRPCSpanExporter) and altered the config setup:

+var connectionConfiguration = ClientConnection.Configuration(target: .host(configuration.endpoint.host, port: configuration.endpoint.port),
+                                                             eventLoopGroup: group,
+                                                             tls: .init(configuration: .clientDefault),
+                                                             backgroundActivityLogger: requestLogger)

+if configuration.endpoint.isInsecure {
+    connectionConfiguration = ClientConnection.Configuration.default(
+        target: .host(configuration.endpoint.host, port: configuration.endpoint.port),
+        eventLoopGroup: group
+    )
+}

-var connectionConfiguration = ClientConnection.Configuration.default(
-     target: .host(configuration.endpoint.host, port: configuration.endpoint.port),
-     eventLoopGroup: group
- )
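The diff branches on configuration.endpoint.isInsecure, but the thread does not show how that flag is derived. As a rough sketch only, one way to derive it is from the endpoint's URL scheme. The ResolvedEndpoint type and resolveEndpoint function below are hypothetical names, not part of swift-otel or grpc-swift, and the fallback ports (443 for https, 4317 for http) are assumptions.

```swift
import Foundation

/// Hypothetical value type for illustration; not part of swift-otel or grpc-swift.
struct ResolvedEndpoint: Equatable {
    var host: String
    var port: Int
    var isInsecure: Bool
}

/// Derives host, port, and transport security from an endpoint string.
/// Assumption: "https" means TLS and "http" means plaintext.
/// Endpoints without a recognized scheme or host yield nil.
func resolveEndpoint(_ endpoint: String) -> ResolvedEndpoint? {
    guard let components = URLComponents(string: endpoint),
          let scheme = components.scheme?.lowercased(),
          let host = components.host, !host.isEmpty
    else { return nil }

    switch scheme {
    case "https":
        // Assumed fallback port for TLS endpoints.
        return ResolvedEndpoint(host: host, port: components.port ?? 443, isInsecure: false)
    case "http":
        // 4317 is the conventional OTLP/gRPC port; used here as an assumed fallback.
        return ResolvedEndpoint(host: host, port: components.port ?? 4317, isInsecure: true)
    default:
        return nil
    }
}
```

With something like this, resolveEndpoint("https://otel.example.com:443") would select the TLS branch, while resolveEndpoint("http://localhost:4317") would keep the plaintext default.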

As far as needing HTTP, I suppose someone could have a valid use case. Currently Grafana Cloud only accepts HTTP for OTLP ingestion. So for someone who opts out of using an otel collector (which can forward HTTP requests) for whatever reason and wants to send directly to an HTTP ingest endpoint, HTTP seems like a sensible option to offer. I'm using an otel collector for my needs, so I can get away with GRPC currently.
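To make the HTTP option concrete, here is a minimal sketch of what an OTLP/HTTP trace export request could look like when built with Foundation. It only constructs the request and does not send it. The function name makeOTLPHTTPTraceRequest is hypothetical, the payload is assumed to be an already-serialized protobuf export request, and any authentication headers a given backend requires are left to the caller. The /v1/traces path and the application/x-protobuf content type follow the OTLP/HTTP specification.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical helper: builds (but does not send) an OTLP/HTTP trace export request.
/// `payload` is assumed to be an already-serialized protobuf export request.
func makeOTLPHTTPTraceRequest(
    baseURL: URL,
    payload: Data,
    headers: [String: String] = [:]
) -> URLRequest {
    // OTLP/HTTP exports traces to the `/v1/traces` path by default.
    let url = baseURL
        .appendingPathComponent("v1")
        .appendingPathComponent("traces")

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // Binary protobuf encoding, i.e. the "http/protobuf" flavor of OTLP.
    request.setValue("application/x-protobuf", forHTTPHeaderField: "Content-Type")
    // Caller-supplied headers, e.g. whatever authentication the backend expects.
    for (name, value) in headers {
        request.setValue(value, forHTTPHeaderField: name)
    }
    request.httpBody = payload
    return request
}
```

The resulting request could then be handed to URLSession or any other HTTP client, with no gRPC channel involved.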

slashmo commented 2 months ago

Currently grafana cloud only accepts HTTP for delivering OTLP to ingest. @Andrewangeta

Interesting! We could add http/protobuf support in a separate OTLPHTTP target. I prototyped this a while ago, including an OTLP target which brings together both OTLPGRPC and OTLPHTTP to support the OTEL_EXPORTER_OTLP_PROTOCOL variable for dynamically choosing between one or the other.
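As an illustration of the dynamic selection described above, and not the actual prototype, a sketch of reading OTEL_EXPORTER_OTLP_PROTOCOL might look like the following. The enum and function names are hypothetical. The three raw values are the ones the OpenTelemetry specification defines for this variable. The fallback is left to the caller because the thread does not say which default swift-otel would choose.

```swift
import Foundation

/// Values the OpenTelemetry specification defines for OTEL_EXPORTER_OTLP_PROTOCOL.
/// The type itself is a hypothetical name for this sketch.
enum OTLPProtocol: String {
    case grpc = "grpc"
    case httpProtobuf = "http/protobuf"
    case httpJSON = "http/json"
}

/// Picks the export protocol from the environment, using `fallback`
/// when the variable is unset or holds an unrecognized value.
func resolveOTLPProtocol(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    fallback: OTLPProtocol
) -> OTLPProtocol {
    guard let raw = environment["OTEL_EXPORTER_OTLP_PROTOCOL"]?
              .trimmingCharacters(in: .whitespaces)
              .lowercased(),
          let parsed = OTLPProtocol(rawValue: raw)
    else { return fallback }
    return parsed
}
```

An umbrella OTLP target could switch on the returned value to construct either the gRPC or the HTTP exporter.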