slashmo / swift-otel

OpenTelemetry client built for server-side Swift
https://swiftpackageindex.com/slashmo/swift-otel
Apache License 2.0
64 stars 18 forks

HTTP Support? GRPC fails to write to a remote endpoint #129

Open Andrewangeta opened 1 month ago

Andrewangeta commented 1 month ago

I'm curious as to what work would need to be done to support HTTP as an additional transport mechanism to GRPC.

Currently I'm setting up otel on iOS, and GRPC fails when sending to a remote otel-collector. Funnily enough, localhost always seems to resolve and push telemetry data, but with the remote endpoint GRPC times out constantly on macOS, on iOS, and in a Swift container image running in Docker locally on macOS.

I've isolated the issue to something specific to GRPC in Swift. To verify whether the problem was my remote endpoint, I tested with a tool called otel-cli.

Running the following command with the CLI on macOS:

    otel-cli exec --service my-service --name "curl google" curl https://google.com --endpoint otel.example.com:443

results in the telemetry data being pushed to the otel-collector, which then forwards it to Grafana Cloud via otlp/http, and I can see the data in my dashboard.

[Screenshot 2024-08-02 at 11:31:28 AM]

Debugging

I've enabled some GRPC logging on iOS and in Hummingbird and see the following:

Hummingbird in Xcode Logs

2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2024-07-30T08:03:44-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] creating a ClientBootstrap
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=connecting grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] activating connection
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=active grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] deactivating connection
2024-07-30T08:03:44-0400 debug otel.test : delay_secs=60.74236138655257 grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 [GRPC] scheduling connection attempt
2024-07-30T08:03:44-0400 debug otel.test : connectivity_state=transientFailure grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 [GRPC] vending multiplexer future
2024-07-30T08:03:51-0400 debug otel.test : grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 new_state=transientFailure old_state=connecting [GRPC] connectivity state change
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (0 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=metadata [GRPC] buffering request part
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (1 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=message [GRPC] buffering request part
2024-07-30T08:03:51-0400 trace otel.test : call_state=awaitingTransport (2 parts buffered) grpc_connection_id=AD875410-B561-4CAD-909C-035E4FCC6668/8 grpc_request_id=A7BD726C-3BDA-4121-A436-5D6342722E3C request_part=end [GRPC] buffering request part
2024-07-30T08:03:59-0400 trace otel.test : call_state=closing grpc_connection_id=8A6BF2D3-29EC-4F90-B8AA-315DCF757F52/9 grpc_request_id=7861235F-17F2-49D9-8BEE-CC4B86510449 [GRPC] failing buffered writes

iOS Logs

2024-07-30T08:15:08-0400 debug telemetry : grpc_connection_id=2FF99C64-1D91-41EA-A7FD-CCB950DAA7C2/8 [GRPC] creating a ClientBootstrap
2024-07-30T08:15:08-0400 error telemetry : error=read(descriptor:pointer:size:): Connection reset by peer (errno: 54) grpc.conn.addr_local=192.168.1.124 grpc.conn.addr_remote=***.***.***.** grpc_connection_id=2FF99C64-1D91-41EA-A7FD-CCB950DAA7C2/8 [GRPC] grpc client error

Hummingbird Container Logs macOS

2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 new_state=connecting old_state=idle [GRPC] connectivity state change
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] vending multiplexer future
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] creating a ClientBootstrap
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (0 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=metadata [GRPC] buffering request part
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (1 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=message [GRPC] buffering request part
2024-08-01T00:04:33+0000 trace otel.test : call_state=awaitingTransport (2 parts buffered) grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 grpc_request_id=BC73C809-747F-4A9B-B4D8-DEDB9BA54440 request_part=end [GRPC] buffering request part
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] activating connection
2024-08-01T00:04:33+0000 debug otel.test : connectivity_state=active grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] deactivating connection
2024-08-01T00:04:33+0000 debug otel.test : delay_secs=1.0 grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 [GRPC] scheduling connection attempt
2024-08-01T00:04:33+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/0 new_state=transientFailure old_state=connecting [GRPC] connectivity state change
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 new_state=connecting old_state=transientFailure [GRPC] connectivity state change
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] making client bootstrap with event loop group of type SelectableEventLoop
2024-08-01T00:04:34+0000 debug otel.test : grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] creating a ClientBootstrap
2024-08-01T00:04:34+0000 debug otel.test : connectivity_state=connecting grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] activating connection
2024-08-01T00:04:34+0000 debug otel.test : connectivity_state=active grpc_connection_id=A2C48434-73F1-4E6D-AA73-CCE2A6CEB791/1 [GRPC] deactivating connection

Supporting HTTP would help work around these issues, since it's a valid OTLP transport mechanism. I'm not sure whether anyone else has run into these issues when sending OTLP data over GRPC to a non-localhost endpoint.
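
For reference, here is a rough sketch of what an OTLP/HTTP trace export request could look like on the client side. It only builds the request and sends nothing. `makeOTLPTraceRequest` is a hypothetical helper and not part of swift-otel. The payload is assumed to be an already serialized protobuf `ExportTraceServiceRequest`, and the base URL is just the example endpoint from above. The `/v1/traces` path and the `application/x-protobuf` content type are the defaults from the OTLP/HTTP specification.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Hypothetical helper (not a swift-otel API): builds an OTLP/HTTP trace export request.
/// `serializedPayload` is assumed to be a protobuf-encoded ExportTraceServiceRequest.
func makeOTLPTraceRequest(collectorBaseURL: URL, serializedPayload: Data) -> URLRequest {
    // OTLP/HTTP posts trace payloads to the /v1/traces path by default.
    let url = collectorBaseURL.appendingPathComponent("v1/traces")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // Binary protobuf encoding; OTLP/HTTP also allows application/json.
    request.setValue("application/x-protobuf", forHTTPHeaderField: "Content-Type")
    request.httpBody = serializedPayload
    return request
}

// Example endpoint taken from the otel-cli command above; the empty payload is a placeholder.
let request = makeOTLPTraceRequest(
    collectorBaseURL: URL(string: "https://otel.example.com")!,
    serializedPayload: Data()
)
```

An exporter could hand a request like this to `URLSession` or another HTTP client. How that would plug into swift-otel's exporter abstraction is exactly the open question here.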

Andrewangeta commented 1 month ago

Could this be related to #26?