open-telemetry / opentelemetry-ruby

OpenTelemetry Ruby API & SDK, and related gems
https://opentelemetry.io/
Apache License 2.0

Max packet size not correctly managed (at least) in OpenTelemetry::Exporter::Jaeger::AgentExporter #1148

Closed thomasLeclaire closed 5 months ago

thomasLeclaire commented 2 years ago

Description of the bug

The max_packet_size is not handled correctly (at least) in OpenTelemetry::Exporter::Jaeger::AgentExporter: it is applied to the size of the data content, whereas we expect it to limit the real size of the packet sent. This is a problem when sending large packets with the default max packet size of 65000: in practice we send some packets that are larger than 65k, but the jaeger-agent does not read more than 65k, which results in unexpected EOF errors in the jaeger-agent logs.
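To illustrate the symptom in isolation, here is a minimal sketch using only Ruby's standard `socket` library. It does not involve the exporter or the jaeger-agent at all: the sizes are scaled down from 65000 for readability, and `envelope_bytes` is an assumed stand-in for whatever the protocol adds around the content. It only shows that a UDP reader with a fixed read size silently loses the tail of a larger datagram.

```ruby
require 'socket'

# Hypothetical numbers, scaled down from the 65000-byte default.
max_packet_size = 1_000 # what the reader is willing to read
envelope_bytes  = 40    # assumed stand-in for protocol framing overhead

receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0) # port 0: let the OS pick a free port
port = receiver.addr[1]

sender = UDPSocket.new
payload  = 'x' * max_packet_size            # content alone already fills the limit
datagram = ('h' * envelope_bytes) + payload # what actually goes on the wire
sender.send(datagram, 0, '127.0.0.1', port)

# The reader asks for at most max_packet_size bytes; the rest of the
# datagram is discarded, so the message arrives incomplete.
received, _addr = receiver.recvfrom(max_packet_size)

sent_bytes = datagram.bytesize
received_bytes = received.bytesize
puts "sent #{sent_bytes} bytes, read #{received_bytes} bytes"

sender.close
receiver.close
```

A decoder fed the truncated message would run out of input before the message ends, which would be consistent with the unexpected EOF errors described above.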

Share details about your runtime

Operating system details: Ubuntu (not relevant here)
RUBY_ENGINE: "ruby"
RUBY_VERSION: "3"
Thrift: v0.16.0

Share a simplified reproduction if possible

Set an explicit "small" max packet size, send a span, and check the size of the packet received in the jaeger-agent debug logs.

I'm really not a Ruby dev and was not able to identify the root cause, but from what I understand, we consider the size of the message content without considering the "headers" of the Thrift protocol, which add these extra bytes. See the batcher function here: https://github.com/open-telemetry/opentelemetry-ruby/blob/main/exporter/jaeger/lib/opentelemetry/exporter/jaeger/agent_exporter.rb
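As a rough sketch of that hypothesis (not the exporter's actual code), the hypothetical helper below groups already-encoded spans into batches by byte size. `batch_by_size` and `envelope_overhead` are names made up for illustration, and the 40-byte overhead is an arbitrary assumption, not a measured Thrift value. With an overhead of 0 the batch content alone is allowed to fill the limit, so the packet on the wire would exceed it; reserving room for the envelope forces an earlier split.

```ruby
# Hypothetical helper, not the exporter's real batcher: groups encoded spans
# so that payload bytes plus an assumed envelope stay within max_packet_size.
def batch_by_size(encoded_spans, max_packet_size:, envelope_overhead:)
  budget = max_packet_size - envelope_overhead
  raise ArgumentError, 'overhead exceeds max packet size' if budget <= 0

  batches = [[]]
  used = 0
  encoded_spans.each do |span|
    size = span.bytesize
    next if size > budget # a single span that can never fit is skipped

    # Start a new batch when this span would push the payload past the budget.
    if used + size > budget
      batches << []
      used = 0
    end
    batches.last << span
    used += size
  end
  batches.reject(&:empty?)
end

spans = ['a' * 500, 'b' * 500]

# Counting content only: both spans land in one batch of exactly 1000 bytes,
# leaving no room for any framing added afterwards.
content_only = batch_by_size(spans, max_packet_size: 1_000, envelope_overhead: 0)

# Reserving an assumed 40 bytes for the envelope splits them into two batches.
with_overhead = batch_by_size(spans, max_packet_size: 1_000, envelope_overhead: 40)

puts "content only: #{content_only.length} batch(es)"
puts "with overhead: #{with_overhead.length} batch(es)"
```

Lowering the default limit has a similar effect to a fixed overhead allowance, but it only works as long as the real overhead stays below the margin chosen.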

A quick and easy "fix" could be to change the default value from 65000 (here: https://github.com/open-telemetry/opentelemetry-ruby/blob/06b36a646afb30e86ad426f1be2bc97ed269fa15/exporter/jaeger/lib/opentelemetry/exporter/jaeger/agent_exporter.rb#L19) to something smaller like 64000, but I don't think that's really the best solution!

github-actions[bot] commented 6 months ago

👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this issue will be closed eventually by the stale bot.