Closed mizhexiaoxiao closed 6 months ago
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
For some reason, the receiving traffic in otel-collector is significantly lower than the sending traffic in otel-agent.
How significant is this? Are you looking at the collector's internal metrics to arrive at this conclusion? Does the combined sent rate of all exporting agents fail to match the combined receive rate of the receiving collectors?
How do you currently measure traffic?
After looking into it, I couldn't find any specific metrics related to network traffic within the otel-collector. If I missed something, please feel free to correct me. However, I can confirm that the sending rate of otel-agent matches the receiving rate of otel-collector: all agents together send about as many spans as the collectors receive.
The following shows the number of spans received by all agents and the number of spans sent by the collectors over the same period:
To measure the traffic, we've been using the Prometheus node-exporter's container_network_receive_packets_total and container_network_transmit_packets_total metrics. Based on these metrics, we've observed that the receiving traffic in otel-collector is significantly lower than the sending traffic in otel-agent.
This is just a comparison between a single agent and a collector. In fact, the total sending traffic of all agents is 40 times the receiving traffic of the collectors. @atoulme @srikanthccv
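For what it's worth, the collector does publish internal span-count metrics (e.g. otelcol_receiver_accepted_spans and otelcol_exporter_sent_spans) on its own Prometheus endpoint, which is a more direct way to compare pipeline stages than container packet counts. A minimal sketch to surface them, assuming the contrib image's defaults:

```yaml
service:
  telemetry:
    metrics:
      # "detailed" adds per-receiver and per-exporter series
      level: detailed
      # default scrape endpoint for the collector's own metrics
      address: 0.0.0.0:8888
```

Scraping port 8888 on both the agents and the collectors lets the accepted/sent span rates be compared directly in Prometheus.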
OK. Do you have a setup that would allow us to reproduce?
@atoulme Yes, the configuration we are using is as follows
image: otel/opentelemetry-collector-contrib:0.73.0
otel agent config
receivers:
  jaeger:
    protocols:
      thrift_compact:
        endpoint: 0.0.0.0:6831
        queue_size: 5_000
        max_packet_size: 131_072
        workers: 50
        socket_buffer_size: 8_388_608
      thrift_binary:
        endpoint: 0.0.0.0:6832
        queue_size: 5_000
        max_packet_size: 131_072
        workers: 50
        socket_buffer_size: 8_388_608
  zipkin:
exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - otel-collector-0.otel-collector.trace.svc.cluster.local:4317
          - otel-collector-1.otel-collector.trace.svc.cluster.local:4317
          - otel-collector-2.otel-collector.trace.svc.cluster.local:4317
          - otel-collector-3.otel-collector.trace.svc.cluster.local:4317
          - otel-collector-4.otel-collector.trace.svc.cluster.local:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 90
    spike_limit_percentage: 80
extensions:
  zpages:
service:
  extensions: [zpages]
  pipelines:
    traces:
      receivers: [jaeger, zipkin]
      processors: [memory_limiter]
      exporters: [loadbalancing]
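Since compression is one of the factors the reporter mentions: the otlp sub-exporter of the loadbalancing exporter accepts a gRPC compression setting, and pinning it explicitly makes the wire-level byte comparison deterministic. A sketch of the relevant fragment (not the exact config from this deployment):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
        # set explicitly ("none" or "gzip") so the sent/received
        # traffic comparison is not skewed by compression defaults
        compression: gzip
        tls:
          insecure: true
```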
otel collector config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 200000
    policies:
      [
        # some policies
      ]
  batch:
exporters:
  alibabacloud_logservice/sls-traces:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [alibabacloud_logservice/sls-traces]
OK great, that looks like everything needed to reproduce. @jpkrohling, would you like to take a look?
Are you experiencing this only on the latest version of the collector? If this isn't a regression, I wouldn't block the release because of this.
I have tried other versions as well, such as 0.63.0, and the issue persists. Is this normal behavior?
I need to dig into this issue, but I would expect the number of received spans to equal the number of spans exported if sampling isn't being performed and if data is being sent to only one exporter at a time.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Component(s)
exporter/loadbalancing
Describe the issue you're reporting
I am currently experiencing a discrepancy between the amount of traffic being sent by otel-agent and the amount being received by otel-collector. For some reason, the receiving traffic in otel-collector is significantly lower than the sending traffic in otel-agent.
I have considered a few possible reasons for this, including sampling rates, configuration issues, and compression algorithms. However, I am having trouble pinpointing the exact cause of the problem.
Could someone please provide some guidance on how to troubleshoot this issue? Additionally, could you please let me know if there are any other factors that could be contributing to the discrepancy between the sending and receiving traffic?
otel-agent uses the loadbalancing exporter and otel-collector uses the otlp receiver.
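A minimal sketch of that pairing, for anyone skimming (endpoints illustrative; the full configs are in the comments above):

```yaml
# agent side: fan spans out across the collector endpoints
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - otel-collector-0.otel-collector.trace.svc.cluster.local:4317

# collector side: plain OTLP over gRPC in
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
```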
Any help would be greatly appreciated. Thanks!