frakev opened this issue 6 months ago (status: Open)
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I'm seeing something similar. I added some details over on this issue: https://github.com/open-telemetry/opentelemetry-collector/issues/8217#issuecomment-2046644595
I managed to solve (or brute force?) the issue by setting this in the exporter; the default is 5 consumers.
remote_write_queue:
  num_consumers: 50
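For context, a minimal sketch of where that setting sits in a full prometheusremotewrite exporter block (the endpoint and the other values here are placeholders, not taken from the comment above):
exporters:
  prometheusremotewrite:
    endpoint: http://mimir.example.com/api/v1/push  # placeholder endpoint
    remote_write_queue:
      enabled: true
      queue_size: 10000   # illustrative value
      num_consumers: 50   # the default is 5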
Thank you @martinohansen. I'll try, but it's just a workaround and maybe this issue needs to be fixed.
I'm having a similar error, but increasing the num_consumers didn't help.
I'm running otel collector contrib 0.98.0 in global mode (one instance per node) in Docker Swarm.
My configuration is generally working, as in 4 out of 6 instances can properly send their metrics to the endpoint.
However, two instances can't, and they fail with:
2024-04-14T21:40:38.279Z error exporterhelper/queue_sender.go:101 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite/metrics-infrastructure", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 1425}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
go.opentelemetry.io/collector/exporter@v0.98.0/exporterhelper/queue_sender.go:101
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
go.opentelemetry.io/collector/exporter@v0.98.0/internal/queue/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
go.opentelemetry.io/collector/exporter@v0.98.0/internal/queue/consumers.go:43
No more info is available.
This is my config:
receivers:
  hostmetrics:
    collection_interval: 3s
    root_path: /hostfs
    scrapers:
      cpu:
        metrics:
          system.cpu.logical.count:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          "system.cpu.frequency":
            enabled: true
          "system.cpu.utilization":
            enabled: true
      load: { }
      memory:
        metrics:
          "system.linux.memory.available":
            enabled: true
          "system.memory.limit":
            enabled: true
          "system.memory.utilization":
            enabled: true
      disk: { }
      filesystem:
        metrics:
          "system.filesystem.utilization":
            enabled: true
      paging:
        metrics:
          "system.paging.utilization":
            enabled: true
          "system.paging.usage":
            enabled: true
      network: { }
      process:
        mute_process_io_error: true
        mute_process_exe_error: true
        mute_process_user_error: true
        metrics:
          "process.cpu.utilization":
            enabled: true
          "process.memory.utilization":
            enabled: true
          "process.disk.io":
            enabled: true
          "process.disk.operations":
            enabled: true
          process.threads:
            enabled: true
          process.paging.faults:
            enabled: true

processors:
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s

exporters:
  prometheusremotewrite/metrics-infrastructure:
    endpoint: http://mimir-lb:9010/api/v1/push
    tls:
      insecure: true
    headers:
      "X-Scope-OrgID": "infrastructure"
    resource_to_telemetry_conversion:
      enabled: true
    remote_write_queue:
      enabled: true
      queue_size: 100000
      num_consumers: 50

service:
  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: 0.0.0.0:8888
  pipelines:
    metrics/infrastructure:
      receivers: [ hostmetrics ]
      processors: [ batch ]
      exporters: [ prometheusremotewrite/metrics-infrastructure ]
What can we do to further troubleshoot this issue?
After 6 hours, I finally figured it out: the nginx config in mimir-lb was not updating the IP addresses of the upstream servers. One of the upstream containers had restarted and received a new IP address, which was never reflected in nginx.
The two collector instances that exhibited this problem must have been routed to the stale upstream every single time.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
I'm encountering the same error. From the error messages, it is not clear to me whether writing to the remote endpoint is failing (i.e. does "Permanent error: context deadline exceeded" come from that server?), or whether some local endpoint is not being scraped properly, i.e. timeouts?
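One way to narrow that down (a sketch, assuming the collector's self-telemetry is exposed on the default port) is to look at the collector's own metrics, where export failures and receive/scrape failures are counted separately, e.g. otelcol_exporter_send_failed_metric_points vs otelcol_receiver_refused_metric_points:
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888  # then scrape http://<collector-host>:8888/metrics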
I am getting a similar issue here. I am using opentelemetry-collector-contrib 0.103.1; my collector is processing 200k spans/min through the spanmetrics connector and finally sending to a Prometheus instance. After a couple of hours running, the collector begins to show this error message.
2024-06-26T20:06:15.292972825Z 2024-06-26T20:06:15.292Z error exporterhelper/queue_sender.go:90 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded", "errorCauses": [{"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}], "dropped_items": 12404}
2024-06-26T20:06:15.293013725Z go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
2024-06-26T20:06:15.293019456Z go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/queue_sender.go:90
2024-06-26T20:06:15.293023534Z go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
2024-06-26T20:06:15.293026425Z go.opentelemetry.io/collector/exporter@v0.103.0/internal/queue/bounded_memory_queue.go:52
2024-06-26T20:06:15.293029184Z go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
2024-06-26T20:06:15.293031423Z go.opentelemetry.io/collector/exporter@v0.103.0/internal/queue/consumers.go:43
triage: this looks like the request is hitting the timeout of the exporter. If this is a common issue, we may want to increase the default timeout, which is 5 seconds.
I'm also getting this error when trying to use prometheusremotewrite... Can't figure out what the issue is. Error from the prometheusremotewrite exporter:
2024-08-28T00:11:46.851Z error exporterhelper/queue_sender.go:92 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 636}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
go.opentelemetry.io/collector/exporter@v0.106.1/exporterhelper/queue_sender.go:92
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
go.opentelemetry.io/collector/exporter@v0.106.1/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
go.opentelemetry.io/collector/exporter@v0.106.1/internal/queue/consumers.go:43
Config:
exporters:
  prometheusremotewrite:
    add_metric_suffixes: false
    endpoint: http://mimir-.../api/v1/push
    headers:
      Authorization: Bearer my-token-here
      X-Scope-OrgID: my-org-id
    max_batch_size_bytes: 30000000
    tls:
      insecure_skip_verify: true
Also, when trying to use the otlphttp exporter I'm getting a 499.
I've tried both exporters to send into Mimir.
UPDATE
I have solved this for the prometheusremotewrite exporter by simplifying the config:
exporters:
  prometheusremotewrite:
    endpoint: http://mimir-.../api/v1/push
    headers:
      Authorization: Bearer my-token-here
    tls:
      insecure_skip_verify: true
For me, it worked by changing from this:
tls:
  insecure: true
to this:
tls:
  insecure_skip_verify: true
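For readers hitting the same thing, a minimal sketch of the difference as I understand the collector's tls client settings (the endpoint is a placeholder): insecure is meant for endpoints that do not use TLS at all, while insecure_skip_verify keeps TLS but skips server certificate verification (e.g. for self-signed certificates).
exporters:
  prometheusremotewrite:
    endpoint: https://mimir.example.com/api/v1/push  # placeholder endpoint
    tls:
      # insecure: true             # for plain, non-TLS endpoints
      insecure_skip_verify: true   # TLS is used, but the server certificate is not verified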
I am having a similar issue:
error exporterhelper/common.go:95 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 5}
I am using the adot-collector as my pod, in AWS. Any idea what can be done? Any help would be appreciated.
Same for me... please help.
Component(s)
exporter/prometheusremotewrite
What happened?
Description
If the endpoint is not reachable and OTEL can't send metrics, I get some error messages.
Steps to Reproduce
Expected Result
No error messages, but an info log to let you know that the collector is queuing the metrics due to endpoint downtime.
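For reference, a minimal sketch (not part of the original report; values are illustrative) of the prometheusremotewrite exporter settings that control timeouts, retries, and queuing. Note that the retry/queue machinery only re-attempts retryable errors; anything wrapped as a Permanent error, as in the logs in this thread, is dropped without retrying.
exporters:
  prometheusremotewrite:
    endpoint: http://mimir.example.com/api/v1/push  # placeholder endpoint
    timeout: 30s                 # per-request timeout; the default is 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    remote_write_queue:
      enabled: true
      queue_size: 10000
      num_consumers: 5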
Actual Result
2024-03-22T14:35:06.566Z error exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 2353}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
go.opentelemetry.io/collector/exporter@v0.96.0/exporterhelper/queue_sender.go:97
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/consumers.go:43
2024-03-22T14:35:12.008Z error exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: context deadline exceeded", "dropped_items": 24}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
go.opentelemetry.io/collector/exporter@v0.96.0/exporterhelper/queue_sender.go:97
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/consumers.go:43
Collector version
0.96.0
Environment information
Environment
Docker image: otel/opentelemetry-collector:0.96.0
OpenTelemetry Collector configuration
Log output
Additional context
No response