Open ImDevinC opened 1 year ago
+1, I am also having issues using the WAL with the prometheusremotewrite exporter. The only way I could get it to export metrics was by setting buffer_size to 1, and exporting one metric at a time is not an option.
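For reference, a sketch of the exporter configuration in question (endpoint and paths are illustrative, not from the original report); the `wal` block is what triggers the behavior, and `buffer_size: 1` is the workaround mentioned above:

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "http://localhost:9090/api/v1/write"  # illustrative endpoint
    wal:
      directory: /var/lib/otelcol/prw-wal  # where WAL segments are written
      buffer_size: 1                       # workaround: flush one entry at a time
      truncate_frequency: 1m               # how often the WAL is truncated
```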
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
We have moved off of the prometheusremotewrite exporter, and it looks like there's no action on this. Closing the ticket.
@ImDevinC Can you reopen this issue? It still needs to be investigated and fixed.
Any update on this? I'm seeing the same issue. As soon as I enable the WAL, no metrics are sent out.
This is a deadlock. From what I can see, the following is happening:
The problem is that when no data is found, the reader watches the file:
Removing the file watcher fixes the issue.
However, that exposes another bug: the same data is read and the requests are resent again and again. I think the WAL implementation needs a closer look.
I have a setup as follows with the same issue: OpenTelemetry metrics are not forwarded to Grafana via VictoriaMetrics when the WAL is enabled in the collector configuration. When the WAL is disabled, the metrics appear on the Grafana dashboard.
Flow: App --> Otel Agent --> VictoriaMetrics --> Grafana. Use case: I want metrics to persist through failures. For example, if vminsert/VictoriaMetrics goes down and comes back after some downtime, the Otel agent should retry the failed metrics, post them to VictoriaMetrics, and they should then appear in Grafana.
Please advise if a better solution is available for my use case, and please help fix this issue.
I can confirm the same. To be able to test it faster, I moved the relevant parts into a config file that works locally.
Then I used telemetrygen to generate some data. The collector starts to hang and needs to be force-killed.
telemetrygen metrics --otlp-insecure --duration 45s --rate 500
But with the patch https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20875 from @sh0rez applied, I start to receive metrics:
# HELP rwrecv_requests_total
# TYPE rwrecv_requests_total counter
rwrecv_requests_total{code="200",method="GET",path="/metrics",remote="localhost"} 3
rwrecv_requests_total{code="200",method="POST",path="/api/v1/write",remote="localhost"} 29
# HELP rwrecv_samples_received_total
# TYPE rwrecv_samples_received_total counter
rwrecv_samples_received_total{remote="localhost"} 7514
@kumar0204 I'm looking to do the same thing: have the collector retry failed requests when the backend goes down. Did you find anything for this, like persistence, with the remote write exporter?
@zakariais is the filestorage extension what you are looking for?
@frzifus does the filestorage extension work with the prometheusremotewrite exporter? I didn't see that mentioned in its README.
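For context, the usual persistence pattern with the filestorage extension is to wire a storage ID into an exporter's sending_queue; whether prometheusremotewrite honors this depends on the version, since it historically exposed its own remote_write_queue setting instead. A hypothetical sketch with otlphttp, where the sending_queue mechanism is known to apply (paths and endpoint are illustrative):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage  # illustrative path, must be writable

exporters:
  otlphttp:
    endpoint: "http://backend:4318"  # illustrative endpoint
    sending_queue:
      enabled: true
      storage: file_storage  # queue contents survive collector restarts
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0    # keep retrying while the backend is down

service:
  extensions: [file_storage]
```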
I use two types of persistence in our setup. My flow is: Service/Application --> Otel Agent (filestorage extension for persistence) --> Otel Collector/Gateway (prometheusremotewrite with write-ahead log for persistence) --> VictoriaMetrics (SRE backend) --> Grafana.
First use case: metrics are stored on the agent side via the filestorage extension, so if the gateway is down, metrics are replayed from the agent. Second use case: if VictoriaMetrics/Prometheus is down, metrics are stored in the WAL, and once the backend is back up they are replayed from the gateway. My second use case hits the WAL issue: when the WAL is enabled, metrics do not reach Grafana. I hope that makes the issue clear.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.
I have a similar setup to @kumar0204 and am running into the exact same issue when enabling the WAL on prometheusremotewrite.
There is actually already a fix that still needs to be polished: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20875
Do you want to work on that @cheskayang ?
@frzifus thanks for letting me know! I saw you opened a PR after this comment, but it's stale: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29297
Do you still plan to ship the fix?
@kumar0204 I have a similar setup too. Were you able to solve the WAL issue?
Ping ... we'd really like to see this get fixed as well... :/
I've reopened and rebased https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20875, which will fix this.
prometheusremotewrite with the WAL enabled simply doesn't work; I've never seen it work, anyway. A PR to fix it has apparently been open for over a year. What is the plan here: merge that, write a different fix, remove the WAL entirely, or leave it broken indefinitely?
What happened?
Description
When using the prometheusremotewrite exporter with the WAL enabled, no metrics are sent from the collector to the remote write destination.
Steps to Reproduce
Using the config in the configuration section below, this error can be reproduced by sending metrics to the collector. Disabling the WAL section causes all metrics to be sent properly.
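The issue's configuration section is elided below, but a minimal pipeline of this shape (endpoints and paths are illustrative, not the reporter's values) reproduces the symptom whenever the `wal` block is present:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheusremotewrite:
    endpoint: "http://localhost:9090/api/v1/write"  # illustrative endpoint
    wal:                          # removing this block makes metrics flow again
      directory: /tmp/prw-wal

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```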
Expected Result
Prometheus metrics should appear in the remote write destination.
Actual Result
No metrics were sent to the remote write destination.
Collector version
0.62.1
Environment information
Environment
AWS bottlerocket running otel/opentelemetry-collector-contrib:0.36.3 docker image
OpenTelemetry Collector configuration
Log output
No response
Additional context
From debugging, this looks to be a deadlock between persistToWAL() and readPrompbFromWAL(), but I'm not 100% certain.