jfindley / newrelic_exporter

NewRelic exporter for prometheus
BSD 2-Clause "Simplified" License

time.Now() too soon? #5

Open matschaffer opened 8 years ago

matschaffer commented 8 years ago

I've been testing this out as a way to get per-route successes/latency/errors, but the data I'm seeing for newrelic_count is pretty choppy.

[Screenshot: graph of newrelic_count showing choppy data, 2016-04-29]

Each scrape seems to take about 30s, so it could be that a 60s interval is too short, but I'm also wondering whether using time.Now() as the api.to value might be too soon, meaning the counter hasn't been fully populated yet for the minute in question.

Any thoughts on what might be the best way to get metrics closer to what we'd see in the newrelic UI?

Thanks in advance!

jfindley commented 8 years ago

Hi,

That's certainly possible. Our metrics take long enough to return that we sample them over a 5 minute interval, so we don't see this. The API should support the time-windowing method we're using, but as New Relic metrics are never more granular than 60s anyway, I don't see an issue with aligning the request to the minute boundary. It'll be a few days before I have a chance to work on this, but I'll build you a branch to test with as soon as I can, to see if it fixes this issue.
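
To illustrate what I mean by aligning to the minute boundary, here's a minimal sketch (the names are made up and this isn't the exporter's actual code): end the window at the last completed minute rather than at time.Now() directly.

```go
// Sketch only: align the query window to the last completed minute instead
// of ending it at time.Now(), so we never request a minute New Relic may
// not have finished populating yet.
package main

import (
	"fmt"
	"time"
)

// metricWindow returns a from/to pair aligned to the minute boundary,
// where "to" is the start of the current, still-incomplete minute.
func metricWindow(period time.Duration) (from, to time.Time) {
	to = time.Now().UTC().Truncate(time.Minute)
	from = to.Add(-period)
	return from, to
}

func main() {
	from, to := metricWindow(60 * time.Second)
	fmt.Println("api.from:", from.Format(time.RFC3339))
	fmt.Println("api.to:  ", to.Format(time.RFC3339))
}
```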

matschaffer commented 8 years ago

Gotcha. Do you have a recommended set of api.period, scrape_interval, and scrape_timeout settings that seem to be working for you?

jfindley commented 8 years ago

Hi,

Apologies this has taken a while - it turns out it was a far simpler change than I'd expected! I've pushed a change that looks right to https://github.com/jfindley/newrelic_exporter/tree/time-window. Is there any chance you could let me know if this works for you?

For the record, we're using:

scrape_interval: 300s
scrape_timeout: 120s

with api.period set to 300. This is purely because of the size of the dataset we're retrieving, however - I'd like to get 60s intervals/periods working properly if I can.
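
For reference, roughly where those two settings sit on the Prometheus side (a sketch; the job name is invented and the :9126 target is just the port the exporter logs that it listens on). api.period is a flag passed to newrelic_exporter itself, not a Prometheus setting:

```yaml
# prometheus.yml (sketch) - only the scrape settings discussed above.
scrape_configs:
  - job_name: 'newrelic'
    scrape_interval: 300s
    scrape_timeout: 120s
    static_configs:
      - targets: ['localhost:9126']
```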

matschaffer commented 8 years ago

So far I'm still seeing this behavior with the change in place, though now that I have EC2 instances spun up I wonder whether it could be some side effect of my test environment.

The attached image was from my Mac, running the prom/prometheus Docker image and a local build of newrelic_exporter. Since I'm located in Japan, the network path to New Relic is probably a lot noisier than it would be from EC2.

I'll keep an eye out for it on the EC2 instance, but may just punt on this if the issue never surfaces.

matschaffer commented 8 years ago

So "good" news, in a manner of speaking. The EC2 instance started showing drops, and at a noticeably higher rate than my locally running copy with this patch. Interval = 60s and timeout = 59s for both.

Laptop:

[Screenshot: Prometheus graph from the laptop]

EC2 instance:

[Screenshot: Prometheus graph from the EC2 instance]

matschaffer commented 8 years ago

I'll replace the EC2 version with a build from this branch and we can see how that pans out.

matschaffer commented 8 years ago

Sadly still pretty bumpy on EC2 with this patch as well.

[Screenshot: Prometheus graph from the EC2 instance running this patch]

jfindley commented 8 years ago

Are there any scrape errors?

matschaffer commented 8 years ago

No, the log file looks clean:

time="2016-05-04T20:40:46-07:00" level=info msg="Listening on :9126." file="newrelic_exporter.go" line=518
time="2016-05-05T23:33:43-07:00" level=info msg="Listening on :9126." file="newrelic_exporter.go" line=521

This is just running it under supervisor like:

command=/opt/exporter/bin/newrelic_exporter -api.key ...
stdout_logfile=/opt/exporter/log/newrelic_exporter_supervisord.log