Open matschaffer opened 8 years ago
Hi, That's certainly possible - our metrics take long enough to return that we sample them over a 5 minute interval and don't see this. The API should support the time windowing method we're using, but as NR metrics can never be more granular than 60s anyway, I don't see an issue with aligning the request to the minute boundary. It'll be a few days before I have a chance to work on this, but I'll build you a branch to test with as soon as I can, to see if it fixes this issue.
Gotcha. Do you have a recommended set of api.period, scrape_interval, and scrape_timeout settings that seem to be working for you?
Hi,
Apologies this has taken a while. It turns out it was a much simpler change than I'd expected! I've pushed a change that looks right to https://github.com/jfindley/newrelic_exporter/tree/time-window. Is there any chance you could let me know if this works for you?
For the record, we're using scrape_interval: 300s and scrape_timeout: 120s, with api.period set to 300. This is purely because of the size of the dataset we're retrieving, however - I'd like to get 60s intervals/periods working properly if I can.
So far I'm still seeing this behavior with the change in place. Though now that I have EC2 instances spun up I wonder if it could be some side effect of my test environment.
The attached image was from my mac running the prom/prometheus docker image and a local build of newrelic_exporter. Being located in Japan, the network path to newrelic is probably a lot noisier than coming from EC2.
I'll keep an eye out for it on the EC2 instance, but may just punt on this if the issue never surfaces.
So "good" news, in a manner of speaking. The EC2 instance started showing drops, and at a noticeably higher rate than my locally running copy with this patch. Interval = 60s and timeout = 59s for both.
Laptop:
EC2 instance:
I'll replace the EC2 version with a build from this branch and we can see how that pans out.
Sadly still pretty bumpy on EC2 with this patch as well.
Are there any scrape errors?
No, log file looks clean:
time="2016-05-04T20:40:46-07:00" level=info msg="Listening on :9126." file="newrelic_exporter.go" line=518
time="2016-05-05T23:33:43-07:00" level=info msg="Listening on :9126." file="newrelic_exporter.go" line=521
This is just running it under supervisor like:
command=/opt/exporter/bin/newrelic_exporter -api.key ...
stdout_logfile=/opt/exporter/log/newrelic_exporter_supervisord.log
I've been testing this out as a way to get per-route successes/latency/errors but the data I'm seeing for newrelic_count is pretty choppy.
Each scrape seems to take about 30s, so it could be that a 60s interval is too short, but I'm also wondering if using time.Now() as the api.to value might be too soon, so that the counter hasn't been fully populated for the minute in question.
Any thoughts on what might be the best way to get metrics closer to what we'd see in the newrelic UI?
Thanks in advance!