Dieterbe closed this issue 9 years ago.
I committed a change for issue #1, which should also resolve this. Please let me know if you are still having issues after updating your graphite-api container with the latest graphite-kairosdb.py script.
If you just want to patch manually, rather than rebuilding the Docker image, then run this in the container:
pip install --upgrade git+https://github.com/raintank/graphite-kairosdb.git
supervisorctl restart all
Yes, still seeing this. Note I'm requesting a timeframe whose endpoints are 150 s apart, which is 2.5 periods. If I request a "clean" timeframe of 120 s everything works nicely, but with this approach I still see the occasional null; see the last request below.
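A minimal sketch (my own illustration, not code from the repo) of why a 150 s window over 60 s-spaced points alternates between returning 2 and 3 points as the window slides, using Graphite's boundary convention (start exclusive, end inclusive):

```python
STEP = 150    # window length in seconds (from/until are 150 s apart)
PERIOD = 60   # metric interval in seconds

def points_in_window(until):
    """Return the period-aligned timestamps falling in (until-STEP, until]."""
    frm = until - STEP
    first = (frm // PERIOD + 1) * PERIOD  # first aligned ts strictly after frm
    return list(range(first, until + 1, PERIOD))

# As `until` advances second by second, the count oscillates between
# 2 and 3, which matches the occasional extra (null) trailing point above.
counts = {len(points_in_window(u)) for u in range(1432306916, 1432306926)}
```

(The timestamps in the actual responses are shifted by one second because of the start-time increment discussed below; the alignment behavior is the same.)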
dieter@dieter-m6800 ~ cat ./test-graphite-points2.sh
#!/bin/bash
while true; do
    echo "date right now: $(date) aka $(date +%s)"
    now=$(date +%s)
    until=$((now-30))
    from=$((until-150))
    url="http://localhost:32778/render/?format=json&from=$from&target=sum%28dieter_plaetinck_be.%2A.network.http.ok_state%29&until=$until"
    echo "$from --> $until"
    curl -s "$url" -H 'X-Org-Id: 1' | python -mjson.tool
    sleep 1
done
dieter@dieter-m6800 ~ ./test-graphite-points2.sh
date right now: Fri May 22 11:02:26 EDT 2015 aka 1432306946
1432306766 --> 1432306916
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
date right now: Fri May 22 11:02:27 EDT 2015 aka 1432306947
1432306767 --> 1432306917
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
date right now: Fri May 22 11:02:28 EDT 2015 aka 1432306948
1432306768 --> 1432306918
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
date right now: Fri May 22 11:02:29 EDT 2015 aka 1432306949
1432306769 --> 1432306919
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
date right now: Fri May 22 11:02:30 EDT 2015 aka 1432306950
1432306770 --> 1432306920
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
date right now: Fri May 22 11:02:31 EDT 2015 aka 1432306951
1432306771 --> 1432306921
[
{
"datapoints": [
[
1,
1432306801
],
[
1,
1432306861
],
[
null,
1432306921
]
],
"target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
}
]
OK, so I think the issue is that KairosDB treats the time range boundaries opposite to Graphite: for Graphite, start < ts <= end; for KairosDB, start <= ts < end.
I did increment the start time by 1 second, but it looks like I also need to increment the end time by 1 as well.
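The boundary translation described above can be sketched as a tiny helper (a hypothetical illustration, not the actual graphite-kairosdb.py code): shifting both boundaries forward by one second maps Graphite's half-open range onto KairosDB's.

```python
def graphite_to_kairos_range(start, end):
    """Map a Graphite-style range (start < ts <= end) to an equivalent
    KairosDB-style range (start <= ts < end): the integer timestamps
    satisfying start < ts <= end are exactly those satisfying
    start+1 <= ts < end+1."""
    return start + 1, end + 1
```

Incrementing only the start (the earlier fix) drops the boundary point at `start` correctly, but still excludes the point at `end`, which is why the last point can come back null.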
This is incredible, and in fact the exact opposite of #2; I only found out about #2 by trying to reproduce this one in a script.
Anyway, in the alerting branch we give a certain number of seconds of margin to let the data flow in; this is currently configured to 30 seconds. So when creating alert check jobs, we create a job that queries a range that ends at least 30 s ago and starts x steps before the end.
see https://github.com/raintank/grafana/blob/litmus-alerting-v3/pkg/alerting/scheduler.go#L72-78 for more details.
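The job-range computation can be sketched roughly as follows (a Python approximation of the Go scheduler linked above; the names `SAFETY_MARGIN`, `STEP`, and `NUM_STEPS` are my own placeholders, not identifiers from that code):

```python
import time

SAFETY_MARGIN = 30  # seconds of margin to let data flow in (configurable)
STEP = 60           # metric period in seconds
NUM_STEPS = 2       # how many steps back the check query reaches

def check_range(now=None):
    """Compute the query range for an alert check job: it ends
    SAFETY_MARGIN seconds in the past and starts NUM_STEPS periods
    before that end."""
    now = int(now if now is not None else time.time())
    until = now - SAFETY_MARGIN
    frm = until - NUM_STEPS * STEP
    return frm, until
```

With this construction the query never asks for data newer than the margin allows, so a trailing null cannot be explained by data that simply hasn't arrived yet.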
However, I noticed that very often the last point in the returned range is null. Sometimes it isn't, and it is also irrespective of the safety margin: I've tried 40, 50, 60, 100, and 120, all with the same result. The last value is very often null.
here's some proof/illustration:
Shortly after 18:34:15 I got these log messages from grafana (look at the last line first: it describes the job just executed, and the first part describes the result that was returned by graphite):
So, we got this job, generated at
2015-05-20 22:34:15.00006052 +0000 UTC
It queries for a range that ends at 2015-05-20 22:33:45 +0000 UTC (i.e. 30 seconds ago). All points have a value, except for the last one. I also used Wireshark to make double sure, which basically confirms: