raintank / graphite-kairosdb

Graphite-Api finder plugin for Kairosdb
Apache License 2.0

newest point in series null #3

Closed Dieterbe closed 9 years ago

Dieterbe commented 9 years ago

this is incredible, and in fact the exact opposite of #2. i only found out about #2 by trying to reproduce this one in a script.

anyway, in the alerting branch, we give a certain number of seconds of margin to let the data flow in. this is currently configured to 30 seconds. so when creating alert check jobs, we create a job that queries a range that ends at least 30s ago, and starts x steps before the end.
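
A rough sketch of that range computation, in Python rather than the scheduler's Go. The function and parameter names are hypothetical, and the 10s step and 3 steps are assumptions read off the log output further down; only the 30s margin is stated here.

```python
def query_range(generated_at, margin=30, step=10, steps=3):
    """Return (start, end) in epoch seconds for one alert check job.

    generated_at: when the job was created (epoch seconds).
    margin: how long we wait for data to flow in (30s per the thread).
    step, steps: assumed resolution and number of steps to look back.
    """
    end = generated_at - margin      # range ends `margin` seconds ago
    start = end - steps * step       # and starts x steps before the end
    return start, end


# 1432161255 is 2015-05-20 22:34:15 UTC, the job time in the log below.
print(query_range(1432161255))  # (1432161195, 1432161225)
```

With these assumed values the result matches the from/until pair seen in the captured request.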

see https://github.com/raintank/grafana/blob/litmus-alerting-v3/pkg/alerting/scheduler.go#L72-78 for more details.

however I noticed that very often, the last point in the returned range is null. sometimes it isn't, and it's also irrespective of the safety margin. i've tried 40, 50, 60, 100, 120, all with the same result: the last value is very often null.

here's some proof/illustration:

shortly after 18:34:15 I got these log messages from grafana: (look at the last line first, it describes the job just executed, and the first part describes the result that was returned by graphite)

&{2015-05-20 22:33:15 +0000 UTC 2015-05-20 22:33:45 +0000 UTC [sum(dieter_plaetinck_be.*.network.ping.error_state)] <nil>} GRAPHITE START
(graphite.Response) (len=1 cap=4) {
 (graphite.Series) {
  Datapoints: ([]graphite.DataPoint) (len=4 cap=4) {
   (graphite.DataPoint) (len=2 cap=4) {
    (json.Number) (len=1) 0,
    (json.Number) (len=10) 1432161195
   },
   (graphite.DataPoint) (len=2 cap=4) {
    (json.Number) (len=1) 0,
    (json.Number) (len=10) 1432161205
   },
   (graphite.DataPoint) (len=2 cap=4) {
    (json.Number) (len=1) 0,
    (json.Number) (len=10) 1432161215
   },
   (graphite.DataPoint) (len=2 cap=4) {
    (json.Number) ,
    (json.Number) (len=10) 1432161225
   }
  },
  Target: (string) (len=57) "sumSeries(dieter_plaetinck_be.*.network.ping.error_state)"
 }
}
job results <Job> key=alert-id_1432161255 generatedAt=2015-05-20 22:34:15.00006052 +0000 UTC lastPointTs=2015-05-20 22:33:45 +0000 UTC definition: <CheckDef> Crit: ''sum(graphite("sum(dieter_plaetinck_be.*.network.ping.error_state)", "30s", "", "") >= 1) == 3' -- Warn: '0' GraphiteContext saw 1 unknown values returned from server Unknown

So, we got this job, generated at 2015-05-20 22:34:15.00006052 +0000 UTC. It queries for a range that ends at 2015-05-20 22:33:45 +0000 UTC (i.e. 30 seconds ago). All points have a value, except for the last one:

 date -d @1432161225
Wed May 20 18:33:45 EDT 2015
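
The same cross-check can be done in UTC, which avoids the EDT offset in the `date` output. This is only a small sketch using the two timestamps from the log above.

```python
from datetime import datetime, timezone

# Timestamp of the null datapoint, taken from the log above.
last_point = datetime.fromtimestamp(1432161225, tz=timezone.utc)
# generatedAt of the job, truncated to whole seconds.
generated = datetime(2015, 5, 20, 22, 34, 15, tzinfo=timezone.utc)

print(last_point.isoformat())                    # 2015-05-20T22:33:45+00:00
print((generated - last_point).total_seconds())  # 30.0
```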

i also used wireshark to make double sure, which basically confirms:

GET /render/?format=json&from=1432161195&target=sum%28dieter_plaetinck_be.%2A.network.ping.error_state%29&until=1432161225 HTTP/1.1
Host: graphite-api:8888
User-Agent: Go 1.1 package http
X-Org-Id: 1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Server: gunicorn/19.3.0
Date: Wed, 20 May 2015 22:34:15 GMT
Connection: close
Last-Modified: Wed, 20 May 2015 22:34:15 GMT
Expires: Wed, 20 May 2015 22:35:15 GMT
Content-Type: application/json
Cache-Control: max-age=60
Content-Length: 160

[{"target": "sumSeries(dieter_plaetinck_be.*.network.ping.error_state)", "datapoints": [[0, 1432161195], [0, 1432161205], [0, 1432161215], [null, 1432161225]]}]
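
For scripted checks, a minimal sketch that counts trailing nulls in a render response like the one above. `trailing_nulls` is a hypothetical helper, not part of graphite-api or the plugin; the body is the JSON captured above.

```python
import json

body = (
    '[{"target": "sumSeries(dieter_plaetinck_be.*.network.ping.error_state)", '
    '"datapoints": [[0, 1432161195], [0, 1432161205], [0, 1432161215], '
    '[null, 1432161225]]}]'
)


def trailing_nulls(series):
    """Count consecutive null values at the end of one series."""
    count = 0
    for value, _ts in reversed(series["datapoints"]):
        if value is not None:
            break
        count += 1
    return count


for series in json.loads(body):
    print(series["target"], trailing_nulls(series))
```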

woodsaj commented 9 years ago

i committed a change for issue #1, which should also resolve this. please let me know if you are still having issues after updating your graphite-api container with the latest graphite-kairosdb.py script.

if you just want to patch manually rather than rebuilding the docker image, then run this in the container:

pip install --upgrade  git+https://github.com/raintank/graphite-kairosdb.git
supervisorctl restart all

Dieterbe commented 9 years ago

yes, still seeing this. note i'm requesting a timeframe of 150s, which is 2.5 periods. if i request a "clean" timeframe of 120s everything works nicely. but with this approach, i still see the occasional null; see the last request.

dieter@dieter-m6800 ~ cat ./test-graphite-points2.sh
#!/bin/bash
while true; do
    echo "date right now: $(date) aka $(date +%s)"
    now=$(date +%s)
    until=$((now-30))
    from=$((until-150))
    url="http://localhost:32778/render/?format=json&from=$from&target=sum%28dieter_plaetinck_be.%2A.network.http.ok_state%29&until=$until"
    echo $from "-->" $until
    curl -s "$url" -H 'X-Org-Id: 1' | python -mjson.tool
    sleep 1
done
dieter@dieter-m6800 ~ ./test-graphite-points2.sh
date right now: Fri May 22 11:02:26 EDT 2015 aka 1432306946
1432306766 --> 1432306916
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]
date right now: Fri May 22 11:02:27 EDT 2015 aka 1432306947
1432306767 --> 1432306917
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]
date right now: Fri May 22 11:02:28 EDT 2015 aka 1432306948
1432306768 --> 1432306918
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]
date right now: Fri May 22 11:02:29 EDT 2015 aka 1432306949
1432306769 --> 1432306919
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]
date right now: Fri May 22 11:02:30 EDT 2015 aka 1432306950
1432306770 --> 1432306920
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]
date right now: Fri May 22 11:02:31 EDT 2015 aka 1432306951
1432306771 --> 1432306921
[
    {
        "datapoints": [
            [
                1,
                1432306801
            ],
            [
                1,
                1432306861
            ],
            [
                null,
                1432306921
            ]
        ],
        "target": "sumSeries(dieter_plaetinck_be.*.network.http.ok_state)"
    }
]

woodsaj commented 9 years ago

Ok. So I think the issue is that kairos treats time opposite to graphite. For graphite, start < ts <= end. For kairos, start <= ts < end.

I did increment the start time by 1 second, but it looks like I need to increment the end time by 1 as well.
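
To illustrate the mismatch, here is a small sketch of the two interval conventions as described above, run against the timestamps from the last test request. The function names are hypothetical and this is not the plugin's code. It assumes whole-second timestamps, so a 1-second shift lands exactly between points.

```python
def graphite_select(points, start, end):
    """Graphite convention as described above: start < ts <= end."""
    return [ts for ts in points if start < ts <= end]


def kairos_select(points, start, end):
    """KairosDB convention as described above: start <= ts < end."""
    return [ts for ts in points if start <= ts < end]


def to_kairos_range(start, end):
    """Shift both bounds by 1s so both conventions pick the same points."""
    return start + 1, end + 1


# Datapoint timestamps and from/until from the last request in the thread.
points = [1432306801, 1432306861, 1432306921]
start, end = 1432306771, 1432306921

print(graphite_select(points, start, end))                  # all three points
print(kairos_select(points, start + 1, end))                # start shifted only: last point missing
print(kairos_select(points, *to_kairos_range(start, end)))  # both shifted: all three points
```

The middle case reproduces the symptom: when `until` lands exactly on a datapoint timestamp, Graphite expects a value for that slot but the backend query excludes it, so the slot comes back null.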