Open iuyiuy opened 5 years ago
the end value is NOT part of the interval ... but the data is stored together with the end timestamp ...
I have not looked at the code yet, but have you checked the graphical representation, ist this correct ?
I just tested with 1.4.8 and 1.6.0. They both produce the same output for the same rrd file. Of course here i can visibly see that the dump command shows the value at the end time of what the graph shows. But xport shows the value at the end time for 1.4.8 and the beginning of the periods in 1.5.0+.
The examples i gave were really bad for showing it visually, but i've graphed them like this, to be able to see it somewhat:
kbj@kbj-dev2:~/git$ rrdtool graph tmp.png --height 1200 -a PNG --start 1540207500 --end 1540208700 'DEF:ns10=/tmp/ns10.rrd:ns10:AVERAGE' 'LINE1:ns10#ff0000:test'
Which gives this graph:
Here the 9.37[..] value is of course converted to the real value of 0.937. The value before is 0.015 and after is 185. The dump i gave earlier has a different value for before the 0.937 value, because it's actually one of the averages in the dump because of how i grep'ed it.
so it really is only an off by one in xports way of interpreting the data ... since rrdtool works with arrays in may places where only the start time is known and not a time stamp for each data value, these kinds of errors can creep in easily.
Yeah, that sounds about right.
Ok, so I've done a little digging, experimentation and git bisect
ion, and it turns out that the current behavior looks like it was done on purpose. Further testing leads me to guess that this was done to make rrdtool xport
output identical to rrdtool graph --imageformat XML
output introduced since 1.4.8.
And it looks like the current behaviour is the correct behavior, but it would be nice to understand how it is supposed to work. Is a little documentation missing?
If I revert the following three commits (here is a branch with the reverts added), I get exactly the values I got with 1.4.8 (and 1.4.9):
commit c7f7d90b67224734c15252a013eb2b1d24d93e58
Author: Tobias Oetiker <tobi@oetiker.ch>
Date: Mon Oct 21 15:48:54 2013 +0200
xport should only return as much data as was requested
commit 69fc6bd45015fb7fffc0766b6badcd7dfd0d90d5
Author: Tobias Oetiker <tobi@oetiker.ch>
Date: Tue Nov 18 10:37:50 2014 +0100
port 741d4e9 to master -- fix start and end times for xport
commit 4d474d7bd0e14a28f0017d38e9e754ee688a32ce
Author: Tobias Oetiker <tobi@oetiker.ch>
Date: Wed Mar 25 16:18:08 2015 +0100
exporting one row too many ...
However, since 1.4.9, rrdtool graph
can now also speak --imgformat XML
, and on master (without reverting the above commits), these two now have exactly the same output:
rrdtool xport -t --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE XPORT:example > xport-out.xml
rrdtool graph graph-out.xml --imgformat XML --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE LINE1:example
(if one ignores the <gprints>...</gprints>
tag that rrdtool graph
also produces).
So now I'd like to ask: Is the current behavior really buggy? Or are the three commits above actually correct and we rrdtool
users should adjust our application code logic somehow?
For me, the underlying problem in both 1.4.9 and master is that I don't know what:
<step>10</step>
...
<row><t>1500000000</t><v>1.0000000000e+01</v></row>
actually means. Does it mean:
In the interval from 1500000000 to 1500000000 + $step ( == 1500000010), the value was 10 - as provided to
rrdtool update
at the end of the interval. (Perhaps modified a little if the updates weren't exactly on step boundaries)
? Because then the current output (see below) does sorta make sense, and the 1.4.8 output was buggy, not the other way round.
Or should we interpret the XML as meaning something else? If so, what?
The rrdfetch
(and rrdgraph
and rrdxport
) manpages describe the formats and possible values for --start
and --end
wonderfully, but don't really explain what output to expect if I say --start 1500000001 --end 1500000023
if I have data in a 10s RRA, and exactly how to interpret the output. Does such a description exist in the documentation that I didn't find?
Without a clear understanding of what the inputs and output mean, it is difficult to determine which is the buggy output, and currently I don't have that clarity. Since the three commits above have gone back and forth on this issue three times, I am guessing this actually is a little unspecified and that this is the root cause of the confusion (at least on my part :-).
I've created a little script (below) that updates 0 at 1500000000, 10 at 1500000010 etc making it easy to correlate time and value on the console for the human eye:
> ./example.sh
Update: 1500000000@0 - which is Fri Jul 14 04:40:00 CEST 2017
Update: 1500000010@10 - which is Fri Jul 14 04:40:10 CEST 2017
Update: 1500000020@20 - which is Fri Jul 14 04:40:20 CEST 2017
Update: 1500000030@30 - which is Fri Jul 14 04:40:30 CEST 2017
Update: 1500000040@40 - which is Fri Jul 14 04:40:40 CEST 2017
Given such an RRD file, here is rrdtool graph graph.png --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE LINE1:example#FF0000
For comparison, here is the rrdtool xport -t --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE XPORT:example
output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xport>
<meta>
<start>1500000000</start>
<end>1500000030</end>
<step>10</step>
<rows>3</rows>
<columns>1</columns>
<legend>
<entry></entry>
</legend>
</meta>
<data>
<row><t>1500000000</t><v>1.0000000000e+01</v></row>
<row><t>1500000010</t><v>2.0000000000e+01</v></row>
<row><t>1500000020</t><v>3.0000000000e+01</v></row>
</data>
</xport>
This sort of looks correct given the interpretation i described above:
In the interval from 1500000000 to 1500000000 + $step ( == 1500000010), the value was 10 - as provided to
rrdtool update
at the end of the interval. (Perhaps modified a little if the updates weren't exactly on step boundaries)
In contrast, given the exact same RRD file, here is the 1.4.9 rrdtool xport --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE XPORT:example
output for the exact same RRD file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xport>
<meta>
<start>1500000010</start>
<step>10</step>
<end>1500000010</end>
<rows>4</rows>
<columns>1</columns>
<legend>
<entry></entry>
</legend>
</meta>
<data>
<row><t>1500000010</t><v>1.0000000000e+01</v></row>
<row><t>1500000020</t><v>2.0000000000e+01</v></row>
<row><t>1500000030</t><v>3.0000000000e+01</v></row>
<row><t>1500000040</t><v>4.0000000000e+01</v></row>
</data>
</xport>
I'm struggling a little to explain the 1.4.9 output...
--start
behavior also different in 1.4.9 vs. 1.5.0+My testing does show that 1.4.9 and master also behave a little differently when given off-by-one timestamps.
In 1.4.9, --start 1499999999 gives a first sample with timestamp 1500000000, but --start 1500000000 and 1500000001 both give a first sample at 1500000010.
In 1.5.0+, --start 1499999999 and 1500000000 both give a first sample with timestamp 1500000000, but 1500000001 gives first sample at 1500000010.
Again here, on the surface, it looks like the current output makes more sense than the 1.4.8 and 1.4.9 output.
But again, it is almost impossible for me as a non-expert user to be certain, without my expectations being grounded in documentation.
Which of (1.4.9, 1.5.0+) is the correct behavior and which is the bug?
If the current behavior is correct:
#!/bin/bash
# Choosing a time that is easy on the eyes on the terminal
# Fri Jul 14 04:40:00 CEST 2017
start=1500000000
rrdtool create example.rrd --start $start-60 --step 10s DS:example:GAUGE:100:0:100000000 RRA:AVERAGE:0.5:1:100
for((i=0; $i<50; i=$i+10)) {
updatetime=$(( $start + $i))
update=$updatetime@$i
echo Update: $update - which is $(date -d @$updatetime)
rrdtool update example.rrd -t example $update
}
Tobias Oetiker
On 2 Jan 2019, at 01:42, Peter Valdemar Mørch notifications@github.com wrote:
TL;DR
Ok, so I've done a little digging, experimentation and git bisection, and it turns out that the current behavior looks like it was done on purpose. Further testing leads me to guess that this was done to make rrdtool xport output identical to rrdtool graph --imageformat XML output introduced since 1.4.8. And it looks like the current behaviour is the correct behavior, but it would be nice to understand how it is supposed to work.
Is a little documentation missing?
The timestamps associated with a value, in rrdtool ALWAYS represent the time the sample was taken. Since any value you sample will represent some sort of past state your sampling apparatus has gathered, the timestamp will always be at the end of some sort of sampling period.
Rrdtool does not store the actual samples, but does internal resampling of the values presented to it. Nevertheless when a data value is presented with a single timestamp the timestamp is at the end of the period the value represents. Note that the timestamp itself is outside the period the sample is valid for.
I will look at the commits you quoted when I get to a real computer l8r today.
You write:
Nevertheless when a data value is presented with a single timestamp the timestamp is at the end of the period the value represents.
If that is true, then:
<row><t>1500000010</t><v>$value</v></row>
should represent the interval [1500000000;1500000010[
and then the $value should be 10 (as in 1.4.9) and not 20 (as in 1.5.0+).
And then there is a bug in both rrdxport
and rrdgraph
(with respect to XML/JSON output).
yes
the difference caused by your reverts is this:
I think relevant of the one affecting the timestamp output of xport by adding +step. which looks like something producing the right result for the wrong reason
So a patch that adds step to the timestamps in <row>
turning:
<row><t>1500000010</t><v>$value</v></row>
into
<row><t>1500000020</t><v>$value</v></row>
is what is needed?
Along with a similar patch for the non-graphical outputs / relevant --imgformat
s of rrdtool graph
.
I would like to also request a clarification to the documentation. Here is a draft/suggestion:
All the statistics in the output will use the same step. The first sample will be the first sample start at or immediately after
--start
. The last sample will be the one ending at or immediately after--end
.Each sample has a timestamp and a value. The timestamps associated with a value in rrdtool ALWAYS represent the time the sample was taken. Since any value you sample will represent some sort of past state your sampling apparatus has gathered, the timestamp will always be at the end of the sampling period.
Rrdtool does not store the actual samples, but does internal resampling of the values presented to it. Nevertheless when a data value is presented with a single timestamp the timestamp is at the end of the period the value represents. Note that the timestamp itself is outside the period the sample is valid for.
So the time range for a sample with a timestamp is actually
[ timestamp - step ; timestamp [
.
Where would be a good place to put this? It applies to both rrdtool xport
and the non-graphical outputs / relevant --imgformat
s of rrdtool graph
.
as to where to add additional information into the documentation ... maybe also https://github.com/oetiker/rrdtool-1.x/blob/master/doc/PDP_calculation_explanation.pod would be a good place ... feel free to create a PR with your additions
please test
Thanks Tobi,
I've tested it. It looks good. Here is the diff from running rrdtool xport -t --start 1500000000 --end 1500000030 DEF:example=example.rrd:example:AVERAGE XPORT:example
before and after the fixes in #928:
» diff -u old new
--- old 2019-01-03 14:51:42.458661722 +0100
+++ new 2019-01-03 14:51:36.490701715 +0100
@@ -2,7 +2,7 @@
<xport>
<meta>
- <start>1500000000</start>
+ <start>1500000010</start>
<end>1500000030</end>
<step>10</step>
<rows>3</rows>
@@ -12,8 +12,8 @@
</legend>
</meta>
<data>
- <row><t>1500000000</t><v>1.0000000000e+01</v></row>
- <row><t>1500000010</t><v>2.0000000000e+01</v></row>
- <row><t>1500000020</t><v>3.0000000000e+01</v></row>
+ <row><t>1500000010</t><v>1.0000000000e+01</v></row>
+ <row><t>1500000020</t><v>2.0000000000e+01</v></row>
+ <row><t>1500000030</t><v>3.0000000000e+01</v></row>
</data>
</xport>
One thing that I wasn't expecting is that the <start>
value also changed. pre-#928, <start>
was the beginning of the first interval and <end>
was the end of the last interval. post-#928, <start>
and <end>
are the ends of the first and last intervals respectively.
A little weird that --start
and <start>
don't mean the same thing.
But that is fine as long as it is documented. And I'll include that detail in my documentation PR that is coming.
yea :) I think having time pointer salways fllowing the same logic at least in the output makes sense ... looking forward to the PR.
there is now an automated test for this btw!
The provided fix of PR #928, does not solve all inconsistencies regarding output of the export function, and it fails to provide enough data to reproduce the graphs. I attach a little script to assist debug this issue. It's based on the work of @pmorch
#!/usr/bin/env bash
# Choosing a time that is easy on the eyes on the terminal
# Fri Jul 14 04:40:00 CEST 2017
start=1500000000
RRD=rrdtool
RRD=~/git/rrdtool-1.x/src/rrdtool
function createrrd()
{
echo "RRD CREATE"
$RRD create example.rrd --start $start-60 --step 10s DS:example:GAUGE:100:0:100000000 DS:second:GAUGE:100:0:100000000 RRA:MAX:0.5:1:15
for((i=0; $i<150; i=$i+10)) {
updatetime=$(( $start + $i))
update=$updatetime@$i:$(( $i +1 ))
echo Update: $update - which is $(date -d @$updatetime)
$RRD update example.rrd -t example:second $update
}
}
function gather_dataplot()
{
#ARGS: step start_shift duration_shift
qstart=$(($start + $2))
end=$(($start + $3))
echo "Query start: " $qstart ":" $(date -d@$qstart)
echo "Query end : "$end ":" $(date -d@$end)
echo "RRD XPORT"
$RRD xport \
DEF:fir=example.rrd:example:MAX XPORT:fir \
DEF:sec=example.rrd:second:MAX XPORT:sec \
-s $qstart -e $end --json -t --step $1
echo "RRD Graph"
$RRD graph tmp$1.png --height 300 -a PNG --start $qstart --end $end --step $1 \
DEF:fir=example.rrd:example:MAX 'LINE1:fir#ff0000:first' \
DEF:sec=example.rrd:second:MAX 'LINE1:sec#00ff00:second'
echo "RRD CSV"
$RRD graph tmp$1.csv --height 300 -a CSV --start $qstart --end $end --step $1 \
DEF:fir=example.rrd:example:MAX 'LINE1:fir#ff0000:first' \
DEF:sec=example.rrd:second:MAX 'LINE1:sec#00ff00:second'
cat tmp$1.csv
}
createrrd
echo -e "\nDATABASE NATIVE RESOLUTION"
gather_dataplot 1 5 72
echo -e "\nDATABASE RESAMPLED RESOLUTION"
gather_dataplot 30 5 72
echo -e "\nDATABASE QUERY WITHIN MEASUREMENTS"
gather_dataplot 2 2 9
montage tmp1.png tmp30.png tmp2.png -geometry +10+10 result.png
Evaluating it on the code provided from master. It produces the following output that I'll break in some pieces.
RRD CREATE
Update: 1500000000@0:1 - which is Fri Jul 14 04:40:00 CEST 2017
Update: 1500000010@10:11 - which is Fri Jul 14 04:40:10 CEST 2017
Update: 1500000020@20:21 - which is Fri Jul 14 04:40:20 CEST 2017
Update: 1500000030@30:31 - which is Fri Jul 14 04:40:30 CEST 2017
Update: 1500000040@40:41 - which is Fri Jul 14 04:40:40 CEST 2017
Update: 1500000050@50:51 - which is Fri Jul 14 04:40:50 CEST 2017
Update: 1500000060@60:61 - which is Fri Jul 14 04:41:00 CEST 2017
Update: 1500000070@70:71 - which is Fri Jul 14 04:41:10 CEST 2017
Update: 1500000080@80:81 - which is Fri Jul 14 04:41:20 CEST 2017
Update: 1500000090@90:91 - which is Fri Jul 14 04:41:30 CEST 2017
Update: 1500000100@100:101 - which is Fri Jul 14 04:41:40 CEST 2017
Update: 1500000110@110:111 - which is Fri Jul 14 04:41:50 CEST 2017
Update: 1500000120@120:121 - which is Fri Jul 14 04:42:00 CEST 2017
Update: 1500000130@130:131 - which is Fri Jul 14 04:42:10 CEST 2017
Update: 1500000140@140:141 - which is Fri Jul 14 04:42:20 CEST 2017
First create a two column database, where each value matches the timestamp for easier recognition, second column is added a one.
When doing a query:
DATABASE NATIVE RESOLUTION
Query start: 1500000005 : Fri Jul 14 04:40:05 CEST 2017
Query end : 1500000072 : Fri Jul 14 04:41:12 CEST 2017
RRD XPORT
{ "about": "RRDtool graph JSON output",
"meta": {
"start": 1500000020,
"end": 1500000080,
"step": 10,
"legend": [
"",
""
]
},
"data": [
[ "1500000020",2.0000000000e+01, 2.1000000000e+01 ],
[ "1500000030",3.0000000000e+01, 3.1000000000e+01 ],
[ "1500000040",4.0000000000e+01, 4.1000000000e+01 ],
[ "1500000050",5.0000000000e+01, 5.1000000000e+01 ],
[ "1500000060",6.0000000000e+01, 6.1000000000e+01 ],
[ "1500000070",7.0000000000e+01, 7.1000000000e+01 ],
[ "1500000080",8.0000000000e+01, 8.1000000000e+01 ]
]
}
RRD CSV
"time","first","second"
1500000020,"2.0000000000e+01","2.1000000000e+01"
1500000030,"3.0000000000e+01","3.1000000000e+01"
1500000040,"4.0000000000e+01","4.1000000000e+01"
1500000050,"5.0000000000e+01","5.1000000000e+01"
1500000060,"6.0000000000e+01","6.1000000000e+01"
1500000070,"7.0000000000e+01","7.1000000000e+01"
The export tool gives a start timestamp 15 secs later than the query, that is larger than my resolution. If every timestamp represents the end of a measuring interval start should be on the 10 sec since that one would be valid from [0;10[. The end boundary behaves well. But on the CSV export that end point is missing. But as you can see from the produced graph, which is indeed represented fro the query interval [5;72], I don't receive enough information from the xport output to be able to reproduce this plot. I would need 8 values, json/xml export returns 7, and the SV exporter only 6.
If I do some re-sampling, and increase the step size, the graph still behaves correctly by from the xport output it is not possible to reconstruct the graph.
DATABASE RESAMPLED RESOLUTION
Query start: 1500000005 : Fri Jul 14 04:40:05 CEST 2017
Query end : 1500000072 : Fri Jul 14 04:41:12 CEST 2017
RRD XPORT
{ "about": "RRDtool graph JSON output",
"meta": {
"start": 1500000060,
"end": 1500000090,
"step": 30,
"legend": [
"",
""
]
},
"data": [
[ "1500000060",6.0000000000e+01, 6.1000000000e+01 ],
[ "1500000090",null, null ]
]
}
RRD CSV
0x0
"time","first","second"
1500000060,"6.0000000000e+01","6.1000000000e+01"
Finally when doing a query within the two timestamps. The graph tool is still able to plot a representative value for the queried interval, but the export tools deliver an empty output. Even worse the XML/JSON exporter gives in metadata a start timestamp larger than the end timestamp.
DATABASE QUERY WITHIN MEASUREMENTS
Query start: 1500000002 : Fri Jul 14 04:40:02 CEST 2017
Query end : 1500000009 : Fri Jul 14 04:40:09 CEST 2017
RRD XPORT
{ "about": "RRDtool graph JSON output",
"meta": {
"start": 1500000020,
"end": 1500000010,
"step": 10,
"legend": [
"",
""
]
},
"data": [
]
}
RRD CSV
0x0
"time","first","second"
There seems to be an inconsistency with which value is displayed at what time between the xport and dump commands. The times and values match in 1.4.8, but not in 1.5.0, 1.6.0 or the newest 1.7.0 from the repository.
Here are some examples using the same rrd file for 3 of the versions. Note: Newer versions need the -t/--showtime argument to show the time. And i've censored the e-mail, for the sake of online crawlers for spam.
1.4.8:
1.6.0:
1.7.0 (newest from git):
As you can see, the 9.37[..] value is set to the [..]8100 time in the dump output on all 3 examples, but it's only on the same time for xport in the 1.4.8 example. The newer versions has that value set to the [..]7800 time when using xport.
In #805 it's indicated that the timestamp is supposed to be the end of the time period, so i would think that it might have been accidentally starting to show the timestamp for the beginning of the time period. I would assume it happened in the 1.5.0 rewrite of xport that was mentioned in #672.
I'm also unclear as to whether the start and end values are supposed to be both inclusive or exclusive and whether they should match the end time for each period or what. If this is going to be revisited, i'd ask that this would also be considered, so it'd be consistent or at least on purpose, so it hopefully will stay that way. In the newer versions, it seems to include the timestamps that match the start value, but not timestamps that match the end value.