oetiker / rrdtool-1.x

RRDtool 1.x - Round Robin Database
http://www.rrdtool.org
GNU General Public License v2.0

AVERAGE chart looking vastly different between rrdtool 1.4.8 and rrdtool 1.7.2 #1249

Open toby1984 opened 5 months ago

toby1984 commented 5 months ago

Describe the bug

After upgrading our application to rrdtool 1.7.2 (we migrated from CentOS 7.9 and rrdtool 1.4.8 to Rocky Linux 9.2 which ships with rrdtool 1.7.2) we noticed that "rrdtool graph" suddenly renders some charts differently (given the same data and command line options).

I've uploaded all data related to this issue here: bug.zip

The RRD was created using the following command (and has been updated every 60 seconds by a cron job):

rrdtool create cpuload.rrd --step 60 DS:cpuload:GAUGE:120:U:U \
    RRA:MAX:0.5:1:10080 \
    RRA:MAX:0.5:15:2688 \
    RRA:MAX:0.5:60:2016 \
    RRA:MAX:0.5:1440:730 \
    RRA:AVERAGE:0.5:5:10080 \
    RRA:AVERAGE:0.5:15:2688 \
    RRA:AVERAGE:0.5:60:2016 \
    RRA:AVERAGE:0.5:1440:730
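
For reference, here is a minimal sketch of the resolution and retention each RRA line implies. It assumes the usual reading of `RRA:CF:xff:steps:rows` (one row covers `step * steps` seconds, and the archive holds `rows` such rows). The helper `rra_span` is a hypothetical name used only for this illustration; it is not part of rrdtool.

```shell
#!/bin/sh
# Hypothetical helper (not an rrdtool command): print the resolution and
# retention of one RRA, given the base step, steps per row, and row count.
rra_span() {
    step=$1; steps=$2; rows=$3
    resolution=$((step * steps))        # seconds covered by one row
    retention=$((resolution * rows))    # seconds covered by the whole RRA
    echo "${resolution}s per row, $((retention / 86400)) days retained"
}

rra_span 60 1 10080    # RRA:MAX:0.5:1:10080
rra_span 60 5 10080    # RRA:AVERAGE:0.5:5:10080
```

With these inputs, the finest MAX archive works out to 60 seconds per row over 7 days, while the finest AVERAGE archive works out to 300 seconds per row over 35 days.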

To Reproduce

Use the cpuload.rrd attached to this ticket and issue the following two commands, both on 1.4.8 and 1.7.2:

TZ=Europe/Berlin rrdtool graph cpu_load_max.png --full-size-mode -w 1128 -h 180 --start 1707901200 --end 1707904800 -c 'BACK#FFFFFF' -c 'CANVAS#FFFFFF' -c 'SHADEA#CCCCCC' -c 'SHADEB#CCCCCC' -c 'ARROW#000000' -c 'FONT#000000' --font DEFAULT:8: --title 'localhost (127.0.0.1): Load Average' --vertical-label 'Load Average' 'DEF:load=cpuload.rrd:cpuload:MAX' 'AREA:load#1C3452:Load Average' 'GPRINT:load:AVERAGE:Avg load\: %5.2lf' 'GPRINT:load:MAX:Max load\:%5.2lf'
TZ=Europe/Berlin rrdtool graph cpu_load_avg.png --full-size-mode -w 1128 -h 180 --start 1707901200 --end 1707904800 -c 'BACK#FFFFFF' -c 'CANVAS#FFFFFF' -c 'SHADEA#CCCCCC' -c 'SHADEB#CCCCCC' -c 'ARROW#000000' -c 'FONT#000000' --font DEFAULT:8: --title 'localhost (127.0.0.1): Load Average' --vertical-label 'Load Average' 'DEF:load=cpuload.rrd:cpuload:AVERAGE' 'AREA:load#1C3452:Load Average' 'GPRINT:load:AVERAGE:Avg load\: %5.2lf' 'GPRINT:load:MAX:Max load\:%5.2lf'

Expected behavior

The generated PNG files should be (nearly) identical on 1.4.8 and 1.7.2. This holds for the MAX chart but not for the AVERAGE one.

1.4.8: (image: cpu_load_avg_1 4 8)

1.7.2: (image: cpu_load_avg_1 7 2)

JKammler commented 5 months ago

It may have to do with my fix for version 1.7.2 in 2019 (Optimized PDP Calculation). The interval interpolation is now much more accurate and no longer blurs over multiple steps.

toby1984 commented 5 months ago

According to my RRA definition (see above, 1 minute step, retaining the last 7 days at that resolution) and given that I'm requesting

I would've expected the 1.4.8 chart to look just like the 1.7.2 one. Is this something you need more time to look into? Just asking so I know how to proceed with our internal JIRA issue related to this one.

JKammler commented 5 months ago

I overlooked the fact that your rrdtool create command does not define an AVERAGE archive with 1-minute resolution. So the Load Average chart should not be able to display more than one value per 5-minute interval. This does indeed seem a bit strange, and it has nothing to do with the optimization of the PDP calculation in 1.7.2.
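
As a rough sketch of that arithmetic: the graph commands in this issue request a one-hour window (`--start 1707901200 --end 1707904800`), so an archive can supply at most one value per row interval inside that window. The helper `points_in_window` is a hypothetical name for illustration only. The optional `rrdtool fetch` call at the end runs only if rrdtool and the attached `cpuload.rrd` happen to be present, and its output is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: how many archive rows fit into a time window,
# given start, end (epoch seconds) and the row resolution in seconds.
points_in_window() {
    echo $(( ($2 - $1) / $3 ))
}

points_in_window 1707901200 1707904800 60     # finest MAX archive (60 s rows)
points_in_window 1707901200 1707904800 300    # finest AVERAGE archive (300 s rows)

# Optional check against the real file, skipped when it is not available.
if command -v rrdtool >/dev/null 2>&1 && [ -f cpuload.rrd ]; then
    rrdtool fetch cpuload.rrd AVERAGE --start 1707901200 --end 1707904800
fi
```

Under these assumptions the MAX archive can supply 60 values for the window and the AVERAGE archive only 12, so an AVERAGE chart showing finer detail than 5-minute steps would be unexpected.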

toby1984 commented 5 months ago

My reasoning for doing it like that was (correct me if I'm wrong, my understanding of RRD is a bit, ahem, fuzzy): we're collecting the CPU load only once a minute anyway, so having a 1-minute average seemed wasteful/pointless.