oetiker / rrdtool-1.x

RRDtool 1.x - Round Robin Database
http://www.rrdtool.org
GNU General Public License v2.0
992 stars 260 forks source link

'Premature end of response packet' when working with rrdcached #782

Open murrant opened 7 years ago

murrant commented 7 years ago

RRDtool 1.6.0

Sample file: https://www.dropbox.com/s/qynvw6iml22sz8x/app-mysql-1.rrd.gz?dl=0

rrdcached should be running. Here are the options I used: /usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/run/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/

Start up rrdtool - Paste this: graph - DEF:temp=app-mysql-1.rrd:BRd:AVERAGE -d unix:/run/rrdcached.sock or any other ds Then run it again. Output:

rrdtool -
graph - DEF:temp=app-mysql-1.rrd:BRd:AVERAGE -d unix:/run/rrdcached.sock
ERROR: rrdc_fetch: Got 180 lines, expected 295
graph - DEF:temp=app-mysql-1.rrd:BRd:AVERAGE -d unix:/run/rrdcached.sock
ERROR: rrdc_fetch: Premature end of response packet

Everything works fine when not using rrdcached. Not using rrdtool - results in just the Got 180, expected 295 output every time.

murrant commented 7 years ago

Additionally, this is the rrdcreate command:

rrdtool create app-mysql-1.rrd --step 300 DS:IDBLBSe:GAUGE:600:0:125000000000 DS:IBLFh:DERIVE:600:0:125000000000 DS:IBLWn:DERIVE:600:0:125000000000 DS:SRows:DERIVE:600:0:125000000000 DS:SRange:DERIVE:600:0:125000000000 DS:SMPs:DERIVE:600:0:125000000000 DS:SScan:DERIVE:600:0:125000000000 DS:IBIRd:DERIVE:600:0:125000000000 DS:IBIWr:DERIVE:600:0:125000000000 DS:IBILg:DERIVE:600:0:125000000000 DS:IBIFSc:DERIVE:600:0:125000000000 DS:IDBRDd:DERIVE:600:0:125000000000 DS:IDBRId:DERIVE:600:0:125000000000 DS:IDBRRd:DERIVE:600:0:125000000000 DS:IDBRUd:DERIVE:600:0:125000000000 DS:IBRd:DERIVE:600:0:125000000000 DS:IBCd:DERIVE:600:0:125000000000 DS:IBWr:DERIVE:600:0:125000000000 DS:TLIe:DERIVE:600:0:125000000000 DS:TLWd:DERIVE:600:0:125000000000 DS:IBPse:GAUGE:600:0:125000000000 DS:IBPDBp:GAUGE:600:0:125000000000 DS:IBPFe:GAUGE:600:0:125000000000 DS:IBPMps:GAUGE:600:0:125000000000 DS:TOC:GAUGE:600:0:125000000000 DS:OFs:GAUGE:600:0:125000000000 DS:OTs:GAUGE:600:0:125000000000 DS:OdTs:COUNTER:600:0:125000000000 DS:IBSRs:DERIVE:600:0:125000000000 DS:IBSWs:DERIVE:600:0:125000000000 DS:IBOWs:DERIVE:600:0:125000000000 DS:QCs:GAUGE:600:0:125000000000 DS:QCeFy:GAUGE:600:0:125000000000 DS:MaCs:GAUGE:600:0:125000000000 DS:MUCs:GAUGE:600:0:125000000000 DS:ACs:DERIVE:600:0:125000000000 DS:AdCs:DERIVE:600:0:125000000000 DS:TCd:GAUGE:600:0:125000000000 DS:Cs:DERIVE:600:0:125000000000 DS:IBTNx:DERIVE:600:0:125000000000 DS:KRRs:DERIVE:600:0:125000000000 DS:KRs:DERIVE:600:0:125000000000 DS:KWR:DERIVE:600:0:125000000000 DS:KWs:DERIVE:600:0:125000000000 DS:QCQICe:DERIVE:600:0:125000000000 DS:QCHs:DERIVE:600:0:125000000000 DS:QCIs:DERIVE:600:0:125000000000 DS:QCNCd:DERIVE:600:0:125000000000 DS:QCLMPs:DERIVE:600:0:125000000000 DS:CTMPDTs:DERIVE:600:0:125000000000 DS:CTMPTs:DERIVE:600:0:125000000000 DS:CTMPFs:DERIVE:600:0:125000000000 DS:IBIIs:DERIVE:600:0:125000000000 DS:IBIMRd:DERIVE:600:0:125000000000 DS:IBIMs:DERIVE:600:0:125000000000 DS:IBILog:DERIVE:600:0:125000000000 DS:IBISc:DERIVE:600:0:125000000000 DS:IBIFLg:DERIVE:600:0:125000000000 DS:IBFBl:DERIVE:600:0:125000000000 DS:IBIIAo:DERIVE:600:0:125000000000 DS:IBIAd:DERIVE:600:0:125000000000 DS:IBIAe:DERIVE:600:0:125000000000 DS:SFJn:DERIVE:600:0:125000000000 DS:SFRJn:DERIVE:600:0:125000000000 DS:SRe:DERIVE:600:0:125000000000 DS:SRCk:DERIVE:600:0:125000000000 DS:SSn:DERIVE:600:0:125000000000 DS:SQs:DERIVE:600:0:125000000000 DS:BRd:DERIVE:600:0:125000000000 DS:BSt:DERIVE:600:0:125000000000 DS:CDe:DERIVE:600:0:125000000000 DS:CIt:DERIVE:600:0:125000000000 DS:CISt:DERIVE:600:0:125000000000 DS:CLd:DERIVE:600:0:125000000000 DS:CRe:DERIVE:600:0:125000000000 DS:CRSt:DERIVE:600:0:125000000000 DS:CSt:DERIVE:600:0:125000000000 DS:CUe:DERIVE:600:0:125000000000 DS:CUMi:DERIVE:600:0:125000000000  RRA:AVERAGE:0.5:1:2016 RRA:AVERAGE:0.5:6:1440 RRA:AVERAGE:0.5:24:1440 RRA:AVERAGE:0.5:288:1440  RRA:MIN:0.5:1:720 RRA:MIN:0.5:6:1440     RRA:MIN:0.5:24:775     RRA:MIN:0.5:288:797  RRA:MAX:0.5:1:720 RRA:MAX:0.5:6:1440     RRA:MAX:0.5:24:775     RRA:MAX:0.5:288:797  RRA:LAST:0.5:1:1440  -O --daemon unix:/run/rrdcached.sock
oetiker commented 7 years ago

hmmm my guess is that you are running into some limitation by having a very wide rrd file ...

murrant commented 7 years ago

Interesting, ok.

murrant commented 7 years ago

@oetiker I noticed for some reason we update -1 values instead of U for this rrd even though we defined the min as 0 for all DS. Could this be the issue?

oetiker commented 7 years ago

updating with -1 when you have configured 0 as a minimum value has the same effect as updating with U

murrant commented 7 years ago

Odd, as changing our update to use U, and graphing now works when using rrdtool - and rrdcached (as long as you don't graphs values updated with -1).

oetiker commented 7 years ago

odd indeed :) you could try running rrdtool dump on both copies of the rrd file to see what the difference is ... maybe there is some bug in rrdcached causing the different behaviour ...

murrant commented 7 years ago

There is a difference between the two, each one should have the same ds' with valid values (>=0) and NaN. Bad entry:

 <!-- 2017-05-06 21:55:00 CDT / 1494125700 --> <row><v>8.3886080000e+06</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>1.8333055732e+00</v><v>4.4991303744e-01</v><v>0.0000000000e+00</v><v>1.4284368281e-01</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>1.0628497657e+01</v><v>0.0000000000e+00</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>4.0000000000e+02</v><v>2.4000000000e+01</v><v>1.3600000000e+02</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>1.0485760000e+06</v><v>1.0313360000e+06</v><v>1.5100000000e+02</v><v>2.3000000000e+01</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>9.0000000000e+00</v><v>8.6533962204e-02</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>3.3582490668e-03</v><v>9.3250460337e-02</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>1.6299317721e-01</v><v>0.0000000000e+00</v><v>4.1375281124e-01</v><v>0.0000000000e+00</v><v>1.4360057684e+00</v><v>0.0000000000e+00</v><v>4.4492815112e+03</v><v>6.2938491803e+03</v><v>1.7462895147e-01</v><v>1.7306792441e-01</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>5.7379228230e+00</v><v>4.7722556030e+00</v><v>0.0000000000e+00</v></row>

Good Entry:

<!-- 2017-05-08 09:55:00 CDT / 1494255300 --> <row><v>8.3886080000e+06</v><v>NaN</v><v>NaN</v><v>1.8646782837e+00</v><v>5.3958522812e-01</v><v>0.0000000000e+00</v><v>3.0100002668e-01</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>3.3454287335e+01</v><v>0.0000000000e+00</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>4.0000000000e+02</v><v>2.4000000000e+01</v><v>1.3700000000e+02</v><v>0.0000000000e+00</v><v>NaN</v><v>NaN</v><v>NaN</v><v>1.0485760000e+06</v><v>1.0313360000e+06</v><v>1.5100000000e+02</v><v>2.3000000000e+01</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>9.6658762100e+00</v><v>7.9860551735e-02</v><v>NaN</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>2.7704712728e-02</v><v>2.7832291791e-01</v><v>0.0000000000e+00</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>1.4963567215e-01</v><v>0.0000000000e+00</v><v>5.1010649788e-01</v><v>0.0000000000e+00</v><v>9.1381618287e+00</v><v>0.0000000000e+00</v><v>5.0256250880e+03</v><v>2.4954577989e+04</v><v>1.5716196637e+00</v><v>1.3100482527e-01</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>0.0000000000e+00</v><v>2.7345012600e+01</v><v>4.2815261468e+00</v><v>0.0000000000e+00</v></row>
oetiker commented 7 years ago

Ah, I know where the difference is ... the minimum and maximum values are for the consolidated primary values ... and they will be different when adding up positive and negative rates ... wheras by updating with U it will always be adding unknown + known ... not negative + positive ... the outcome will be different

murrant commented 7 years ago

I've worked around this on our end.

Feel free to close or keep open if you are going to apply a bugfix.

barryodonovan commented 4 years ago

I am also experiencing (as are at least a few other users) this issue on LibreNMS. The issue occurs with an rrd containing 78 DS. The issue presents in the same manner as above, getting fewer lines than expected, premature end of packet, etc. When the rrd is first created and all rows contain NaN I can fetch data using rrdcached, once the first row is populated the error presents. I have reduced rrd to 77 DS and the issue has not presented with ~18 hours of data. The issue does not present if I fetch against the rrd without using rrdcached.

Is it possible that this is related to the RRD_CMD_MAX buffer size in some way?

barryodonovan commented 4 years ago

I have built and tested v1.7.2 with RRD_CMD_MAX set to 8192 and am not seeing the issue. I didn't get a chance to build it again with default RRD_CMD_MAX size to rule out the bug being fixed in the new version and I may not get a chance until next week at the earliest.

pokotilenko commented 4 years ago

Same issue here with v1.6.0. 58 DS file works fine, 78 DS file have issue when accessed through rrdcached (rrdtool fetch -d ...) Confirming that rebuild with RRD_CMD_MAX set to 8192 fixes this issue (at least for 78 DS). Replacing just rrdcached did not help, replacing also rrdtool helped.

oetiker commented 4 years ago

PR is welcome!