oetiker / rrdtool-1.x

RRDtool 1.x - Round Robin Database
http://www.rrdtool.org
GNU General Public License v2.0

Data loss while writing values with rate greater than heartbeat #395

Open pkunilov opened 11 years ago

pkunilov commented 11 years ago

Hello, I do the following steps:

  1. Create an rrd db with a sampling rate of 1 sec and one datasource of type GAUGE with a heartbeat of 2 sec
  2. Start writing values at a rate of one every 3 sec
  3. Only the first value is written; all subsequent ones are lost and replaced with NaN.

For example, I write 10 values at 3-second intervals (in an rrd db with the settings described above).

Current output:

```
time1   Value1
time2   NaN
time3   NaN
time4   NaN
time5   NaN
time6   NaN
time7   NaN
time8   NaN
time9   NaN
time10  NaN
```

Expected output:

```
time1   Value1
time2   NaN
time3   NaN
time4   Value2
time5   NaN
time6   NaN
time7   Value3
time8   NaN
time9   NaN
time10  Value4
```

I use RRDtool 1.2.15. The issue is reproduced on Windows 7 and on Linux 2.6.32-358.6.1.el6.x86_64.

For example, the following rrdtool commands reproduce the behavior described above:

```
rrdtool create "./test.rrd" --start 1369748048 --step 1 DS:test:GAUGE:2:U:U RRA:MIN:0.999:1:30
rrdtool update "./test.rrd" 1369748049:1.0
rrdtool update "./test.rrd" 1369748052:2.0
rrdtool update "./test.rrd" 1369748055:3.0
rrdtool update "./test.rrd" 1369748058:4.0
rrdtool update "./test.rrd" 1369748061:5.0
rrdtool update "./test.rrd" 1369748064:6.0
rrdtool update "./test.rrd" 1369748067:7.0
rrdtool update "./test.rrd" 1369748070:8.0
rrdtool update "./test.rrd" 1369748073:9.0
rrdtool update "./test.rrd" 1369748076:10.0
rrdtool dump ./test.rrd
```
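The heartbeat rule these commands exercise can be modeled in a few lines of Python. This is a simplified sketch of the behavior described in this thread, not RRDtool's actual implementation: an update only counts as known data if the gap since the previous update does not exceed the minimal heartbeat.

```python
# Simplified model of RRDtool's minimal-heartbeat rule (illustrative only):
# an update covers the time since the previous update, and that interval
# is considered known only if the gap does not exceed the heartbeat.
def classify_updates(start, updates, heartbeat):
    """Return a (timestamp, known) pair for each update after `start`."""
    result = []
    prev = start
    for ts, _value in updates:
        gap = ts - prev
        result.append((ts, gap <= heartbeat))
        prev = ts
    return result

# The exact timestamps from the reproduction above: create at 1369748048,
# then ten updates every 3 seconds starting at 1369748049.
updates = [(1369748049 + 3 * i, float(i + 1)) for i in range(10)]
flags = classify_updates(1369748048, updates, heartbeat=2)

# Only the very first update (1 second after create) is within the
# 2-second heartbeat; every 3-second gap after that is unknown.
print(flags[0])   # (1369748049, True)
print(flags[1])   # (1369748052, False)
```

Under this model, every interval except the first exceeds the heartbeat, which matches the "one value, then all NaN" output reported above.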

Is this correct behavior? If it is, is there any way to avoid the data loss?

Best regards, Peter Kunilov

mschaefers commented 9 years ago

Any updates on this issue? We are indirectly facing this issue with RRD4J whose aim is to behave like rrdtool: https://code.google.com/p/rrd4j/issues/detail?id=43

oetiker commented 9 years ago

Please try with the latest version of rrdtool. As long as you use a DS type other than COUNTER or DERIVE, you should not be losing data anymore. --- Correction: I did not pay close attention when reading the problem description ... a 3-second update interval with a 2-second heartbeat will always result in NaN ... this is what the heartbeat is supposed to do, and it does ...

themuvarov commented 9 years ago

The issue is reproduced on the latest released version, v1.5.0-rc2.

I tried 1.3.8, 1.4.7, and eventually https://github.com/oetiker/rrdtool-1.x/tree/v1.5.0-rc2

Environment: 2.6.32-504.el6.x86_64

Steps to reproduce:

```
rrdtool create ./test1_5_0.rrd --start 1369748048 --step 1 DS:test:GAUGE:2:U:U RRA:MIN:0.999:1:30
rrdtool update ./test1_5_0.rrd 1369748049:1.0
rrdtool update ./test1_5_0.rrd 1369748052:2.0
rrdtool update ./test1_5_0.rrd 1369748055:3.0
rrdtool update ./test1_5_0.rrd 1369748058:4.0
rrdtool dump ./test1_5_0.rrd
```

Results:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<rrd>
  <version>0003</version>
  <step>1</step>
  <lastupdate>1369748058</lastupdate>
  <ds>
    <name>test</name>
    <type>GAUGE</type>
    <minimal_heartbeat>2</minimal_heartbeat>
    <min>NaN</min>
    <max>NaN</max>
    <last_ds>4.0</last_ds>
    <value>NaN</value>
    <unknown_sec>0</unknown_sec>
  </ds>
  <rra>
    <cf>MIN</cf>
    <pdp_per_row>1</pdp_per_row>
    <params> <xff>9.9900000000e-01</xff> </params>
    <cdp_prep>
      <ds>
        <primary_value>NaN</primary_value>
        <secondary_value>NaN</secondary_value>
        <value>NaN</value>
        <unknown_datapoints>0</unknown_datapoints>
      </ds>
    </cdp_prep>
    <database>
      <!-- 30 rows, all NaN except the row for 1369748049,
           which holds 1.0000000000e+00 -->
    </database>
  </rra>
</rrd>
```

Only the first measurement (1369748049) has been saved in the DB. The others were lost.

oetiker commented 9 years ago

Well, if you have a required heartbeat of 2 seconds but you update every 3 seconds, you will get unknown data (that is what the heartbeat is for: to catch long intervals).

The reason your first update works is that you create at 1369748048 and update at 1369748049, which is only 1 second apart, and with a GAUGE type datasource one update is all you need for a valid entry (with COUNTER or DERIVE you would obviously need two).
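The GAUGE versus COUNTER/DERIVE distinction can be sketched like this (a simplified illustration, not RRDtool's internals): a GAUGE sample is already a rate, while a COUNTER/DERIVE rate is a difference and therefore needs a previous sample.

```python
import math

# Hedged sketch of why one update suffices for GAUGE but not for
# COUNTER/DERIVE. (Illustrative model only.)
def gauge_rate(ts, value, prev=None):
    # A GAUGE sample *is* the rate; no previous sample is needed.
    return value

def counter_rate(ts, value, prev):
    # COUNTER/DERIVE rates are deltas over time, so a previous
    # (timestamp, value) sample is required before any rate exists.
    if prev is None:
        return float("nan")  # first update: rate unknown
    prev_ts, prev_value = prev
    return (value - prev_value) / (ts - prev_ts)

print(gauge_rate(1369748049, 1.0))                        # 1.0
print(counter_rate(1369748049, 1.0, None))                # nan
print(counter_rate(1369748052, 7.0, (1369748049, 1.0)))   # 2.0
```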

themuvarov commented 9 years ago

Please confirm: is losing the first result after a long interval the design intent?

Is there a way to avoid the loss?

oetiker commented 9 years ago

Yes, this is BY DESIGN. But I agree that if you use DS type GAUGE, one could argue that you could backfill data up to mrhb into the past from the current point in time. Would you like to work on such a feature?
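A rough sketch of that backfill idea (hypothetical behavior; RRDtool does not do this today): when a GAUGE update arrives after a gap longer than mrhb, treat only the last mrhb seconds before the update as covered by the new value, and leave the rest of the gap unknown.

```python
import math

def backfill_intervals(prev_ts, ts, value, mrhb):
    """Split the gap (prev_ts, ts] into an unknown segment and a segment
    backfilled with `value` (at most `mrhb` seconds before `ts`).
    Hypothetical feature sketch, not current RRDtool behavior."""
    known_start = max(prev_ts, ts - mrhb)
    segments = []
    if known_start > prev_ts:
        # The part of the gap older than mrhb stays unknown.
        segments.append((prev_ts, known_start, float("nan")))
    # The last mrhb seconds are backfilled with the new GAUGE value.
    segments.append((known_start, ts, value))
    return segments

# A 3-second gap with mrhb=2: 1 second stays unknown,
# the final 2 seconds are backfilled with the new value.
print(backfill_intervals(1369748049, 1369748052, 2.0, mrhb=2))
```

With this rule, the reporter's expected output above (a value surviving every third step) would be achievable for GAUGE datasources.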