oetiker / rrdtool-1.x

RRDtool 1.x - Round Robin Database
http://www.rrdtool.org
GNU General Public License v2.0
992 stars 260 forks source link

segmentation fault when creating graph #368

Open nhale25 opened 11 years ago

nhale25 commented 11 years ago

I get a seg fault when creating a graph of an database which has an RRA with a large number of rows. Reducing the number of rows in the RRA fixes it.

I am using rrdtool version 1.4.7 on Ubuntu 12.04. The following commands recreate the problem for me:

rrdtool create testDb.rrd \ DS:temperature:GAUGE:1200:-273:1000 \ RRA:AVERAGE:0.5:30:1752000

rrdtool graph graph.png \ --start 1356998400 \ --end 1359676800 \ DEF:14338=testDb.rrd:temperature:AVERAGE \ LINE2:14338#e41a1c:test

oetiker commented 11 years ago

RRD database covering the next 2000 years does sound a bit excessive, don't you think ? Are you on a 32bit system?

nhale25 commented 11 years ago

Sorry, my bad, I was a bit keen with the minimization of the test code. I meant to create it like this:

rrdtool create testDb.rrd \ --step 60 \ DS:temperature:GAUGE:1200:-273:1000 \ RRA:AVERAGE:0.5:30:1752000

Which still shows the same problem, and lasts for 100 years. I know that's excessive, but an error message giving the actual limit would be better than a seg fault.

Yes, 32-bit OS.

oetiker commented 11 years ago

I agree error message would be much better ... up to writing a patch ?

nhale25 commented 11 years ago

OK, I've had a look, and for my test case, the problem is on line 407 in rrd_fetch.c start_offset = (long)(start + step - rra_start_time) / (long) *step; The cast of the numerator happens after the addition has already overflowed (rra_start_time is large and negative). This causes start_offset to be wildly wrong when it comes to looping through filling in the data array on line 437. The loop executes too many times, writing outside of the malloc'd area, and (presumably) onwards out of the process address space, causing the seg fault from line 445. The following patch fixes this.

On a related note, it looks like if one created an RRA whose timespan is greater than can be represented by time_t, there would still be problems all over the place. I'd be happy to make a patch generating a warning for this case, if you think that's the right answer / worth doing?

Patch to fix the above test case:

--- a/src/rrd_fetch.c
+++ b/src/rrd_fetch.c
@@ -404,8 +404,8 @@ int rrd_fetch_fn(
     rra_start_time = (rra_end_time
                       - (*step * (rrd.rra_def[chosen_rra].row_cnt - 1)));
     /* here's an error by one if we don't be careful */
-    start_offset = (long) (*start + *step - rra_start_time) / (long) *step;
-    end_offset = (long) (rra_end_time - *end) / (long) *step;
+    start_offset = ((long) *start + *step - rra_start_time) / (long) *step;
+    end_offset = ((long) rra_end_time - *end) / (long) *step;
 #ifdef DEBUG
     fprintf(stderr,
             "rra_start %lu, rra_end %lu, start_off %li, end_off %li\n",
oetiker commented 11 years ago

oops, good catch, can you make it a pull request ? and yes, a warning at create time would be great. note though, that the check would have to take the local time implementation into account ... there is already code in configure.ac to check whether the system does 32 or 64bit time.

mperzl commented 11 years ago

Please check out my comment regarding this fix for RRDTool on AIX running in 32-bit at: https://github.com/nhale25/rrdtool-1.x/commit/5544ed8bc4a56b2f566d0a96e154a590239af8d2#commitcomment-3658266