uwplse / herbgrind

A Valgrind tool for Herbie
GNU General Public License v3.0

error reported by herbgrind #34

Closed sangeeta0201 closed 6 years ago

sangeeta0201 commented 6 years ago

Hi,

I ran Herbgrind on one of the microbenchmarks, sum-50.c. Herbgrind reports "2.000000 bits average error", aggregated over 1 instance. But the only floating-point computation is the add, and it is executed 52 times. Why does Herbgrind report the average over 1 instance, while the compare is reported over 52 instances? If I understand correctly, the average error is wrong: it should be the total error divided by 52.

#include <stdio.h>

int main() {
  volatile double x;
  for (x = 0.0; x < 10.0; x += 0.2);  /* accumulate 0.2 until x reaches 10.0 */
  printf("%.20g\n", x);
}
./herbgrind.sh bench/sum-50.c.out
==18853== Herbgrind, a valgrind tool for Herbie
==18853== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==18853== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==18853== Command: bench/sum-50.c.out
==18853== 
10.199999999999995737
==18853== 

herbgrind$ cat bench/sum-50.c.out.gh 
Output @ sum-50.c:6 in main (addr 40057D)
2.000000 bits average error
2.000000 bits max error
Aggregated over 1 instances
Influenced by erroneous expression:

No influences found!

compare @ sum-50.c:5 in main (addr 400554)
1% incorrect
1 incorrect values
52 total instances
Influenced by erroneous expressions:

No influences found!

Also, when I enable logging, Herbgrind prints:

The shadow value is 7.00000000000000e0, but 7.000000 was computed.
2.321928 bits error (4 ulps)

But the max error reported is 2.0; it should be 2.3.

pavpanchekha commented 6 years ago

I explained the average error in the other bug report: it is the average error of the output, which is only executed once. The logging you point out is interesting, though, and @HazardousPeach should take a look. I think it is likely that one point in the code is computing log2(ulps + 1), which is usually correct, while another point is computing log2(ulps), which is usually incorrect.

HazardousPeach commented 6 years ago

Hey @sangeeta0201. It looks like what is happening here is that, due to the non-compositionality of floating-point error, an error of 4 ulps partway through the loop results in an error of only 3 ulps at the end of the loop. The logging output you're showing says that when the (true) value of x was 7, the computed value of x was 4 ulps away from that. However, the max error reported in the output does not cover all intermediate values of x, only the values of x that reach the print statement. By the time the loop ends, and the computed value of x is at least 10.0, the error has been reduced to 3 ulps.
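To make "4 ulps away" concrete, here is one common way to count the ulps between two doubles by comparing their bit patterns. It is an illustrative sketch only: the names `ordered_bits` and `ulps_between` are made up, and Herbgrind itself compares the computed value against a shadow value rather than against a second double.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern to an integer whose ordering matches the
 * ordering of the doubles themselves (-0.0 and +0.0 both map to 0). */
static int64_t ordered_bits(double x) {
    int64_t i;
    memcpy(&i, &x, sizeof i);          /* reinterpret the bits without aliasing issues */
    return i < 0 ? INT64_MIN - i : i;  /* flip the negative range so it sorts correctly */
}

/* Number of representable doubles between a and b (0 if they are equal).
 * NaN has no meaningful ulp distance, so it is rejected up front. */
uint64_t ulps_between(double a, double b) {
    assert(!isnan(a) && !isnan(b));
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

Calling `ulps_between` on the computed value of x and a correctly rounded reference at each loop iteration would show the distance rising and falling, which is why the error at x = 7 need not match the error at the final print.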

In this case, the terms "max error" and "average error" in the output are slightly misleading: because the output only fires once, they both refer to the single error value at that firing, not a set of error values throughout the program.

P.S. Glad to see you've discovered my debug flags. If you're getting them from the --debug-help/--help outputs, you should know that those aren't completely up to date, as I've only sporadically added flags to that output. You can find all the flags present in Herbgrind in the src/options.c/h files; they should be simple enough to read.

sangeeta0201 commented 6 years ago

Hi @HazardousPeach, thanks for the quick response. But I am wondering why your error report depends on printf. What if a developer does not print the result of a floating-point computation but passes it to another function? The error (if there is one) would still be there, but Herbgrind won't report it. I am just curious why you chose to do it this way.

pavpanchekha commented 6 years ago

Sorry for the slow reply @sangeeta0201. The error report depends on printf because not all intermediate errors in a floating-point program correspond to errors in the program's behavior or output. So Herbgrind's output first lists errors in the program's behavior or output, and then lists possible causes for each. This is important in a variety of cases, though less so in a small microbenchmark like the one you have here. For example, it is common for numerical experts to purposely cause floating-point error to occur, so that the error can be saved and the computation adjusted later to account for it. Herbgrind can in many cases understand and account for these tricks, but this requires Herbgrind to report an error only after a full chain of computation has occurred.
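One textbook instance of that kind of trick is compensated summation, sketched below in C. It is not taken from Herbgrind or from the benchmark; the names `two_sum`, `naive_sum`, and `compensated_sum` are chosen here for illustration. Each addition deliberately rounds, the rounding error is captured exactly, and the accumulated error is added back at the end. A tool that flagged every intermediate addition would report error in code whose final result is accurate.

```c
#include <stddef.h>

/* Error-free transformation (Knuth's TwoSum): returns s = fl(a + b) and
 * stores in *err the exact rounding error, so that a + b == s + *err.
 * Assumes strict IEEE double arithmetic (no -ffast-math). */
double two_sum(double a, double b, double *err) {
    double s = a + b;
    double bb = s - a;                  /* the part of b that made it into s */
    *err = (a - (s - bb)) + (b - bb);   /* what was lost to rounding */
    return s;
}

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return sum;
}

/* Compensated summation: save each step's rounding error and
 * apply the accumulated correction once at the end. */
double compensated_sum(const double *xs, size_t n) {
    double sum = 0.0, comp = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e;
        sum = two_sum(sum, xs[i], &e);
        comp += e;                      /* the "saved" error */
    }
    return sum + comp;                  /* adjust the result to account for it */
}
```

For the input {1.0, 1e-16, 1e-16}, `naive_sum` returns exactly 1.0 because each small term is rounded away, while `compensated_sum` returns the next double above 1.0, which is the correctly rounded sum.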