parallella / pal

An optimized C library for math, parallel processing and data movement
Apache License 2.0

math: p_atan: Implement atan Function #218

Open Adamszk opened 9 years ago

Adamszk commented 9 years ago

Corrected to eliminate divisions.

olajep commented 9 years ago

This is >6x slower than the existing code on ARM (and probably on other architectures too). Why did you modify the test gold data?

$ git checkout master -b test-p_atan_code
$ mkdir build && cd build
$ ../configure CFLAGS="-O3 -ffast-math -mcpu=cortex-a9 -mfpu=neon"
$ make -j2
$ ./benchmark/math/bench_p_atan_f32
;name, size, duration (ns)
p_atan_f32, 16384, 331781
$ git merge origin/pr/218
$ make -j2
$ ./benchmark/math/bench_p_atan_f32
;name, size, duration (ns)
p_atan_f32, 16384, 2082794

Adamszk commented 9 years ago

The old gold data only has input values (vector a) less than one, which means that the return value (vector c) is never greater than 45 degrees. If the atan algorithm is supposed to cover the range up to 90 degrees, then all values of vector a (including those greater than one) should also be considered when testing the code. Thus, the new gold data has some vector a input values greater than 1 (on the positive side, and less than -1 on the negative side).

The old atan algorithm comes from Abramowitz and Stegun 4.4.47 and is valid for inputs from -1 to 1, so its output is valid from -45 to 45 degrees, with a max error of 5 decimals. The new atan algorithm accepts inputs from -infinity to infinity, so its output covers -90 to 90 degrees, with a max error of 6 decimals.

I compared the new atan to the standard GNU C compiler atan (in Code::Blocks) on an x86 PC running Linux. The new atan is slightly faster than the standard one, and never slower, for all input values. Timing was measured with the loop included for better accuracy:
new atan -> about 2300, standard atan -> about 2600, so the ratio of improvement is 2600/2300, or about 1.1. Note: there is no error handling, just like in the old one.
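For readers who want to see the range issue concretely, here is a minimal sketch of the Abramowitz and Stegun 4.4.47 polynomial, which is only valid for inputs in [-1, 1] (error on the order of 1e-5). The wrapper `atan_full_range` is a hypothetical illustration that extends it with the standard identity atan(x) = pi/2 - atan(1/x). It is not the code from this pull request, and unlike the patch described above it uses a division.

```c
/* Abramowitz & Stegun 4.4.47: polynomial approximation of atan(x),
 * valid only for -1 <= x <= 1, absolute error on the order of 1e-5. */
static float atan_poly(float x)
{
    const float a1 = 0.9998660f, a3 = -0.3302995f, a5 = 0.1801410f,
                a7 = -0.0851330f, a9 = 0.0208351f;
    float x2 = x * x;
    /* Horner form of x * (a1 + a3*x^2 + a5*x^4 + a7*x^6 + a9*x^8). */
    return x * (a1 + x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * a9))));
}

/* Hypothetical full-range wrapper (an assumption for illustration, not
 * the PR's algorithm): maps |x| > 1 back into [-1, 1] via 1/x. */
static float atan_full_range(float x)
{
    const float half_pi = 1.57079632679f;
    if (x > 1.0f)
        return half_pi - atan_poly(1.0f / x);   /* result in (pi/4, pi/2) */
    if (x < -1.0f)
        return -half_pi - atan_poly(1.0f / x);  /* result in (-pi/2, -pi/4) */
    return atan_poly(x);
}
```

Calling `atan_poly` directly with an input such as 6.265069 gives a meaningless result, which is why gold data outside [-1, 1] cannot be used with the old routine.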

olajep commented 9 years ago

On 2015-08-31 15:44, Adamszk wrote:

The old gold data only has input values (vector a) less than one, which means that the return value (vector c) is never greater than 45 degrees. If the atan algorithm is supposed to cover the range up to 90 degrees, then all values of vector a (including those greater than one) should also be considered when testing the code. Thus, the new gold data has some vector a input values greater than 1 (on the positive side, and less than -1 on the negative side).

Makes sense. Will merge that portion of the patch. Surely this also applies to other trig functions?

Thanks, Ola

olajep commented 9 years ago

On 2015-08-31 16:41, Ola Jeppsson wrote:


Makes sense. Will merge that portion of the patch.

Actually no.

 * Calculates inverse tangent (arc tangent) of the input value. The function
 * returns a value between -pi/2 to pi/2 but does not check for illegal input
 * values.

New test data is not within that range. // Ola



Adamszk commented 9 years ago

If you look at the sixth row, the input (vector a) is 6.265069. The arctan(a) by calculator is 80.93122361 degrees. The new algorithm gives 1.412516 in radians, which matches the standard GNU C compiler arctan(a). Converting it to degrees gives 1.412516 * 360 / (2*pi), which is 80.93120.

Since the maximum allowed number of rows of data is 100, I divided 100 by 10 to end up with ten ranges of randomly generated numbers: 10 random numbers between 10 and 1, 10 random numbers between 1 and 0, 10 numbers between 0.1 and 0.0 (one-decimal range), 10 random numbers in the two-decimal range, etc. I used a random number generator to ensure an objective test. It just happens to have 6.2 as the largest number.

The new gold data file does not apply to the old atan(x). The old atan(x) cannot handle values of a greater than 1.
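The arithmetic above can be checked with a few lines of C. This is only a sketch: the helper names `rad_to_deg` and `in_atan_output_range` are made up for illustration and are not part of PAL.

```c
#define SKETCH_PI 3.14159265358979323846

/* Convert radians to degrees: rad * 360 / (2*pi) == rad * 180 / pi. */
static double rad_to_deg(double rad)
{
    return rad * 180.0 / SKETCH_PI;
}

/* Returns 1 if rad lies strictly inside (-pi/2, pi/2), the documented
 * output range of an arc tangent, and 0 otherwise. */
static int in_atan_output_range(double rad)
{
    return rad > -SKETCH_PI / 2.0 && rad < SKETCH_PI / 2.0;
}
```

With the sixth-row value, `rad_to_deg(1.412516)` comes out at about 80.9312 degrees, and `in_atan_output_range(1.412516)` returns 1, so the output for an input of 6.265069 is still inside -pi/2 to pi/2.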

Adamszk commented 9 years ago

@mateunho Can you share a simple short script, or some clues on how to write one, to test C code speed on Epiphany relative to ARM and x86 code? Thanks.
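As a starting point for host-side (ARM or x86) timing, a minimal harness could look like the sketch below. It uses only the standard C `clock()` function, which measures CPU time at fairly coarse resolution, so the kernel is called many times and the result is averaged. The names `kernel_fn`, `copy_kernel` and `bench_ns_per_call` are hypothetical, and the kernel signature is an assumption; on Epiphany the timing would have to come from the device's own timers instead, which this sketch does not cover.

```c
#include <time.h>

/* Assumed kernel shape: n inputs in a[], n outputs written to c[]. */
typedef void (*kernel_fn)(const float *a, float *c, int n);

/* Placeholder kernel so the sketch compiles on its own; replace it
 * with the function under test. */
static void copy_kernel(const float *a, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i];
}

/* Average CPU time per call, in nanoseconds, over reps calls. */
static double bench_ns_per_call(kernel_fn fn, const float *a, float *c,
                                int n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(a, c, n);
    clock_t end = clock();
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / reps;
}
```

Running the same harness against the old and new routines with identical input data and compiler flags gives a like-for-like ratio. At high optimization levels the compiler may remove work whose result is never read, so it is worth using the output buffer after the timed loop.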