open-power-sdk / pveclib

Power Vector Library
Apache License 2.0

Update-p2 P10 divide operations for quadword. Closing #196 #197

Closed munroesj52 closed 2 months ago

munroesj52 commented 2 months ago

The Power8/9 implementation of divide extended quadword (vec_vdiveuq_inline() and the special round-to-odd version vec_diveuq_qpo(), used in the float128 vec_xsdivqpo_inline) is sub-optimal. For Power8/9 the implementation is a long division by doublewords using vec_divqud_inline(). This is a 128 by 64-bit divide returning a 64-bit quotient and remainder. Effectively it is a 4-digit dividend by 2-digit divisor division in two steps, where each digit is 64 bits.
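To make the "long division by doublewords" idea concrete, here is a minimal scalar sketch. It is not pveclib code: the names `div128by64` and `div2by1` are hypothetical, and it relies on the GCC/Clang `unsigned __int128` extension rather than vector registers. It only shows the simplest case, a two-digit dividend divided by a one-digit divisor, where each step is a 128 by 64-bit divide and the remainder of one step becomes the high digit of the next.

```c
#include <stdint.h>

/* Assumption: GCC/Clang extension, used here only to model the arithmetic. */
typedef unsigned __int128 u128;

typedef struct { uint64_t quot; uint64_t rem; } qr64_t;
typedef struct { uint64_t q1; uint64_t q0; uint64_t rem; } div2_t;

/* Hypothetical scalar model of one 128 by 64-bit divide step.
   Requires d != 0 and hi < d so that the quotient fits in 64 bits. */
static inline qr64_t div128by64(uint64_t hi, uint64_t lo, uint64_t d)
{
    u128 n = ((u128)hi << 64) | lo;
    qr64_t r;
    r.quot = (uint64_t)(n / d);
    r.rem  = (uint64_t)(n % d);
    return r;
}

/* Schoolbook long division of the two-digit dividend n1:n0 by the
   one-digit divisor d (d != 0), with 64-bit digits. */
static inline div2_t div2by1(uint64_t n1, uint64_t n0, uint64_t d)
{
    /* Step 1: the high digit, with an implied leading zero digit. */
    qr64_t s1 = div128by64(0, n1, d);
    /* Step 2: the remainder (always < d) is prepended to the low digit. */
    qr64_t s0 = div128by64(s1.rem, n0, d);
    div2_t r = { s1.quot, s0.quot, s0.rem };
    return r;
}
```

For example, dividing 2^64 (digits 1:0) by 3 this way yields quotient digits 0 and 0x5555555555555555 with remainder 1. A two-digit divisor needs the extra multiply/subtract correction described in the next paragraph, which this sketch deliberately leaves out.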

In the extended divide case, when the divisor has two nonzero (doubleword) digits, an explicit multiply/subtract is required to obtain the remainder. Initially the implementation used a quadword by quadword multiply (vec_muludq()). This is not required, as a quadword by doubleword multiply is sufficient if you are careful with the subtraction (a double quadword subtraction is required either way). This (128 by 64-bit) multiply is not defined within vec_int128_ppc.h but can be constructed using vec_vmuloud() and vec_vmaddeud().
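The multiply/subtract step can be illustrated with a scalar model. This is a sketch under assumptions, not the pveclib implementation: `mul128by64` and `sub256` are hypothetical names, `unsigned __int128` is a GCC/Clang extension, and the two 64 by 64-bit products only stand in for the roles that vec_vmuloud() and vec_vmaddeud() play in the vector code. The point is that a 128 by 64-bit product needs only two partial products, and the multiply-add of the high partial product cannot overflow 128 bits.

```c
#include <stdint.h>

/* Assumption: GCC/Clang extension, used here only to model the arithmetic. */
typedef unsigned __int128 u128;

/* 192-bit product as three 64-bit digits, d2 most significant. */
typedef struct { uint64_t d2; uint64_t d1; uint64_t d0; } u192_t;

/* 256-bit value as a pair of quadwords. */
typedef struct { u128 hi; u128 lo; } u256_t;

/* Hypothetical 128 by 64-bit multiply: (x_hi:x_lo) * y. */
static inline u192_t mul128by64(uint64_t x_hi, uint64_t x_lo, uint64_t y)
{
    /* Low partial product: a plain 64 by 64-bit multiply. */
    u128 lo_prod = (u128)x_lo * y;
    /* High partial product plus the carry-in from the low product.
       (2^64-1)^2 + (2^64-1) < 2^128, so this sum cannot overflow. */
    u128 hi_prod = (u128)x_hi * y + (uint64_t)(lo_prod >> 64);
    u192_t p;
    p.d0 = (uint64_t)lo_prod;
    p.d1 = (uint64_t)hi_prod;
    p.d2 = (uint64_t)(hi_prod >> 64);
    return p;
}

/* Double quadword subtraction a - b (mod 2^256), with the borrow
   propagated from the low quadword into the high quadword. */
static inline u256_t sub256(u256_t a, u256_t b)
{
    u256_t r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo ? 1 : 0);
    return r;
}
```

In a division step, the product of the quotient digit and the divisor would be widened to 256 bits and subtracted from the running dividend with something like `sub256` to obtain the remainder. How the real code aligns the digits is not shown here.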

* src/pveclib/vec_int128_ppc.h [int128_Divide_0_1_1_4]: Update Doxygen text to reflect changes. (vec_vdiveuq_inline): Implement changes. (vec_vdivuq_inline): Improve vec_srqi(q0, 63) shift.

* src/testsuite/arith128_test_i128.c (db_vec_diveuq): Copy and update from original with changes. (db_vec_diveuq_V0): Rename original to allow comparisons.
* src/testsuite/vec_int128_dummy.c (test_vec_diveuq): Updated compile test implementation. (test_vec_diveuq_V0): Rename original to allow generated code comparisons.