open-power-sdk / pveclib

Power Vector Library
Apache License 2.0

Add P10 divide/modulo operations for doubleword. #184

Closed by munroesj52 1 year ago

munroesj52 commented 1 year ago

POWER10 added vector divide/divide-extended/modulo instructions. This is the initial patch for the doubleword implementations. A later patch will add the platform-specific implementations for static and IFUNC-enabled dynamic libraries.
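For readers unfamiliar with the three operations, here is a minimal scalar reference model of what each doubleword lane computes. This is a sketch, not pveclib code: the `u64x2_t` type and the `ref_*` names are hypothetical, and `unsigned __int128` is assumed to be available as a GCC/Clang extension. The divide-extended form treats the dividend as the 64-bit element followed by 64 zero bits, so its quotient fits in 64 bits only when the dividend element is less than the divisor.

```c
#include <stdint.h>

/* Hypothetical stand-in for a vector of two unsigned doublewords.
   Not a pveclib type; used only to model per-lane behavior.  */
typedef struct { uint64_t d[2]; } u64x2_t;

/* Divide: each lane computes x / y.  Divisor lanes must be nonzero.  */
static u64x2_t
ref_divud (u64x2_t x, u64x2_t y)
{
  u64x2_t q;
  for (int i = 0; i < 2; i++)
    q.d[i] = x.d[i] / y.d[i];
  return q;
}

/* Modulo: each lane computes x % y.  Divisor lanes must be nonzero.  */
static u64x2_t
ref_modud (u64x2_t x, u64x2_t y)
{
  u64x2_t r;
  for (int i = 0; i < 2; i++)
    r.d[i] = x.d[i] % y.d[i];
  return r;
}

/* Divide extended: each lane computes (x * 2**64) / y.
   The result is only meaningful when x < y, otherwise the
   quotient does not fit in 64 bits and is truncated here.  */
static u64x2_t
ref_diveud (u64x2_t x, u64x2_t y)
{
  u64x2_t q;
  for (int i = 0; i < 2; i++)
    {
      unsigned __int128 dividend = (unsigned __int128) x.d[i] << 64;
      q.d[i] = (uint64_t) (dividend / y.d[i]);
    }
  return q;
}
```

Pairing a divide-extended of the high doubleword with a plain divide of the low doubleword is one common way to build wider divides, which is presumably why both forms matter for the quadword-by-doubleword functions listed below.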

* src/pveclib/vec_int64_ppc.h: General updates.
[i64_missing_ops_0_0_PWR8]: New subsection "POWER8" header.
[i64_missing_ops_0_0_PWR9]: New subsection "POWER9" header.
[i64_missing_ops_0_0_PWR10]: New subsection "POWER10" header.
[i64_missing_ops_0_1]: New subsection text "Challenges and opportunities".
[i64_missing_ops_0_2_1]: Update subsubsection "Doubleword integer multiplies". General updates.
[i64_missing_ops_0_2_2]: New subsubsection text "Doubleword Integer Divide/Modulo".
[i64_missing_ops_0_2_2_0]: New paragraph text "Vectorizable Divide implementations".
[i64_missing_ops_0_2_2_1]: New paragraph text "Vectorized Shift-Subtract Divide".
[i64_missing_ops_0_2_2_2]: New paragraph text "Transfer Vector elements for scalar divide".
[i64_missing_ops_0_2_2_3]: New paragraph text "Special consideration for POWER7 and earlier".
(vec_divqud_inline, vec_muludm, vec_pasted, vec_mrghd, vec_mrgld, vec_swapd, vec_rldi, vec_selud, vec_setb_sd, vec_sldi, vec_splat_u64, vec_splatd, vec_vdiveud_inline, vec_vdivud_inline [@cond INTERNAL]): Forward declares added.
(vec_divdud_inline, vec_divqud_inline): New inline functions.
(vec_maxsd, vec_maxud [(__GNUC__ >= 10)]): Use vec_max intrinsic.
(vec_minsd, vec_minud [(__GNUC__ >= 10)]): Use vec_min intrinsic.
(vec_moddud_inline): New inline function.
(vec_msumudm, vec_muleud, vec_mulhud, vec_muloud, vec_muludm): Correct copybrief refs.
(vec_setb_sd [(__GNUC__ >= 12)]): Use vec_expandm intrinsic.
(vec_vdiveud_inline, vec_vdivud_inline): New inline functions.
(vec_vmadd2eud, vec_vmaddeud, vec_vmadd2oud, vec_vmaddoud): Correct copybrief refs.
(vec_vmodud_inline): New inline function.
(vec_vmuleud, vec_vmuloud, vec_vmsumeud, vec_vmsumoud): Correct copybrief refs.

* src/testsuite/arith128_test_i64.c (db_vec_modud, db_vec_divqud): New debug functions.
(test_vec_divide_dw, test_vec_modulo_dw, test_vec_divide_qud): New unit test functions.
(test_vec_i64): Add new unit tests to driver.

* src/testsuite/pveclib_perf.c (test_time_i128): Add new timed tests to driver.
* src/testsuite/vec_perf_i128.c (timed_divmodud, timed_lib_divmodud, timed_divqud): New timed test functions.
* src/testsuite/vec_perf_i128.h (timed_divmodud, timed_lib_divmodud, timed_divqud): New timed test externs.

* src/testsuite/vec_int64_dummy.c (test_divdud, test_moddud, test_divqud, test_divud, test_divude, test_modud, test_divmodud, test_divmoddud): New compile tests.
(test_vec_divud, test_vec_divude, test_vec_modud, test_vec_divmodud_V1, test_vec_divmodud_V0, test_vec_divdud, test_vec_moddud, test_vec_divdud_V1, test_vec_divdud_V0, test_vec_divqud, test_vec_divmoddud, test_vec_divmoddud_V0): Experimental compile tests.

* src/testsuite/vec_pwr10_dummy.c (test_divmodud_PWR10, test_divqud_PWR10, test_vec_divud_PWR10, test_vec_divude_PWR10, test_vec_modud_PWR10, test_vec_divdud_PWR10): New compile tests.

* src/testsuite/vec_pwr9_dummy.c (test_divmodud_PWR9, test_divqud_PWR9, test_vec_divud_PWR9, test_vec_divude_PWR9, test_vec_modud_PWR9, test_vec_divqud_PWR9, test_vec_divdud_PWR9): New compile tests.
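
The "Vectorized Shift-Subtract Divide" paragraph named in the vec_int64_ppc.h entry refers to a fallback for processors without the POWER10 instructions. As a rough illustration only, here is a scalar model of the textbook restoring shift-subtract algorithm for a 128-bit by 64-bit unsigned divide. The function and type names are hypothetical, and the actual pveclib code is vectorized and may differ in detail.

```c
#include <stdint.h>

/* Hypothetical result pair; not a pveclib type.  */
typedef struct { uint64_t quot; uint64_t rem; } div128by64_t;

/* Restoring shift-subtract divide of the 128-bit value (hi:lo) by d.
   Requires d != 0 and hi < d so that the quotient fits in 64 bits.  */
static div128by64_t
shift_subtract_div128by64 (uint64_t hi, uint64_t lo, uint64_t d)
{
  uint64_t rem = hi;	/* running remainder, always < d */
  uint64_t quot = lo;	/* dividend bits shift out, quotient bits shift in */

  for (int i = 0; i < 64; i++)
    {
      /* Bit shifted out of the remainder; if set, the true
	 65-bit remainder is certainly >= d.  */
      uint64_t carry = rem >> 63;

      rem = (rem << 1) | (quot >> 63);
      quot <<= 1;

      if (carry || rem >= d)
	{
	  rem -= d;	/* wraps modulo 2**64, still correct when carry is set */
	  quot |= 1;
	}
    }

  div128by64_t result = { quot, rem };
  return result;
}
```

Each iteration produces one quotient bit, so the loop takes 64 steps regardless of the operands. That fixed cost is the usual argument for preferring hardware divide where it is available.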