Open danlark1 opened 4 years ago
Also, DivRem is combined in DAGCombiner::useDivRem(SDNode *Node), but the check
if (!TLI.isTypeLegal(VT) && !TLI.isOperationCustom(DivRemOpc, VT)) return SDValue();
bails out for 128-bit integers.
I believe we currently don't recognize __udivmodti4 anywhere; in RuntimeLibcalls.def the DIVREM libcalls have no names assigned at all:
HANDLE_LIBCALL(SDIVREM_I8, nullptr)
HANDLE_LIBCALL(SDIVREM_I16, nullptr)
HANDLE_LIBCALL(SDIVREM_I32, nullptr)
HANDLE_LIBCALL(SDIVREM_I64, nullptr)
HANDLE_LIBCALL(SDIVREM_I128, nullptr)
HANDLE_LIBCALL(UDIVREM_I8, nullptr)
HANDLE_LIBCALL(UDIVREM_I16, nullptr)
HANDLE_LIBCALL(UDIVREM_I32, nullptr)
HANDLE_LIBCALL(UDIVREM_I64, nullptr)
HANDLE_LIBCALL(UDIVREM_I128, nullptr)
__udivmodti4 should be present on every LP64 platform, I believe.
The DivRemPairs pass turns the IR into this:
define i128 @udiv128(i128 %a, i128 %b) {
  %a.frozen = freeze i128 %a
  %b.frozen = freeze i128 %b
  %quot = udiv i128 %a.frozen, %b.frozen
  %1 = mul i128 %quot, %b.frozen
  %rem.decomposed = sub i128 %a.frozen, %1
  %sum = add i128 %rem.decomposed, %quot
  ret i128 %sum
}
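For readers less used to IR, the decomposition in the function above can be mirrored in C. This is only an illustrative sketch: the name udiv128_decomposed and the u128 typedef are made up here, and it assumes a compiler that supports unsigned __int128 (GCC or Clang on a 64-bit target).

```c
#include <assert.h>

typedef unsigned __int128 u128; /* hypothetical shorthand, not from LLVM */

/* Mirrors the decomposed IR: one division, then mul + sub instead of urem. */
u128 udiv128_decomposed(u128 a, u128 b) {
    assert(b != 0);              /* division by zero is undefined */
    u128 quot = a / b;           /* %quot = udiv */
    u128 rem = a - quot * b;     /* %rem.decomposed = sub(a, mul(quot, b)) */
    return rem + quot;           /* %sum */
}
```

The result equals a / b + a % b, but only one 128-bit division is needed; the remainder is recovered with a multiply and a subtract.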
That's based on the TTI call: bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned) ...returning false for the 128-bit type.
But even if I hack that to return 'true', I see calls:
callq __divti3
callq __modti3
Where in optimization do we recognize that the target supports "__udivmodti4" and convert to that call?
Extended Description
128-bit division generates calls to __udivti3 and __umodti3 instead of calling __udivmodti4 once.
This happens because of the DivRemPairs pass and the lack of instrumentation in the backend.
; Unsigned 128-bit division
define i128 @udiv128(i128 %a, i128 %b) {
  %quot = udiv i128 %a, %b
  %rem = urem i128 %a, %b
  %sum = add i128 %quot, %rem
  ret i128 %sum
}
=>
https://gcc.godbolt.org/z/PorhMz
This will call __udivti3 on LP64, but libgcc and compiler-rt have __udivmodti4, which computes the quotient and the remainder at the same time. This particularly hurts x86, where the divq instruction already produces both the quotient and the remainder. Other backends can benefit from this too.
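To make the difference concrete, here is a minimal C sketch that calls the combined routine by hand. It assumes a GCC or Clang toolchain on an LP64 target where libgcc or compiler-rt provides __udivmodti4 with the (a, b, *rem) signature; the wrapper names quot_plus_rem_naive and quot_plus_rem_combined are hypothetical and exist only for this illustration.

```c
#include <assert.h>

typedef unsigned __int128 u128; /* hypothetical shorthand */

/* Runtime routine from libgcc / compiler-rt: returns the quotient and
   stores the remainder through rem. Declared by hand because no public
   header exposes it. */
extern u128 __udivmodti4(u128 a, u128 b, u128 *rem);

/* What the source-level code looks like: the compiler may lower the
   division and the remainder to two separate runtime calls. */
u128 quot_plus_rem_naive(u128 a, u128 b) {
    assert(b != 0);
    return a / b + a % b;
}

/* The same computation with a single runtime call. */
u128 quot_plus_rem_combined(u128 a, u128 b) {
    assert(b != 0);
    u128 rem;
    u128 quot = __udivmodti4(a, b, &rem);
    return quot + rem;
}
```

Both functions return the same value; the second is what one would like the backend to emit automatically for the IR in this report.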