Open mratsim opened 4 weeks ago
Author: Mamy Ratsimbazafy (mratsim)
t169: i64,i8 = usubo_carry t85, t117, t168:1
t171: i64 = srl t169, Constant:i8<63>
t172: i8 = truncate t171
t184: i8 = and t172, Constant:i8<1>
t137: i64 = select t184, t78, t135
I think that if we had not ended up with this truncate, we would have detected the sign-bit test.
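To spell out what the nodes above compute, here is a small Python model of the `srl` / `truncate` / `and` chain next to a plain signed "less than zero" test on the same 64-bit value. The function names are made up for illustration, and this only models the arithmetic, not what the DAG combiner actually matches.

```python
MASK64 = (1 << 64) - 1

def dag_condition(x: int) -> int:
    """Model of: srl x, 63 -> truncate to i8 -> and 1 (as in the dump above)."""
    x &= MASK64
    t171 = x >> 63        # logical shift right by 63: leaves only the top bit
    t172 = t171 & 0xFF    # truncate i64 -> i8
    t184 = t172 & 1       # and with 1 (redundant here, bit 0 is all that is left)
    return t184

def signbit_test(x: int) -> int:
    """Model of a signed 'x < 0' test on the same 64-bit pattern."""
    x &= MASK64
    signed = x - (1 << 64) if x >= (1 << 63) else x
    return int(signed < 0)
```

Both functions agree on every 64-bit input, so the select condition is just the sign bit of the top limb of the subtraction result.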
This is an alternative implementation of the LLVM modular addition from https://github.com/llvm/llvm-project/issues/103717 that uses raw LLVM IR instead of the builtin `llvm.usub.with.overflow.iXXX`.
The code for i256 is optimal but not for i320 or i384 (similar to the previous issue, there seems to be a size threshold after which LLVM gives up removing redundant instructions).
https://alive2.llvm.org/ce/z/g_nP8g
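For readers who do not want to open the links, here is a rough Python reference model of a modular addition that selects on the sign bit of the subtraction instead of on an overflow flag. It is a sketch under assumptions, not the IR from this issue: the name `mod_add_signbit` is hypothetical, and it assumes the modulus leaves the top bit of the integer type unused (`m < 2**(bits-1)`), so that the sign of `a + b - m` tells us whether to reduce.

```python
def mod_add_signbit(a: int, b: int, m: int, bits: int) -> int:
    """(a + b) mod m for 0 <= a, b < m < 2**(bits-1), via a sign-bit select."""
    mask = (1 << bits) - 1
    s = (a + b) & mask                 # cannot wrap: a + b < 2*m < 2**bits
    d = (s - m) & mask                 # wraps to a "negative" pattern when s < m
    negative = (d >> (bits - 1)) & 1   # sign bit of the difference
    return s if negative else d        # keep s if s < m, else the reduced value
```

For example, `mod_add_signbit(a, b, m, 384)` models the i384 case. The point of the issue is only about how the final select gets lowered, so the arithmetic itself is the same for i256, i320 and i384.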
Full code
Original IR
After opt -O3
Assembly
Analysis
For i256, the code is optimal and after 4 sub/sbb (or 4 add/adc with negated inputs) we directly have a conditional move sequence:
However, for i320 or i384, an unnecessary `SAR` instruction gets added.