llvm / llvm-project

[AArch64][GlobalISel] Overall GISel operation status #115133

Open davemgreen opened 2 weeks ago

davemgreen commented 2 weeks ago

This is a copy of an internal page that @chuongg3 and I put together when going through each of the operations for AArch64 GISel, making sure they don't fall back. Not all of it is complete yet (and the internal version had a few more details), but it is better to have this upstream. Some of it might now be out of date.

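A few high level comments

- This does not include SVE, we should probably do the same elsewhere.
- BF16 still needs to be added, but requires a new way to specify the types / operations.
- BigEndian isn't handled yet.
- Currently some operations widen, some promote. We should stick to one (probably widen).
- Blank spaces usually mean not checked / not supported. We will get to the point where random-testing will start to be more useful.

Legend:

- Scalar normal = i8/i16/i32/i64
- Vector legal = v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
- Vector larger/smaller = i8/i16/i32/i64 types with non-legal sizes
- i128 = scalar/vector
- i1 = scalar/vector
- Scalar ext = non-power2 sizes, including larger sizes
- Vector odd widths = i8/i16/i32/i64 with non-power-2 widths.
- Vector odd eltsize = non-power2 elt sizes (or i128, etc).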
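
The column names used in the tables (Scalar normal, Vector legal, Vector larger / smaller, i128, i1, Scalar ext, Vector odd widths, Vector odd eltsizes) can be read as a classification of integer types. Below is a minimal sketch of that reading, assuming a hypothetical helper `classify` that is not part of LLVM. It interprets "odd widths" as a non-power-of-2 lane count and gives i1 and i128 priority over the other columns; both choices are assumptions rather than something the issue states.

```python
import re

# Legal NEON integer vector types as listed in the legend (64-bit and 128-bit).
LEGAL_VECTORS = {"v8i8", "v4i16", "v2i32", "v16i8", "v8i16", "v4i32", "v2i64"}
# Element / scalar sizes the legend calls "normal".
NORMAL_BITS = {8, 16, 32, 64}


def classify(type_name: str) -> str:
    """Map a type such as 'i24' or 'v3i32' to a legend column (hypothetical helper)."""
    match = re.fullmatch(r"(?:v(\d+))?i(\d+)", type_name)
    if match is None:
        raise ValueError(f"not an integer type name: {type_name!r}")
    lanes = int(match.group(1)) if match.group(1) else None
    bits = int(match.group(2))

    # Assumption: i1 and i128 get their own columns for scalars and vectors alike.
    if bits in (1, 128):
        return f"i{bits}"
    if lanes is None:
        return "Scalar normal" if bits in NORMAL_BITS else "Scalar ext"
    if bits not in NORMAL_BITS:
        return "Vector odd eltsizes"
    if type_name in LEGAL_VECTORS:
        return "Vector legal"
    # Assumption: power-of-2 lane counts that are not legal count as larger / smaller.
    if lanes & (lanes - 1) == 0:
        return "Vector larger / smaller"
    return "Vector odd widths"
```

For example, under these assumptions `classify("v2i8")` lands in "Vector larger / smaller", which matches the `y (v2i8)` entries in the shift rows.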

Operation Scalar normal Vector legal ptr i128/i1 Vector larger / smaller Scalar ext Vector odd widths Vector odd eltsizes Additional Notes
load y y y/y
store y y y/y
bitcast? ptrtoint? inttoptr? y y ptrs can go wrong with v3 types https://godbolt.org/z/nfqG3qGeb.
getelementptr y
phi y y y/y y
select
memcpy? memmove? memset? bzero?
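
| Int Operation | Scalar normal | Vector normal | i128 s/v | i1 s/v | Vector larger / smaller | Scalar non-power-2 | Vector odd widths | Vector odd eltsizes | Additional Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| add | y | y | y/y | | y | y | x | x | https://godbolt.org/z/6c1rfWTK8 |
| sub | y | y | y/y | | y | y | x | x | |
| mul | y | y | y/y inefficient | | y | | | | Scalar i128 #115512. https://godbolt.org/z/8Wd8zhezc |
| sdiv, udiv | y | y | y/y | | y | | | | Scalar i1 could be simpler. https://godbolt.org/z/45qMq6cvh. |
| srem, urem | y | y | y/y | | y | | | | Scalar i1: |
| zext, sext, anyext | y | y | | | | | | | ZEXT: Global ISel could be improved to match SDAG by using BIC |
| trunc | y | y | y | | | | x Non-pow2 larger than 8 | | |
| and | y | y | y/y | | y | | | | https://godbolt.org/z/6Y98TnYv8 |
| or | y | y | y/y | | y | | | | |
| xor | y | y | y/y | | y | | | | |
| - not | y | y | y | | y | | | | https://godbolt.org/z/rh4ob1be7 |
| shl | y | y | y | | y (v2i8) | | | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. |
| ashr | y | y | y | | y (v2i8) | | | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. |
| lshr | y | y | y | | y (v2i8) | | | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. |
| icmp | y | y | y (i128 could be better) | x | y (v2i8) | | | | i128 could do a lot better. |
| select | y | y | y | | y (v2i8) | | | | Scalar: Unnecessary AND to clear upper lanes of the condition register |
| abs | y | y | y | | x | y | | | https://godbolt.org/z/Tobs7YeoT |
| smin/smax/umin/umax | y | y | y | | y | x > i128 | | | i1/i128 could do better. https://godbolt.org/z/j7nx789oz. |
| uaddsat/usubsat/saddsat/ssubsat | y | y | y | | | y | | | https://godbolt.org/z/4MT14bfsv |
| bitreverse | y | x | y | | | y | | | https://godbolt.org/z/3sd988Mhd |
| bswap | y | x | y | | x | y | | | |
| ctlz | y | y | y | | y | x > i128 | | | |
| cttz | y | y | y | | x | x > i128 | | | |
| ctpop | y | y | y | | x | x | | | |
| fshr/fshl | y | y | y | | x | x NonPow2 > 128 | | | Scalar Normal: |
| - rotr/rotl? | y | y | y | | y | y | | | |
| uaddo, usubo, uadde, usube? | | | | | | | | | |
| umulo, smulo? | | | | | | | | | |
| umulh, smulh | | | | | | | | | |
| ushlsat, sshlsat | | | | | | | | | |
| smulfix, umulfix | | | | | | | | | |
| smulfixsat, umulfixsat | | | | | | | | | |
| sdivfix, udivfix | | | | | | | | | |
| sdivfixsat, udivfixsat | | | | | | | | | |
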
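
| FP Operation | Scalar normal | Vector legal | f128 s/v | | Vector smaller / larger | bf16 s/v | Vector widths | | Additional Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fadd | y | y | y/y | | y | | | | https://godbolt.org/z/bYWfo9v16 |
| fsub | y | y | y/y | | y | | | | |
| fmul | y | y | y/y | | y | | | | |
| fma | y | y | y/y | | y | | | | https://godbolt.org/z/1osE3Whaq |
| fmuladd | y | y | y/y | | y | | | | |
| fdiv | y | y | y/y | | y | | | | |
| frem | y | y | y/y | | y | | | | |
| fneg | y | y | y/y | | y | | | | https://godbolt.org/z/rz96eh3PW |
| fpext | y | y | y/y | | y | | | | https://godbolt.org/z/358EG4j7r |
| fptrunc | y | y | y/y | | y | | | | https://godbolt.org/z/7a7hq6j68 |
| fptosi, fptoui | y | y | y/y | | y | | | | |
| fptosisat, fptouisat | | | | | | | | | |
| sitofp, uitofp | y | y | y/y | | y | | | | https://godbolt.org/z/j7Prz7qj6 |
| fabs | y | y | y/y | | y | | | | https://godbolt.org/z/o95h4a9es |
| fsqrt | y | y | y/y | | y | | | | |
| ceil, floor, trunc, rint, nearbyint | y | y | y/y | | y | | | | https://godbolt.org/z/zjMqq5oeo |
| lrint, llrint, lround, llround | | | | | | | | | |
| fminnum, fmaxnum | y | y | y/y | | y | | | | |
| fminimum, fmaximum | y | y | | | y | | | | |
| fminimumnum, fmaximumnum | | | | | | | | | |
| fcopysign | y | y | y/y | | y | | | | https://godbolt.org/z/aq5bbc4jG |
| fpow | y | y | y/y | | y | | | | https://godbolt.org/z/WEeWYj1e4 |
| fpowi | y | y | y/y | | y | | | | |
| sin, cos, etc | y | y | y/y | | y | | | | |
| fexp, fexp2, flog, flog2, flog10 | y | y | y/y | | y | | | | |
| fldexp, frexp | | | | | | | | | |
| fcanonicalize | | | | | | | | | |
| is_fpclass | | | | | | | | | |
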
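
| Vector Operation | Scalar normal | | Vector legal | Vector smaller / larger | ptr | Scalar ext | Vector odd widths | Vector odd eltsizes | Additional Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| insert | - | - | y | y | | - | | | |
| extract | - | - | y | y | | - | | | |
| shuffle* | - | - | | | | - | | | |
| - dup | - | - | y | | | - | | | |
| - ext | - | - | y | y | | - | | | |
| - zip1/zip2/uzp1/uzp2/trn1/trn2 | - | - | y | | | - | | | |
| - tbl | - | - | y | y | | - | | | Could do with tbl2/tbl4 combines |
| - reverse | - | - | | | | - | | | Needs full reverses from https://godbolt.org/z/1chrbKjhs |
| - perfect shuffles | - | - | | | | - | | | |
| reduce.add | - | - | | | - | - | | | Integer reductions in ISel use i32 return types. They can be i8/i16 in GISel. |
| reduce.mul | - | - | | | | - | | | |
| reduce.smin/smax/umin/umax | - | - | | | | - | | | |
| reduce.and/or/xor | - | - | | | | - | | | |
| reduce.fadd | - | - | | | | - | | | Needs sequential |
| reduce.fmul | - | - | | | | - | | | Needs sequential, plus #73309 |
| reduce.fmin/fmax/fminimum/fmaximum | - | - | y | | | - | x | | |
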
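
Many cells in the tables are blank, and checking one means compiling a small IR function for that operation and type. The sketch below shows one way such test functions could be generated for binary integer operations such as `add` or `sub`. The helpers `ir_type` and `binop_test` are hypothetical names used for illustration only, and the sketch covers only operations with the `op ty %a, %b` form.

```python
def ir_type(type_name: str) -> str:
    """Turn 'v4i32' into '<4 x i32>'; scalars such as 'i24' are returned unchanged."""
    if type_name.startswith("v"):
        lanes, elt = type_name[1:].split("i")
        return f"<{lanes} x i{elt}>"
    return type_name


def binop_test(op: str, type_name: str) -> str:
    """Emit an LLVM IR function applying a binary integer op to two arguments."""
    ty = ir_type(type_name)
    return (
        f"define {ty} @{op}_{type_name}({ty} %a, {ty} %b) {{\n"
        f"  %r = {op} {ty} %a, {ty} %b\n"
        f"  ret {ty} %r\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # One function per table column of interest for the 'add' row.
    for name in ["i32", "i128", "v4i32", "v2i8", "v3i32", "v4i7"]:
        print(binop_test("add", name))
```

The output can be pasted into Compiler Explorer or passed to `llc -mtriple=aarch64 -global-isel -global-isel-abort=2`, which emits a warning when a function falls back instead of aborting.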
llvmbot commented 2 weeks ago

@llvm/issue-subscribers-backend-aarch64

madhur13490 commented 2 weeks ago

+1. This complements some of our understanding so far.

In addition to this we are also tracking SPEC 2017, RAJAPerf and TSVC internally in SVE and nosve mode to track the number of fallbacks. Our CI emits the number of fallbacks each day on these benchmarks. This helps us to make sure we don't introduce new fallbacks.
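
As an illustration of this kind of tracking (not a description of the CI mentioned above, whose implementation is not shown here), a build log can be scanned for the warning that LLVM prints when `-global-isel-abort=2` is in effect and a function falls back to SelectionDAG. The function name `count_fallbacks` is hypothetical, and the sketch assumes the compiler's stderr has already been captured as text.

```python
import re
from collections import Counter

# Warning text emitted per function when GlobalISel falls back
# (seen with -global-isel-abort=2).
FALLBACK_RE = re.compile(r"Instruction selection used fallback path for (\S+)")


def count_fallbacks(log_text: str) -> Counter:
    """Count fallback warnings per function name in captured compiler output."""
    return Counter(FALLBACK_RE.findall(log_text))
```

Comparing `sum(count_fallbacks(log).values())` between two daily runs would give a simple signal that a change introduced new fallbacks.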

We also found that inline asm is not supported in GISel. (Varargs weren't supported until last month, when @Him188 landed a patch to support them in the instruction selector.)

I plan to add this to the agenda of the next AArch64 sync, which @sjoerdmeijer hosts. We should coordinate on this and maybe file fine-grained issues so that we don't repeat the work(?)

What do you think @davemgreen?