Closed yyctw closed 10 months ago
Some thoughts on the AES code
If need be, we can split out the AES code to a separate PR to keep going here without it.
Ok, I have removed it.
Thank you @yyctw !
MSVC is still unhappy about something: https://ci.appveyor.com/project/nemequ/simde/builds/48387860/job/84vqj1g1r974k4sk#L2557
It should already be fixed.
https://ci.appveyor.com/project/nemequ/simde/builds/48389095/job/2ifnp7paf0l7m9t0#L2575 :-/
https://ci.appveyor.com/project/nemequ/simde/builds/48394475 :-)
Review of the first 50 files (thanks!)
Fixed it. Thanks for your review!
@mr-c All comments have been addressed. Thank you very much for your patience and review!
Good news: when all my indicated changes are made, the new tests pass on my mobile phone (-march=cortex-a76.cortex-a55) using GCC 13.2.
I had added commit cde4b78; please restore it.
Sorry for the overwrite. I don't have a local commit record, so I can't restore your commit. Could you please restore this commit yourself? Thank you.
TL;DR: SIMDe currently implements 6443 out of 6670 (96.60%) NEON functions. If you don't count bf16 types, it's 6443 / 6466 (99.64%).
!!!
Thank you @yyctw !
No problem. Thank you @mr-c for your review!
Hi all, this is Eric from Andes Technology Corporation. This PR is the remaining part of the previous PR and includes the following:
- Implement all poly-related types using `uint`.
- Implement all functions related to the `poly` type (with test cases).
- Implement all functions related to the `bf16` type (without test cases).
- Add 1035 initial implementations and corresponding test cases in 137 families, which are listed below:

`add`, `aes`, `bsl`, `ceq`, `ceqz`, `cmla`, `cmla_rot180`, `cmla_rot270`, `cmla_rot90`, `cnt`, `combine`, `copy_lane`, `crc32`, `create`, `cvt`, `div`, `dot`, `dot_lane`, `dup_lane`, `dup_n`, `eor`, `ext`, `fmlal`, `fmlsl`, `get_high`, `get_lane`, `get_low`, `ld1`, `ld1_dup`, `ld1_lane`, `ld1_x2`, `ld1_x3`, `ld1_x4`, `ld1q_x2`, `ld1q_x3`, `ld1q_x4`, `ld2`, `ld2_dup`, `ld2_lane`, `ld3`, `ld3_dup`, `ld3_lane`, `ld4`, `ld4_dup`, `ld4_lane`, `maxnm`, `maxnmv`, `maxv`, `minnm`, `minnmv`, `minv`, `mmlaq`, `mul`, `mull`, `mull_high`, `mull_high_lane`, `mull_high_n`, `mulx`, `mulx_lane`, `mulx_n`, `mvn`, `padd`, `pmax`, `pmaxnm`, `pmin`, `pminnm`, `qmovun_high`, `qrdmlah`, `qrdmlah_lane`, `qrdmlsh`, `qrdmlsh_lane`, `qrdmulh_lane`, `qshlu_n`, `qshrun_high_n`, `qshrun_n`, `qtbl`, `qtbx`, `rax`, `rbit`, `recps`, `recpx`, `reinterpret`, `rev16`, `rev32`, `rev64`, `rnd`, `rnd32x`, `rnd32z`, `rnd64x`, `rnd64z`, `rnda`, `rndi`, `rndm`, `rndp`, `rndx`, `set_lane`, `sha1`, `sha256`, `sha512`, `shll_high_n`, `shrn_high_n`, `shrn_n`, `sli_n`, `sm3`, `sm4`, `sri_n`, `st1`, `st1_lane`, `st1_x2`, `st1_x3`, `st1_x4`, `st1q_x2`, `st1q_x3`, `st1q_x4`, `st2`, `st2_lane`, `st3`, `st3_lane`, `st4`, `st4_lane`, `subhn_high`, `sudot_lane`, `tbl`, `tbx`, `trn`, `trn1`, `trn2`, `tst`, `types`, `usdot`, `usdot_lane`, `uzp`, `uzp1`, `uzp2`, `zip`, `zip1`, `zip2`
Thanks for reading; any recommendations are welcome! :tada::tada::tada:
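
For readers unfamiliar with what backing poly types with unsigned integers can look like, here is a rough sketch of one lane of a polynomial (carry-less) multiply, the per-lane behaviour of NEON's `vmul_p8`. The names `example_poly8_t` and `example_mul_p8` are hypothetical and chosen only for illustration; they are not the identifiers used in this PR, and the actual SIMDe code may be structured quite differently.

```c
#include <stdint.h>

/* Hypothetical alias for illustration: a poly8 lane stored as a plain uint8_t. */
typedef uint8_t example_poly8_t;

/* Carry-less multiply over GF(2), keeping only the low 8 bits of the product.
 * Each set bit i in b contributes (a << i), and the partial products are
 * combined with XOR instead of ADD, so no carries propagate between bits. */
static example_poly8_t example_mul_p8(example_poly8_t a, example_poly8_t b) {
  uint8_t r = 0;
  for (int i = 0; i < 8; i++) {
    if ((b >> i) & 1) {
      r ^= (uint8_t)(a << i); /* the cast truncates bits shifted past bit 7 */
    }
  }
  return r;
}
```

As a quick check, `example_mul_p8(0x03, 0x03)` gives `0x05`, because (x + 1)^2 = x^2 + 1 over GF(2), whereas an ordinary integer multiply would give 9. Storing the lane as an unsigned integer means only the arithmetic differs from the `uint8_t` functions, not the storage.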