guillaumeblanc / ozz-animation

Open source c++ skeletal animation library and toolset
http://guillaumeblanc.github.io/ozz-animation/

Support NEON instruction set #12

Open GCCFeli opened 8 years ago

GCCFeli commented 8 years ago

It would be great if NEON were supported :)

guillaumeblanc commented 8 years ago

Yes, it definitely would. I have no ARM hardware to test the implementation on, though.

The process to port ozz SIMD implementation is:

The whole library, including SoA implementation, is based on the functions from simdmath*-inl.h, so there's nothing else needed.

guillaumeblanc commented 8 years ago

I'm reopening the request, as I think it makes a lot of sense to implement this.

jazzbre commented 8 years ago

https://github.com/scoopr/vectorial or even https://github.com/jratcliff63367/sse2neon could be a good reference for an SSE/NEON implementation.

kylawl commented 5 years ago

We're going to be starting on Switch soon. Expect a PR early next year, but if someone wants to do it before us, that would be nice!

guillaumeblanc commented 5 years ago

Awesome news @kylawl. Don't hesitate to reach out to me if you want to discuss this or need help/support.

kylawl commented 3 years ago

So it's been a while and I'm back looking at this again. As a first step, I thought I'd just try using sse2neon to see if there's any benefit from simply aliasing all the instructions raw like that. Performance is actually surprisingly poor going this route on Switch. The sse reference implementation takes about 1.2ms for our whole animation phase, while using sse2neon takes 2.7ms! Not exactly what I was expecting or hoping for.

I've seen some discussion suggesting that we could be throttled by memory access overhead rather than computation. This needs more investigation.

ColinGilbert commented 3 years ago

If I remember correctly, Bullet physics had code contributed by Apple that made it very performant on ARM/iOS. Maybe that would be worth looking at?

guillaumeblanc commented 3 years ago

Welcome back!

You say 1.2ms for "sse reference implementation". Do you mean the float/scalar reference implementation? If so, it could be worth checking the generated code to see how much the compiler auto-vectorizes it. All the SoA usages of the math library in ozz are very easy for the compiler to auto-vectorize, so maybe NEON is already in use. That doesn't mean 1.2ms cannot be optimized further, but optimization expectations would be lower.
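To illustrate the auto-vectorization point, here is a minimal sketch of the kind of scalar SoA code a compiler can usually turn into SIMD instructions on its own. The `SoaFloat3` struct and `Lerp` function are hypothetical names made up for this example, not ozz's actual types or API.

```cpp
#include <cstddef>

// Hypothetical SoA layout (not ozz's real type): each component of four
// 3D vectors is stored in its own contiguous array of 4 lanes.
struct SoaFloat3 {
  float x[4];
  float y[4];
  float z[4];
};

// Linear interpolation of four 3D vectors at once, written as a plain
// scalar loop. Every lane is independent and memory is contiguous, which
// is the pattern optimizing compilers typically auto-vectorize.
inline SoaFloat3 Lerp(const SoaFloat3& a, const SoaFloat3& b, float t) {
  SoaFloat3 r;
  for (std::size_t i = 0; i < 4; ++i) {
    r.x[i] = a.x[i] + (b.x[i] - a.x[i]) * t;
    r.y[i] = a.y[i] + (b.y[i] - a.y[i]) * t;
    r.z[i] = a.z[i] + (b.z[i] - a.z[i]) * t;
  }
  return r;
}
```

Whether a given compiler actually emits NEON for such a loop depends on the toolchain and optimization flags, so inspecting the generated assembly remains the only reliable check.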

Are the memory access overhead issues you mentioned specific to NEON?

kylawl commented 3 years ago

You're probably right that the autovectorization is doing a decent job. One thing that sse2neon misses is the common shuffle operation that we use to splat the same value into all 4 components. For that particular shuffle, they use a multi-instruction "generic" path even though ARM has a specific instruction for handling that operation. After spending some more time on Switch optimizations, I don't think this is a memory access issue. Needs further investigation for sure.
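As a rough sketch of the splat case described above, the snippet below broadcasts lane 0 of a 4-float vector with a dedicated NEON intrinsic on ARM, the equivalent SSE shuffle on x86, and a scalar fallback elsewhere. The names `Float4`, `Load`, `Store` and `SplatX` are hypothetical helpers for this example only. It does not reproduce how ozz or sse2neon implement the operation.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t Float4;
inline Float4 Load(const float* p) { return vld1q_f32(p); }
inline void Store(Float4 v, float* p) { vst1q_f32(p, v); }
// Dedicated lane broadcast: duplicates lane 0 into all 4 lanes.
inline Float4 SplatX(Float4 v) {
  return vdupq_lane_f32(vget_low_f32(v), 0);
}
#elif defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
typedef __m128 Float4;
inline Float4 Load(const float* p) { return _mm_loadu_ps(p); }
inline void Store(Float4 v, float* p) { _mm_storeu_ps(p, v); }
// Same broadcast expressed as an SSE shuffle with an all-zero selector.
inline Float4 SplatX(Float4 v) {
  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}
#else
// Scalar fallback so the sketch still builds without SSE or NEON.
struct Float4 { float f[4]; };
inline Float4 Load(const float* p) {
  Float4 v = {{p[0], p[1], p[2], p[3]}};
  return v;
}
inline void Store(Float4 v, float* p) {
  for (int i = 0; i < 4; ++i) p[i] = v.f[i];
}
inline Float4 SplatX(Float4 v) {
  Float4 r = {{v.f[0], v.f[0], v.f[0], v.f[0]}};
  return r;
}
#endif
```

With these helpers, `Store(SplatX(Load(in)), out)` fills `out` with four copies of `in[0]`. A hand-written NEON port could special-case splats this way instead of going through a general-purpose shuffle emulation.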

guillaumeblanc commented 7 months ago

Hi,

What did you end up doing on Switch? Did you need or implement NEON optimizations?

Cheers, Guillaume

kylawl commented 7 months ago

It's been a while, but if I remember correctly, the compiler was able to optimize the output sufficiently for us to use. We tried one of those SSE-to-NEON headers and it was significantly slower than just using the vanilla one. Bear in mind that our skeletons were small, averaging maybe 30 bones, with at most 5 characters at a time.

On Switch we were CPU-bound, but the small animation cost was dwarfed by the "open worldness" of the game.

Sorry we never got to completing that.

guillaumeblanc commented 7 months ago

No worries, thanks for the feedback. It's good to know that the reference implementation provides good results as a cross-platform fallback.