hollance / neural-engine

Everything we actually know about the Apple Neural Engine (ANE)
MIT License

Matrix multiply example/benchmark? #3

Closed: chadbrewbaker closed this issue 3 years ago

chadbrewbaker commented 3 years ago

Stupid question: how do you do a float16 square matrix-matrix multiply with their SDK? Can it handle sparse matrix multiplies?

I'm interested in hacking it to do string parsing - https://en.wikipedia.org/wiki/CYK_algorithm.
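For context on why this mapping is plausible: CYK is a dynamic program whose inner step combines two table cells in a way that amounts to a boolean matrix product. A minimal pure-Python recognizer for a grammar in Chomsky normal form (the grammar and helper names here are illustrative, not from any Apple SDK):

```python
from itertools import product

def cyk_recognize(tokens, unary, binary, start="S"):
    """CYK recognizer for a grammar in Chomsky normal form.

    unary:  dict terminal -> set of nonterminals (A -> 'a' rules)
    binary: dict (B, C)   -> set of nonterminals (A -> B C rules)
    """
    n = len(tokens)
    # table[i][j] = set of nonterminals deriving tokens[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(unary.get(tok, ()))
    for span in range(2, n + 1):              # span length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):             # split point
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]

# Toy CNF grammar: S -> A B, A -> 'a', B -> 'b'
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}}
print(cyk_recognize(["a", "b"], unary, binary))   # True
print(cyk_recognize(["b", "a"], unary, binary))   # False
```

The split-point loop is where a hardware matrix multiply could, in principle, batch the work: each (i, k) x (k, j) combination is a cell-by-cell "multiply" over nonterminal sets.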

hollance commented 3 years ago

Which SDK are you referring to?

You can do a matrix multiply on the CPU using the Accelerate framework's BLAS functions. It might have support for sparse matrices, but I'm not sure.

You can also do matrix multiplies on the GPU using Metal.

I wrote a blog post a while ago to compare these approaches: https://machinethink.net/blog/mps-matrix-multiplication/

You can also do a matrix multiply using Core ML by creating a neural network that has a BatchedMatMul layer. Then Core ML will decide whether to run it on the CPU, GPU, or Neural Engine (in theory). I don't think this does sparse matrices, though.
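For clarity on what such a layer computes: a batched matmul multiplies corresponding matrices in two equally sized stacks, one plain matrix product per batch item. A minimal pure-Python sketch of that semantics (nested lists standing in for Core ML tensors; this is not the Core ML API itself):

```python
def matmul(a, b):
    """Plain (M x K) @ (K x N) multiply on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def batched_matmul(a_batch, b_batch):
    """Multiply corresponding matrices in two stacks, which is
    what a batched-matmul layer does per batch item."""
    assert len(a_batch) == len(b_batch), "batch sizes must match"
    return [matmul(a, b) for a, b in zip(a_batch, b_batch)]

# Two 2x2 multiplies in one batched call.
a = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
b = [[[5.0, 6.0], [7.0, 8.0]], [[2.0, 3.0], [4.0, 5.0]]]
print(batched_matmul(a, b))
```

A sparse multiply would need its own representation (e.g. coordinate lists); the dense layout above is what the Core ML layer exposes.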

chadbrewbaker commented 3 years ago

On the 16-core Neural Engine for the A14/M1. I was under the impression that it does matrix-matrix multiplies in hardware. Not sure how/if the A14/M1 shares L1/L2 cache with the Neural Engine cores. It's LLVM all the way down, so you should be able to see the Neural Engine code's LLVM IR and assembler? I have an M1 MacBook Air on order to noodle with.

hollance commented 3 years ago

There is no API to use the Neural Engine directly. I'm sure the ANE is pretty much one big hardware-accelerated matrix multiplication implementation, but the only way to use it is to build a neural network that contains this kind of operation.

chadbrewbaker commented 3 years ago

You can snoop on it with DTrace. It's just an OS call.

hollance commented 3 years ago

Let us know what you find 😄