Adding linear algebra and other array operations

ml-explore / mlx

MLX: An array framework for Apple silicon

https://ml-explore.github.io/mlx/

MIT License

Adding linear algebra and other array operations #24

Open aarmey opened 5 months ago

aarmey commented 5 months ago

It looks like this is still missing many matrix operations like QR, SVD, einsum, etc. Is there a clear path to using these with or without MLX?

This has been a similar issue with the PyTorch MPS backend. While there is a long tail of these operations to support, they are essential to many machine learning models. As can be seen in the PyTorch issue, not including them limits the utility of packages like this.

Datamance commented 5 months ago

Huge +1 to this. Would be amazing to not have to drop back to numpy/CPU for these sorts of things.

aymuos15 commented 5 months ago

Hi! I am quite interested in working on this but not really sure how to start. Would someone be able to point me in the right direction?

I would even be open to a short meeting if required.

I work on an M2 Max. Thank you :)

@awni

nullhook commented 5 months ago

Matrix factorizations aren't easily parallelizable on the GPU.

Would QR and SVD only have a CPU implementation for now? @awni

awni commented 5 months ago

We would love to have these operations available directly in MLX. It's not our top priority, but it is something we intend to add in the future or, even better, accept contributions for.

If you are interested in contributing, here are some thoughts:

- To the extent that we can avoid writing these from scratch, that is good.
- For the CPU we can use LAPACK and/or Accelerate depending on what's available in each. A good starting point would be to wrap an op from one of those just for the CPU (and throw for the GPU).
- On the GPU there are also some pre-written kernels we can use from MPS, for example [cholesky](https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixdecompositioncholesky?language=objc). You can see an example of how to wrap MPS matmul. The others could be done similarly.
- For ops not supported by MPS, we'd need kernels, which is a bigger project, but a fun one for those up for a challenge!
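To make the "CPU first, throw for the GPU" starting point concrete, here is a minimal Python sketch of the dispatch pattern. Everything in it is hypothetical and for illustration only: the names `cholesky` and `cholesky_cpu` and the `device` string argument are assumptions, not MLX's API, and the hand-written Cholesky loop merely stands in for the LAPACK/Accelerate call a real op would wrap.

```python
import math


def cholesky_cpu(a):
    """Return lower-triangular L with L @ L^T == a.

    Stand-in for a LAPACK/Accelerate call (hypothetical helper).
    `a` is a symmetric positive-definite matrix given as a list of rows.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = math.fsum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def cholesky(a, device="cpu"):
    """Dispatch on device: CPU is implemented, anything else throws."""
    if device == "cpu":
        return cholesky_cpu(a)
    raise NotImplementedError(
        f"cholesky is not implemented for device {device!r}"
    )
```

A GPU branch could later be filled in (for example by wrapping an MPS kernel) without changing how callers use the op.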

j-csc commented 5 months ago

Thoughts on wrapping these linalg-specific functions in a separate module on the Python frontend?

awni commented 5 months ago

You can look at how mlx.core.random works. We could do something similar for mlx.core.linalg. Basically, it's a nested namespace on the C++ side (mlx::core::random), and then we make it a submodule in the pybind11 bindings. Then you can do:

import mlx.core as mx
mx.linalg.< >

gboduljak commented 5 months ago

Any thoughts on implementing at least vector/matrix norm methods such as torch.linalg.vector_norm?

awni commented 5 months ago

Something like np.linalg.norm for vectors, and a Frobenius norm for matrices, should be very easy to do. That's also a good place to start, just to get the packaging set up.
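For reference, the two norms mentioned here reduce to a square, a sum, and a square root. The sketch below is a dependency-free Python reference, not MLX code; the function names `vector_norm` and `frobenius_norm` are illustrative assumptions. It shows that the Frobenius norm is just the Euclidean norm of the flattened matrix.

```python
import math


def vector_norm(x):
    """Euclidean (L2) norm of a flat sequence of numbers."""
    return math.sqrt(math.fsum(v * v for v in x))


def frobenius_norm(a):
    """Frobenius norm of a matrix given as a list of rows.

    Equivalent to the L2 norm of all entries flattened into one vector.
    """
    return vector_norm([v for row in a for v in row])
```

A reference like this can be handy for checking the output of an array-based implementation on small inputs.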

nullhook commented 5 months ago

note to self: almost all LAPACK routines are col-major

@awni would transposing an mlx array before sending it to LAPACK routines work here, or is there an alternative way?

awni commented 5 months ago

No, I wouldn't deal with that using a transpose. You can usually call the routine with the right arguments and avoid a transpose. For example, a row-major [M, N] matrix is the same as a col-major [N, M] matrix in terms of its memory layout.
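The layout equivalence can be checked with plain index arithmetic. This is a small Python sketch (the helper names `row_major_offset` and `col_major_offset` are made up for illustration): it flattens a row-major [M, N] matrix into a buffer and then reads the same buffer as a col-major [N, M] matrix, which yields the transpose without moving any data.

```python
def row_major_offset(i, j, n_cols):
    """Offset of element (i, j) in a row-major matrix with n_cols columns."""
    return i * n_cols + j


def col_major_offset(r, c, n_rows):
    """Offset of element (r, c) in a col-major matrix with n_rows rows."""
    return r + c * n_rows


M, N = 2, 3
A = [[1, 2, 3],
     [4, 5, 6]]

# Flatten A in row-major order, as a row-major array library would store it.
buffer = [A[i][j] for i in range(M) for j in range(N)]

# Read the very same buffer as a col-major matrix of shape [N, M].
B = [[buffer[col_major_offset(r, c, N)] for c in range(M)] for r in range(N)]
# B is the transpose of A: element (j, i) of B is element (i, j) of A.
```

So a column-major routine handed this buffer with the dimensions swapped sees the transpose of A, which can often be accounted for through the routine's arguments instead of an explicit copy.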

rickypang0219 commented 5 months ago

Hi @awni, are there any learning resources for Apple Metal and the Accelerate framework? I want to contribute to the linalg module but I do not know where to start. For instance, if I want to build mx.linalg.eig, how can I use LAPACK from Apple's Accelerate framework?

ivanfioravanti commented 3 months ago

> matrix factorizations aren't easy parallelizable on the gpu.
>
> would QR and SVD only have cpu implementation for now? @awni

SVD support would be great.

awni commented 3 months ago

The CPU versions of these are pretty doable. See the QR factorization as an example: https://github.com/ml-explore/mlx/blob/main/mlx/backend/common/qrf.cpp

GPU support is more involved, as I don't think there are many open source Metal implementations.