xflowXen opened this issue 4 years ago
This is a short summary of the discussions and links from the deep learning group, and of things mentioned in the issues.
The following explanation follows the `deep` crate, created by the deep learning group in Rust, which is trying to find basic building blocks for neural network learning.
A (deep) neural network is a composition of multiple non-linear functions, so two main components are required for training one. A computational graph defines how these components are connected, together with a Tensor type that defines common functions. This computational graph is connected to a Backend, where the actual primitive computational blocks (BLAS operations etc.) are defined. In this case this happens with the `Backend` trait, defined here and implemented here. The backend is currently a simple bridge to `ndarray`, for dynamically sized arrays. Also see mli for a more fleshed-out approach.
For statically sized arrays we need const generics, currently unstable in Rust. Furthermore, generic associated types are needed to define generic traits for different backends. `f16` datatypes are used in many cases (for example in audio processing). An interesting approach to named tensors can be found here; it is based on the tch-rs crate.
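To make the graph/backend split concrete, here is a minimal sketch of what a backend abstraction of this shape could look like. All names (`Backend`, `CpuBackend`, `dense_layer`) are illustrative inventions, not the actual `deep` crate API, and a real design would want generic associated types so the tensor type can borrow from the backend:

```rust
// Illustrative sketch of a backend abstraction; names are invented,
// not the real `deep` crate API.

/// Primitive operations a compute backend must provide.
trait Backend {
    type Tensor: Clone;

    fn from_vec(&self, data: Vec<f32>) -> Self::Tensor;
    fn to_vec(&self, t: &Self::Tensor) -> Vec<f32>;
    /// Element-wise addition (stand-in for BLAS-level kernels).
    fn add(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn relu(&self, a: &Self::Tensor) -> Self::Tensor;
}

/// Trivial CPU backend backed by `Vec<f32>`; a real one would bridge
/// to ndarray, arrayfire, a GPU API, and so on.
struct CpuBackend;

impl Backend for CpuBackend {
    type Tensor = Vec<f32>;

    fn from_vec(&self, data: Vec<f32>) -> Vec<f32> { data }
    fn to_vec(&self, t: &Vec<f32>) -> Vec<f32> { t.clone() }
    fn add(&self, a: &Vec<f32>, b: &Vec<f32>) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
    fn relu(&self, a: &Vec<f32>) -> Vec<f32> {
        a.iter().map(|x| x.max(0.0)).collect()
    }
}

/// Graph-level code stays generic over the backend.
fn dense_layer<B: Backend>(b: &B, x: &B::Tensor, bias: &B::Tensor) -> B::Tensor {
    b.relu(&b.add(x, bias))
}

fn main() {
    let b = CpuBackend;
    let x = b.from_vec(vec![-1.0, 2.0]);
    let bias = b.from_vec(vec![0.5, 0.5]);
    let y = dense_layer(&b, &x, &bias);
    println!("{:?}", b.to_vec(&y)); // [0.0, 2.5]
}
```

The point of the split is that `dense_layer` never mentions a concrete array type, so swapping `CpuBackend` for a GPU-backed implementation leaves the graph code untouched.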
Another attempt at AD and a computation graph is the autograd crate, currently CPU-only. Also see this and this. For more background on AD, this paper offers a survey, and this blog post introduces reverse-mode AD in Rust.
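For intuition, the core of tape-based reverse-mode AD fits in a few dozen lines. This is a deliberately tiny toy over scalars, not the autograd crate's API:

```rust
// Toy tape-based reverse-mode AD over scalars; illustrative only.

#[derive(Clone, Copy)]
enum Op {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
}

struct Tape {
    ops: Vec<Op>,
    vals: Vec<f64>,
}

impl Tape {
    fn new() -> Self { Tape { ops: vec![], vals: vec![] } }
    fn input(&mut self, v: f64) -> usize {
        self.ops.push(Op::Input);
        self.vals.push(v);
        self.vals.len() - 1
    }
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.ops.push(Op::Add(a, b));
        self.vals.push(self.vals[a] + self.vals[b]);
        self.vals.len() - 1
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.ops.push(Op::Mul(a, b));
        self.vals.push(self.vals[a] * self.vals[b]);
        self.vals.len() - 1
    }
    /// Walk the tape backwards, accumulating adjoints (the chain rule).
    fn grad(&self, out: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.vals.len()];
        adj[out] = 1.0;
        for i in (0..self.ops.len()).rev() {
            match self.ops[i] {
                Op::Input => {}
                Op::Add(a, b) => { adj[a] += adj[i]; adj[b] += adj[i]; }
                Op::Mul(a, b) => {
                    adj[a] += adj[i] * self.vals[b];
                    adj[b] += adj[i] * self.vals[a];
                }
            }
        }
        adj
    }
}

fn main() {
    // f(x, y) = (x + y) * x  at x = 3, y = 2
    let mut t = Tape::new();
    let x = t.input(3.0);
    let y = t.input(2.0);
    let s = t.add(x, y);
    let f = t.mul(s, x);
    let g = t.grad(f);
    // f = 15, df/dx = 2x + y = 8, df/dy = x = 3
    println!("f = {}, df/dx = {}, df/dy = {}", t.vals[f], g[x], g[y]);
}
```

Real crates do the same thing with tensors instead of scalars and a much richer op set, but the forward-record/backward-accumulate shape is the same.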
If we move further away from NNs towards numeric optimization in general, there is the argmin crate, which implements numerous optimization algorithms: first-order methods like CG, derivative-free ones like particle swarm, and approximated-Hessian (quasi-Newton) algorithms like BFGS. Perhaps we can reuse their infrastructure for training.
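Training a network is, at bottom, the same minimize-a-loss problem that general optimizers target, which is why reusing such infrastructure is plausible. A bare gradient-descent loop (illustrative only, not argmin's API) makes the connection:

```rust
// Minimal gradient descent on f(w) = (w - 3)^2; illustrative only,
// not the argmin crate's API. A trainer would do the same loop with
// the gradient coming from backprop instead of a closed form.
fn grad(w: f64) -> f64 { 2.0 * (w - 3.0) }

fn minimize(mut w: f64, lr: f64, steps: usize) -> f64 {
    for _ in 0..steps {
        w -= lr * grad(w); // step against the gradient
    }
    w
}

fn main() {
    let w = minimize(0.0, 0.1, 100);
    println!("w ≈ {w}"); // converges toward the minimum at 3.0
}
```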
- https://github.com/rust-ndarray/ndarray (CPU-only, dense)
- https://github.com/rustsim/nalgebra (CPU-only, dense and sparse)
- https://github.com/vbarrielle/sprs (CPU-only, sparse)
- https://github.com/arrayfire/arrayfire-rust (CPU/GPU, dense and sparse)
- https://github.com/koute/sarek
- https://github.com/jramapuram/hal (not updated since 2016)
- https://github.com/jackm321/RustNN (not updated since 2015)
- https://github.com/LaurentMazare/tch-rs
- https://github.com/tensorflow/rust
- https://github.com/apache/incubator-tvm/tree/master/rust
Awesome summary, it looks like there are quite a few points that I had missed. In particular the ArrayFire arrays (which look like a good candidate for deployment directly on a machine), though having the option to use a WebGPU-type interface would be good as well.
Also it looks like the thinking behind the deep-rust crate is in quite an advanced stage as well so that may be the right place to contribute to - I've followed up and reached out to them on Discord.
It's also probably better to implement the automatic differentiation optimisations now as well, given it's in TensorFlow 2.0 - came across this paper as well for anyone now picking up the topic (like me :) )
Edit April 10:
Had to revisit calculus and the chain rule, but automatic differentiation is just an iterative form of it at the tensor/neuron level. It's probably better to integrate this with the creation of the graph, as a back/forward propagation method.
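The "iterative chain rule" reading can be made concrete with a two-stage composition; everything here is an illustrative toy, done the way a graph would do it (cache intermediates on the forward pass, multiply local derivatives back-to-front):

```rust
// Backprop through y = h(g(x)) with g(x) = x^2 and h(u) = 3u + 1.
fn g(x: f64) -> f64 { x * x }
fn h(u: f64) -> f64 { 3.0 * u + 1.0 }

fn forward_backward(x: f64) -> (f64, f64) {
    // Forward pass, caching the intermediate value.
    let u = g(x);
    let y = h(u);
    // Backward pass: multiply local derivatives front from the output.
    let dy_du = 3.0;     // h'(u)
    let du_dx = 2.0 * x; // g'(x)
    let dy_dx = dy_du * du_dx;
    (y, dy_dx)
}

fn main() {
    let (y, dy_dx) = forward_backward(2.0);
    // y = 3 * 4 + 1 = 13, dy/dx = 3 * 2x = 12
    println!("y = {y}, dy/dx = {dy_dx}");
}
```

At the tensor level the scalars become Jacobian-vector products, but the per-node recipe is identical, which is why it integrates naturally with graph construction.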
Also had a review of this with vadix (author of the Rust deep learning crate, who is also now working on RustCV). He has offered to contribute the mli code base to this project and also has some pretty strong views on a number of implementation points - namely:
- Dynamic Graphs instead of static graphs like MLI currently implements
- Creation of a backend concept for choosing execution target (Desktop CPU, Desktop GPU, WebGPU etc)
- SPIR-V targeting (GLSL via EMU?) for WebGPU
- H/W specific optimisation (Nice to Have) for Desktop CPU/GPU targets
- Reviewing the choice of Enum vs String API for Operations (currently implemented as Enums)
Also a good nice-to-have would be getting automatic differentiation implemented in such a way that it uses a domain-specific language on top of Rust - though this isn't a core objective of the API.
> Awesome summary, it looks like there are quite a few points that I had missed. In particular the ArrayFire arrays (which look like a good candidate for deployment directly on a machine), though having the option to use a WebGPU-type interface would be good as well.

Yes, but be aware that `arrayfire` is actually implemented in C++ and has to be linked as a dynamic library. I think that the schism between `ndarray` and `nalgebra` is hurting progress, as is the fact that there is no library supporting all four combinations of sparse/dense and CPU/GPU arrays. I implemented an eigensolver for large sparse matrices (https://github.com/rust-ndarray/ndarray-linalg/pull/184) and figured that it would be nice to write such an algorithm in a generic way. But there are no common traits for Cholesky, eigendecomposition, linear operators etc. shared between implementations.

> Also it looks like the thinking behind the deep-rust crate is in quite an advanced stage as well so that may be the right place to contribute to - I've followed up and reached out to them on Discord.

I saw it on Discord, thanks for doing that!

> It's also probably better to implement the automatic differentiation optimisations now as well given it's in TensorFlow 2.0 - came across this paper as well for anyone now picking up the topic (like me :) )
This lecture looks like a reduced version of the paper I linked, especially with the "What Autodiff Isn't" introduction :sweat_smile:
Yeah, vadix echoed your view on the ArrayFire point - guess that means we'd need to look at having a similar Rust version for the Desktop GPU target/backend.
Just to be clear - Emu is as "pure Rust" as WebGPU is. It's just a thin abstraction over WebGPU (providing things like a global device pool, a global JIT kernel cache, etc.). So Emu accepts SPIR-V as input but provides a `CompileToSpirv` trait to support new languages (like GLSL).
I don't know if this would help, but I was thinking about creating a sort of `AcceleratedIterator<Item = T>` trait (implemented by `DeviceBox<T>`) that makes it easy to fuse together simple "map*, reduce" kernels for array operations.
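On the CPU, the fusion such a trait would perform can be sketched like this. `AcceleratedIterator` and `DeviceBox` come from the comment above; everything in this sketch is invented for illustration - a GPU version would JIT the fused stages into one kernel instead of folding in a loop:

```rust
// CPU-only sketch of fusing map* + reduce into a single pass,
// mimicking what a hypothetical AcceleratedIterator would do for
// device buffers. All names here are illustrative.

struct Fused<'a> {
    data: &'a [f32],
    // Chain of element-wise stages, applied in one traversal.
    stages: Vec<Box<dyn Fn(f32) -> f32>>,
}

impl<'a> Fused<'a> {
    fn new(data: &'a [f32]) -> Self {
        Fused { data, stages: vec![] }
    }
    fn map(mut self, f: impl Fn(f32) -> f32 + 'static) -> Self {
        self.stages.push(Box::new(f));
        self
    }
    /// Single pass over the data: every map stage plus the reduction
    /// runs per element, so no intermediate arrays are materialized.
    fn reduce(self, init: f32, r: impl Fn(f32, f32) -> f32) -> f32 {
        self.data.iter().fold(init, |acc, &x| {
            let v = self.stages.iter().fold(x, |v, f| f(v));
            r(acc, v)
        })
    }
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    // sum of (2x + 1) over xs: 3 + 5 + 7 = 15
    let s = Fused::new(&xs)
        .map(|x| 2.0 * x)
        .map(|x| x + 1.0)
        .reduce(0.0, |a, b| a + b);
    println!("{s}");
}
```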
Got it - so that means Emu would allow the execution of either SPIR-V or GLSL intermediate-language code on WebGPU. Any plans to support other intermediate languages (SYCL, RLSL)?
Will reach out on Discord to discuss the AcceleratedIterator in more detail.
> any plans to support any other intermediate languages (SYCL, RLSL)?
Personally, I don't think SYCL or RLSL are actually very useful as abstractions. I feel that something like Halide (separating algorithm from schedule), Taichi (separating algorithm from data structure), Elevate (separating algorithm from optimization strategy) might work better.
Of course, if SYCL and RLSL are popular then making them work with Emu would be a good idea.
@xflowXen One small correction to the above list, now that I am taking a look. The current Op system is implemented in terms of enums, but I would really like to consider using strings to allow downstream crates to more easily add new ops. We cannot achieve our goals with trait objects because then the graph cannot be deserialized (at least in an easy way). Strings make it possible for downstream crates to add custom ops and still allow the graph to be trivially serialized to disk.
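The trade-off being described can be sketched like so; the registry shape below is illustrative, not the actual design. With string keys, a serialized graph node is just a name plus inputs, and a downstream crate can register ops the core crate never heard of:

```rust
use std::collections::HashMap;

// Ops identified by strings, so a serialized graph is just a list of
// (name, inputs) pairs and downstream crates can add new ops without
// touching the core enum. Illustrative sketch only.
type OpFn = fn(&[f32]) -> f32;

struct OpRegistry {
    ops: HashMap<String, OpFn>,
}

impl OpRegistry {
    fn new() -> Self { OpRegistry { ops: HashMap::new() } }
    fn register(&mut self, name: &str, f: OpFn) {
        self.ops.insert(name.to_string(), f);
    }
    /// Look up an op by its serialized name; None for unknown ops.
    fn run(&self, name: &str, args: &[f32]) -> Option<f32> {
        self.ops.get(name).map(|f| f(args))
    }
}

fn main() {
    let mut reg = OpRegistry::new();
    // "built-in" op
    reg.register("sum", |xs| xs.iter().sum());
    // a downstream crate can add its own op under a new name; an
    // enum-based design would require a change to the core crate
    reg.register("max", |xs| xs.iter().cloned().fold(f32::MIN, f32::max));

    // a deserialized graph node is just ("max", inputs)
    println!("{:?}", reg.run("max", &[1.0, 4.0, 2.0])); // Some(4.0)
}
```

The cost, compared to enums, is that unknown op names become a runtime error rather than a compile error, which is the crux of the trade-off mentioned above.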
Potential backends for NVIDIA/AMD GPUs
To update the list of NN libraries we have in Rust - adding an initial one from @quietlychris here
Existing NN libraries in Rust
- https://github.com/koute/sarek
- https://github.com/jramapuram/hal (not updated since 2016)
- https://github.com/jackm321/RustNN (not updated since 2015)
Following on from @LukeMathWalker 's post here, I'm starting this issue to discuss and refine the details of a high level machine learning interface (API).
From an API perspective, I propose that the interface is modelled similarly to Keras - though not necessarily with the same naming convention.
When it comes to GPUs, from what I've seen there is some vendor segmentation within the ML space - for example, TensorFlow only works on CUDA-enabled video cards (see here). I'd be keen to focus on industry-standard and cross-platform solutions for neural networks - specifically the standards defined by the Khronos Group - though obviously that would require some community discussion to determine the right path here.
I'm also keen to understand what use cases for neural-net building others have at the minute and assist the community in moving this forwards.
What should we build?