xflowXen opened this issue 4 years ago
This is a short summary of the discussions and links from the deep learning group, and of things mentioned in the issues.
The following explanation follows the `deep` crate, created by the deep learning group in Rust, which is trying to find basic building blocks for neural network learning.
A (deep) neural network is a composition of multiple non-linear functions, so two main components are required for training one. A computational graph defines how these components are connected, together with a Tensor type that defines common functions. This computational graph is connected to a Backend, where the actual primitive computational blocks (BLAS operations etc.) are defined. In this case this happens with the `Backend` trait, defined here and implemented here. The backend is currently a simple bridge to `ndarray`, for dynamically sized arrays. Also see mli for a more fleshed-out approach.
For statically sized arrays we need const generics, currently unstable in Rust. Furthermore, generic associated types are needed to define generic traits for different backends. `f16` datatypes are used in many cases (for example in audio processing). An interesting approach to named tensors can be found here; it is based on the tch-rs crate.
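To make the graph/backend split concrete, here is a minimal sketch of what a backend abstraction of this shape could look like. All names (`Backend`, `CpuBackend`, `dense_layer`) are illustrative inventions, not the actual `deep` crate API, and a real design would want generic associated types so the tensor type can borrow from the backend:

```rust
// Illustrative sketch of a backend abstraction; names are invented,
// not the real `deep` crate API.

/// Primitive operations a compute backend must provide.
trait Backend {
    type Tensor: Clone;

    fn from_vec(&self, data: Vec<f32>) -> Self::Tensor;
    fn to_vec(&self, t: &Self::Tensor) -> Vec<f32>;
    /// Element-wise addition (stand-in for BLAS-level kernels).
    fn add(&self, a: &Self::Tensor, b: &Self::Tensor) -> Self::Tensor;
    fn relu(&self, a: &Self::Tensor) -> Self::Tensor;
}

/// Trivial CPU backend backed by `Vec<f32>`; a real one would bridge
/// to ndarray, arrayfire, a GPU API, and so on.
struct CpuBackend;

impl Backend for CpuBackend {
    type Tensor = Vec<f32>;

    fn from_vec(&self, data: Vec<f32>) -> Vec<f32> { data }
    fn to_vec(&self, t: &Vec<f32>) -> Vec<f32> { t.clone() }
    fn add(&self, a: &Vec<f32>, b: &Vec<f32>) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
    fn relu(&self, a: &Vec<f32>) -> Vec<f32> {
        a.iter().map(|x| x.max(0.0)).collect()
    }
}

/// Graph-level code stays generic over the backend.
fn dense_layer<B: Backend>(b: &B, x: &B::Tensor, bias: &B::Tensor) -> B::Tensor {
    b.relu(&b.add(x, bias))
}

fn main() {
    let b = CpuBackend;
    let x = b.from_vec(vec![-1.0, 2.0]);
    let bias = b.from_vec(vec![0.5, 0.5]);
    let y = dense_layer(&b, &x, &bias);
    println!("{:?}", b.to_vec(&y)); // [0.0, 2.5]
}
```

The point of the split is that `dense_layer` never mentions a concrete array type, so swapping `CpuBackend` for a GPU-backed implementation leaves the graph code untouched.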
Another attempt at AD and a computation graph is the autograd crate, currently CPU-only. Also see this and this. For more background on AD, this paper offers a survey, and this blog post introduces reverse-mode AD in Rust.
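For intuition, the core of tape-based reverse-mode AD fits in a few dozen lines. This is a deliberately tiny toy over scalars, not the autograd crate's API:

```rust
// Toy tape-based reverse-mode AD over scalars; illustrative only.

#[derive(Clone, Copy)]
enum Op {
    Input,
    Add(usize, usize),
    Mul(usize, usize),
}

struct Tape {
    ops: Vec<Op>,
    vals: Vec<f64>,
}

impl Tape {
    fn new() -> Self { Tape { ops: vec![], vals: vec![] } }
    fn input(&mut self, v: f64) -> usize {
        self.ops.push(Op::Input);
        self.vals.push(v);
        self.vals.len() - 1
    }
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.ops.push(Op::Add(a, b));
        self.vals.push(self.vals[a] + self.vals[b]);
        self.vals.len() - 1
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.ops.push(Op::Mul(a, b));
        self.vals.push(self.vals[a] * self.vals[b]);
        self.vals.len() - 1
    }
    /// Walk the tape backwards, accumulating adjoints (the chain rule).
    fn grad(&self, out: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.vals.len()];
        adj[out] = 1.0;
        for i in (0..self.ops.len()).rev() {
            match self.ops[i] {
                Op::Input => {}
                Op::Add(a, b) => { adj[a] += adj[i]; adj[b] += adj[i]; }
                Op::Mul(a, b) => {
                    adj[a] += adj[i] * self.vals[b];
                    adj[b] += adj[i] * self.vals[a];
                }
            }
        }
        adj
    }
}

fn main() {
    // f(x, y) = (x + y) * x  at x = 3, y = 2
    let mut t = Tape::new();
    let x = t.input(3.0);
    let y = t.input(2.0);
    let s = t.add(x, y);
    let f = t.mul(s, x);
    let g = t.grad(f);
    // f = 15, df/dx = 2x + y = 8, df/dy = x = 3
    println!("f = {}, df/dx = {}, df/dy = {}", t.vals[f], g[x], g[y]);
}
```

Real crates do the same thing with tensors instead of scalars and a much richer op set, but the forward-record/backward-accumulate shape is the same.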
If we move further away from NNs towards numeric optimization in general, there is the argmin crate, which implements numerous optimization algorithms: first-order methods like CG, derivative-free ones like particle swarm, and approximated-Hessian (quasi-Newton) algorithms like BFGS. Perhaps we can reuse their infrastructure for training.
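Training a network is, at bottom, the same minimize-a-loss problem that general optimizers target, which is why reusing such infrastructure is plausible. A bare gradient-descent loop (illustrative only, not argmin's API) makes the connection:

```rust
// Minimal gradient descent on f(w) = (w - 3)^2; illustrative only,
// not the argmin crate's API. A trainer would do the same loop with
// the gradient coming from backprop instead of a closed form.
fn grad(w: f64) -> f64 { 2.0 * (w - 3.0) }

fn minimize(mut w: f64, lr: f64, steps: usize) -> f64 {
    for _ in 0..steps {
        w -= lr * grad(w); // step against the gradient
    }
    w
}

fn main() {
    let w = minimize(0.0, 0.1, 100);
    println!("w ≈ {w}"); // converges toward the minimum at 3.0
}
```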
- https://github.com/rust-ndarray/ndarray (CPU-only, dense)
- https://github.com/rustsim/nalgebra (CPU-only, dense and sparse)
- https://github.com/vbarrielle/sprs (CPU-only, sparse)
- https://github.com/arrayfire/arrayfire-rust (CPU/GPU, dense and sparse)
- https://github.com/koute/sarek
- https://github.com/jramapuram/hal (not updated since 2016)
- https://github.com/jackm321/RustNN (not updated since 2015)
- https://github.com/LaurentMazare/tch-rs
- https://github.com/tensorflow/rust
- https://github.com/apache/incubator-tvm/tree/master/rust
Awesome summary, it looks like there are quite a few points that I had missed. In particular the ArrayFire arrays (which look like a good candidate for deployment directly on a machine), though having the option to use a WebGPU-type interface would be good as well.
Also it looks like the thinking behind the deep-rust crate is in quite an advanced stage as well so that may be the right place to contribute to - I've followed up and reached out to them on Discord.
It's also probably better to implement the automatic differentiation optimisations now as well, given it's in TensorFlow 2.0 - came across this paper as well for anyone now picking up the topic (like me :) )
Edit April 10:
Had to revisit calculus and the chain rule, but automatic differentiation is just an iterative form of it at the tensor/neuron level. It's probably better to integrate this with the creation of the graph, as a back/forward propagation method.
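The "iterative chain rule" reading can be made concrete with a two-stage composition; everything here is an illustrative toy, done the way a graph would do it (cache intermediates on the forward pass, multiply local derivatives back-to-front):

```rust
// Backprop through y = h(g(x)) with g(x) = x^2 and h(u) = 3u + 1.
fn g(x: f64) -> f64 { x * x }
fn h(u: f64) -> f64 { 3.0 * u + 1.0 }

fn forward_backward(x: f64) -> (f64, f64) {
    // Forward pass, caching the intermediate value.
    let u = g(x);
    let y = h(u);
    // Backward pass: multiply local derivatives front from the output.
    let dy_du = 3.0;     // h'(u)
    let du_dx = 2.0 * x; // g'(x)
    let dy_dx = dy_du * du_dx;
    (y, dy_dx)
}

fn main() {
    let (y, dy_dx) = forward_backward(2.0);
    // y = 3 * 4 + 1 = 13, dy/dx = 3 * 2x = 12
    println!("y = {y}, dy/dx = {dy_dx}");
}
```

At the tensor level the scalars become Jacobian-vector products, but the per-node recipe is identical, which is why it integrates naturally with graph construction.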
Also had a review of this with vadix (author of the Rust deep learning crate, who is also now working on RustCV). He has offered to contribute the mli code base to this project and also has some pretty strong views on a number of implementation points - namely:
- Dynamic Graphs instead of static graphs like MLI currently implements
- Creation of a backend concept for choosing execution target (Desktop CPU, Desktop GPU, WebGPU etc)
- SPIR-V targeting (GLSL via EMU?) for WebGPU
- H/W specific optimisation (Nice to Have) for Desktop CPU/GPU targets
- Reviewing the choice of Enum vs String API for Operations (currently implemented as Enums)
Also a good nice-to-have would be getting automatic differentiation implemented in such a way that it uses a domain-specific language on top of Rust - though this isn't a core objective of the API.
> Awesome summary, it looks like there are quite a few points that I had missed. In particular the ArrayFire arrays (which look like a good candidate for deployment directly on a machine), though having the option to use a WebGPU-type interface would be good as well.

Yes, but be aware that `arrayfire` is actually implemented in C++ and has to be linked as a dynamic library. I think that the schism between `ndarray` and `nalgebra` is hurting progress, as is the fact that there is no library supporting all four combinations of sparse/dense and CPU/GPU arrays. I implemented an eigensolver for large sparse matrices (https://github.com/rust-ndarray/ndarray-linalg/pull/184) and figured that it would be nice to write such an algorithm in a generic way. But there are no common traits for Cholesky, eigendecomposition, linear operators etc. shared between implementations.

> Also it looks like the thinking behind the deep-rust crate is in quite an advanced stage as well so that may be the right place to contribute to - I've followed up and reached out to them on Discord.

I saw it on Discord, thanks for doing that!

> It's also probably better to implement the automatic differentiation optimisations now as well given it's in TensorFlow 2.0 - came across this paper as well for anyone now picking up the topic (like me :) )
This lecture looks like a reduced version of the paper I linked, especially with the "What Autodiff Isn't" introduction :sweat_smile:
Yeah, vadix echoed your view on the ArrayFire point - guess that means we'd need to look at having a similar Rust version for the Desktop GPU target/backend.
Just to be clear - Emu is as "pure Rust" as WebGPU is. It's just a thin abstraction over WebGPU (providing things like a global device pool, a global JIT kernel cache, etc.). So Emu accepts SPIR-V as input but provides a `CompileToSpirv` trait to support new languages (like GLSL).
I don't know if this would help, but I was thinking about creating a sort of `AcceleratedIterator<Item = T>` trait (implemented by `DeviceBox<T>`) that makes it easy to fuse together simple "map*, reduce" kernels for array operations.
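On the CPU, the fusion such a trait would perform can be sketched like this. `AcceleratedIterator` and `DeviceBox` come from the comment above; everything in this sketch is invented for illustration - a GPU version would JIT the fused stages into one kernel instead of folding in a loop:

```rust
// CPU-only sketch of fusing map* + reduce into a single pass,
// mimicking what a hypothetical AcceleratedIterator would do for
// device buffers. All names here are illustrative.

struct Fused<'a> {
    data: &'a [f32],
    // Chain of element-wise stages, applied in one traversal.
    stages: Vec<Box<dyn Fn(f32) -> f32>>,
}

impl<'a> Fused<'a> {
    fn new(data: &'a [f32]) -> Self {
        Fused { data, stages: vec![] }
    }
    fn map(mut self, f: impl Fn(f32) -> f32 + 'static) -> Self {
        self.stages.push(Box::new(f));
        self
    }
    /// Single pass over the data: every map stage plus the reduction
    /// runs per element, so no intermediate arrays are materialized.
    fn reduce(self, init: f32, r: impl Fn(f32, f32) -> f32) -> f32 {
        self.data.iter().fold(init, |acc, &x| {
            let v = self.stages.iter().fold(x, |v, f| f(v));
            r(acc, v)
        })
    }
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    // sum of (2x + 1) over xs: 3 + 5 + 7 = 15
    let s = Fused::new(&xs)
        .map(|x| 2.0 * x)
        .map(|x| x + 1.0)
        .reduce(0.0, |a, b| a + b);
    println!("{s}");
}
```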
Got it - so that means Emu would allow the execution of either SPIR-V or GLSL intermediate-language code on WebGPU. Any plans to support other intermediate languages (SYCL, RLSL)?
Will reach out on Discord to discuss the AcceleratedIterator in more detail.
> any plans to support any other intermediate languages (SYCL, RLSL)?
Personally, I don't think SYCL or RLSL are actually very useful as abstractions. I feel that something like Halide (separating algorithm from schedule), Taichi (separating algorithm from data structure), Elevate (separating algorithm from optimization strategy) might work better.
Of course, if SYCL and RLSL are popular then making them work with Emu would be a good idea.
@xflowXen One small correction to the above list, now that I am taking a look. The current Op system is implemented in terms of enums, but I would really like to consider using strings to allow downstream crates to more easily add new ops. We cannot achieve our goals with trait objects because then the graph cannot be deserialized (at least in an easy way). Strings make it possible for downstream crates to add custom ops and still allow the graph to be trivially serialized to disk.
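The trade-off being described can be sketched like so; the registry shape below is illustrative, not the actual design. With string keys, a serialized graph node is just a name plus inputs, and a downstream crate can register ops the core crate never heard of:

```rust
use std::collections::HashMap;

// Ops identified by strings, so a serialized graph is just a list of
// (name, inputs) pairs and downstream crates can add new ops without
// touching the core enum. Illustrative sketch only.
type OpFn = fn(&[f32]) -> f32;

struct OpRegistry {
    ops: HashMap<String, OpFn>,
}

impl OpRegistry {
    fn new() -> Self { OpRegistry { ops: HashMap::new() } }
    fn register(&mut self, name: &str, f: OpFn) {
        self.ops.insert(name.to_string(), f);
    }
    /// Look up an op by its serialized name; None for unknown ops.
    fn run(&self, name: &str, args: &[f32]) -> Option<f32> {
        self.ops.get(name).map(|f| f(args))
    }
}

fn main() {
    let mut reg = OpRegistry::new();
    // "built-in" op
    reg.register("sum", |xs| xs.iter().sum());
    // a downstream crate can add its own op under a new name; an
    // enum-based design would require a change to the core crate
    reg.register("max", |xs| xs.iter().cloned().fold(f32::MIN, f32::max));

    // a deserialized graph node is just ("max", inputs)
    println!("{:?}", reg.run("max", &[1.0, 4.0, 2.0])); // Some(4.0)
}
```

The cost, compared to enums, is that unknown op names become a runtime error rather than a compile error, which is the crux of the trade-off mentioned above.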
Potential backends for NVIDIA/AMD GPUs
To update the list of NN libraries we have in Rust - adding an initial one from @quietlychris here
Existing NN libraries in Rust
- https://github.com/koute/sarek
- https://github.com/jramapuram/hal (not updated since 2016)
- https://github.com/jackm321/RustNN (not updated since 2015)
Following on from @LukeMathWalker 's post here, I'm starting this issue to discuss and refine the details of a high level machine learning interface (API).
From an API perspective, I propose that the interface is modelled similarly to Keras - though not necessarily with the same naming convention.
When it comes to GPUs, from what I've seen there is some vendor segmentation within the ML space - for example, TensorFlow only works on CUDA-enabled video cards (see here). I'd be keen to focus on industry-standard and cross-platform solutions for neural networks - specifically the standards defined by the Khronos Group - though obviously that would require some community discussion to determine the right path here.
I'm also keen to understand what use cases for neural-net building others have at the minute and assist the community in moving this forwards.
What should we build?