mratsim / Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
https://mratsim.github.io/Arraymancer/
Apache License 2.0

OpenCL #69

Closed. Jipok closed this issue 6 years ago.

Jipok commented 6 years ago

You seem to use CUDA, but that is an Nvidia-locked solution. What are owners of AMD video cards supposed to do? Wouldn't it be better to use OpenCL? I don't know this area well, but in the future I would like to buy a new AMD card and work on machine learning.

mratsim commented 6 years ago

This is indeed an unfortunate situation.

It came from history: Nvidia invested heavily in GPU compute and then created custom functions (convolutions, recurrent neural network built-ins, ...) specifically for the deep learning community. Between 2012 and 2017 there was no equivalent anywhere else.

However, this year AMD woke up and is investing heavily in compute. They started ROCm (Radeon Open Compute) and HIP, a tool that can translate CUDA code so it runs on ROCm. If you look at it, the API is almost the same as CUDA's, so porting should be easy.

The last promising thing is that Tesla just recruited one of the top computer vision scientists (Andrej Karpathy) and is working with AMD on the computer vision for their cars. Improvements there will probably be ported to consumer cards and scientific libraries.

Now the only tough thing is that I don't have an AMD card (like 90+% of people in the deep learning community).

OpenCL support is planned, but features will probably come slowly due to its lack of maturity.

Jipok commented 6 years ago

Why support two technologies at once? For OpenCL you do not need a graphics card from AMD; it also works on Nvidia cards without changing the code, doesn't it? Or is OpenCL less functional and less suitable? P.S. Thank you for the news about AMD. It is good to hear, because I do not like Nvidia's dominance and what they do.

ghost commented 6 years ago

@Jipok I think that CUDA is faster than OpenCL when used on Nvidia cards

mratsim commented 6 years ago

It's not that OpenCL is less functional; however, at this point in time it is indeed less suitable.

  1. The most important thing is having a cuDNN equivalent. cuDNN is a library of deep learning primitives, Nvidia only. Coding a tuned convolution or recurrent neural network from scratch on GPU is PhD-level work, see here.

  2. Major libraries all use CUDA as a first-class citizen (mostly due to point 1), and neither Google nor Facebook is supporting OpenCL. This is important because, for fast development, having reference implementations is key.

  3. Both AMD and Intel are doing their own thing, and neither is ready for prime time. AMD HIP uses CUDA-like syntax and compiles to either CUDA or ROCm (see the porting sketch below this list). Intel started 5 months ago on clDNN, an equivalent to cuDNN; it is a technical preview and hence a moving target.

  4. Every single person working in deep learning has Nvidia hardware, even people from Intel, AMD and Apple, just because it is the reference implementation to compare against. I would be happy to be proven wrong.

  5. Resources: I have neither the hardware (besides the integrated Intel GPU in my laptop) nor the time for it while there is plenty of low-hanging fruit. The startup Vertex.ai is doing it; they are heroes and do that full-time. Besides Google and Facebook, Microsoft is saying no, Theano (which was one of the leaders) says OpenCL is currently unusable, and MXNet (Apache/Amazon) is waiting to see what happens with AMD HIP.

Lastly, there is a Khronos-led standard in the works, but I don't think we will see anything come out of it in the next year.
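To make point 3 concrete, here is a minimal sketch of what a ported kernel looks like with the standard HIP runtime API (hipMalloc, hipMemcpy, hipLaunchKernelGGL, hipFree). The SAXPY kernel and sizes are made up for illustration, but each hip* call mirrors its cuda* counterpart, which is why the hipify translation is mostly mechanical:

```cpp
// Minimal HIP SAXPY: every hip* call mirrors a cuda* call
// (hipMalloc <-> cudaMalloc, hipMemcpy <-> cudaMemcpy, ...).
#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
  const int n = 1 << 20;
  std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

  float *dx, *dy;
  hipMalloc(&dx, n * sizeof(float));   // cudaMalloc
  hipMalloc(&dy, n * sizeof(float));
  hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);  // cudaMemcpy
  hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

  // kernel<<<grid, block>>>(...) in CUDA
  hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                     n, 2.0f, dx, dy);

  hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
  printf("y[0] = %f\n", hy[0]);        // expect 4.0

  hipFree(dx);                         // cudaFree
  hipFree(dy);
  return 0;
}
```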

mratsim commented 6 years ago

Another interesting library to check: VexCL, a C++ vector expression template library that abstracts CUDA and OpenCL and provides (a short usage sketch follows the list):

- Initialization
- Managing memory
- Vector expressions
- Parallel primitives and algorithms
- Multivectors and multiexpressions
- Converting generic C++ algorithms to OpenCL/CUDA
- Custom kernels
- Interoperability with other libraries
- Building VexCL programs with CMake
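For a flavour of the abstraction, a minimal sketch based on the examples in the VexCL README (the device filter, vector size and expression below are illustrative assumptions): the whole expression is fused into a single generated kernel and JIT-compiled for OpenCL or CUDA depending on which backend VexCL was built against.

```cpp
// Minimal VexCL sketch: one vector expression, one generated kernel,
// same source for the OpenCL and CUDA backends.
#include <vexcl/vexcl.hpp>
#include <iostream>
#include <stdexcept>

int main() {
  // Pick any compute device that supports double precision.
  vex::Context ctx(vex::Filter::DoublePrecision);
  if (!ctx) throw std::runtime_error("No devices available.");

  const size_t n = 1 << 20;
  vex::vector<double> x(ctx, n), y(ctx, n), z(ctx, n);

  x = 1.0;               // filled on the device
  y = 2.0;
  z = 2 * x + sin(y);    // fused into one device kernel

  double z0 = z[0];      // element read-back for a quick check
  std::cout << "z[0] = " << z0 << std::endl;
  return 0;
}
```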
mratsim commented 6 years ago

My Christmas gift 🎁 https://github.com/mratsim/Arraymancer/pull/184