rocker-org / ml

experimental machine learning container
GNU General Public License v2.0

gpu Docker image #10

Closed MarkEdmondson1234 closed 2 years ago

MarkEdmondson1234 commented 5 years ago

It builds fine and I can log in, but when I try to use keras for the toy example I get:

> library(keras)
> 
> mnist <- dataset_mnist()
ImportError: No module named keras
Use the install_keras() function to install the core Keras library
Error: Error loading Python module keras
MarkEdmondson1234 commented 5 years ago

Other info:

I launched via:

nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name gpu2 rocker/gpu

(I have port 80 open for testing)

I can log in to RStudio OK with gpu/gpu.

I go to the terminal and can see the GPU is live:

gpu@5ec17f23ba93:~$ nvidia-smi
Tue Feb 12 23:36:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    23W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
gpu@5ec17f23ba93:~$
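It can help to record what the host exposes so you can compare it against the CUDA version baked into the image. The hypothetical helper below (not part of rocker or NVIDIA tooling) pulls the nvidia-smi and driver versions out of the banner shown above. It is a minimal sketch that assumes the banner layout of this nvidia-smi release.

```python
import re

def parse_nvidia_smi_banner(text):
    """Return (nvidia_smi_version, driver_version) from nvidia-smi output, or None."""
    match = re.search(r"NVIDIA-SMI\s+([\d.]+)\s+Driver Version:\s+([\d.]+)", text)
    if match is None:
        return None  # no banner found, e.g. nvidia-smi missing or GPU not visible
    return match.group(1), match.group(2)

# Header line copied from the container output above.
banner = "| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |"
print(parse_nvidia_smi_banner(banner))  # ('396.44', '396.44')
```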

I tried install_keras(tensorflow="gpu") again; it installed OK, but when I tried to run the example I got:

> library(keras)
> 
> mnist <- dataset_mnist()
Using TensorFlow backend.
Error: ImportError: Traceback (most recent call last):
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/gpu/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error 
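The traceback boils down to the dynamic loader not finding libcublas.so.9.0. Here is a minimal sketch that reproduces just that lookup from Python without importing TensorFlow. It uses ctypes.CDLL, which calls dlopen on Linux and raises OSError when the library cannot be opened. The helper name is hypothetical.

```python
import ctypes

def can_load_shared_library(soname):
    """Return True if the dynamic loader can open `soname`, else False."""
    try:
        ctypes.CDLL(soname)  # dlopen() under the hood on Linux
    except OSError:
        return False
    return True

# Inside the CUDA 9.2 rocker/gpu container, the 9.0 soname is the one expected to fail.
for soname in ("libcublas.so.9.0", "libcublas.so.9.2"):
    print(soname, can_load_shared_library(soname))
```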
MarkEdmondson1234 commented 5 years ago

From what I remember from the R tensorflow install docs, I think you need CUDA version 9.0, not 9.2?

## CUDA Version
ENV CUDA_MAJOR_VERSION=9.2
ENV CUDA_MAJOR_VERSION_HYP=9.2
ENV CUDA_MINOR_VERSION=9.2.148-1
ENV NVIDIA_REQUIRE_CUDA="cuda>=9.2"
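To make the mismatch concrete: the Dockerfile pins CUDA 9.2, while the import error names a 9.0 soname. The hypothetical sketch below parses simple single-assignment ENV lines, extracts the version from the error message, and compares the two. As a simplification, it ignores ENV lines that set several variables and ignores line continuations.

```python
import re

def parse_env_lines(dockerfile_text):
    """Collect KEY=VALUE pairs from single-assignment ENV lines (simplified)."""
    env = {}
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if line.startswith("ENV ") and "=" in line:
            key, _, value = line[len("ENV "):].partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

def cuda_version_from_error(message):
    """Extract X.Y from a 'libcublas.so.X.Y: cannot open shared object file' error."""
    match = re.search(r"libcublas\.so\.(\d+\.\d+)", message)
    return match.group(1) if match else None

dockerfile = '''
ENV CUDA_MAJOR_VERSION=9.2
ENV CUDA_MAJOR_VERSION_HYP=9.2
ENV CUDA_MINOR_VERSION=9.2.148-1
ENV NVIDIA_REQUIRE_CUDA="cuda>=9.2"
'''
error = "ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory"

installed = parse_env_lines(dockerfile)["CUDA_MAJOR_VERSION"]
required = cuda_version_from_error(error)
print(installed, required, installed == required)  # 9.2 9.0 False
```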
MarkEdmondson1234 commented 5 years ago

Yep, from here: https://tensorflow.rstudio.com/tools/local_gpu.html

Note that it’s important to download cuDNN v7.0 for CUDA 9.0 (rather than CUDA 9.1 or 9.2, which may be the choice initially presented) as v7.0 is what TensorFlow is built against.

cboettig commented 5 years ago

@MarkEdmondson1234 thanks, yeah, I was just puzzling over this too, though I believe @noamross has TF working with 9.2. I believe this can be addressed either by getting the right symlinks or possibly by getting pip to install the right tensorflow libs (i.e. those compiled against 9.2)? Or maybe I'm wrong.

@seabbs may have looked at this as well.

cboettig commented 5 years ago

It does look like, as of around September 2018 at least, pip-based tensorflow-gpu was only built against CUDA 9.0. Not sure if that's still the case, but it seems so. Conda gives some suggestion that its version supports 9.2: https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/ and there are some recommendations for building TensorFlow from source: https://www.pytorials.com/how-to-install-tensorflow-gpu-with-cuda-9-2-for-python-on-ubuntu/

MarkEdmondson1234 commented 5 years ago

I tried changing the CUDA environment args to 9.0. The image built, but the container failed to start at runtime, so I guess there's more to do:

## CUDA Version
ENV CUDA_MAJOR_VERSION=9.0
ENV CUDA_MAJOR_VERSION_HYP=9.0
ENV CUDA_MINOR_VERSION=9.0.176-1
ENV NVIDIA_REQUIRE_CUDA="cuda==9.0"
nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name gpu4 gcr.io/gcer-public/gpu-r:0e74959 
66c303d29bfe0d69f8cc3b259ed8bbb221c99fe4897f195c76998a1a68d4bd34
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda==9.0 --pid=29606 /var/lib/docker/overlay2/56cfb8c992d62cdd223cc74ebd56b82e30e5043248e90660d7411058ca9c1d01/merged]\\\\nnvidia-container-cli: requirement error: invalid expression\\\\n\\\"\"": unknown.
cboettig commented 5 years ago

@MarkEdmondson1234 so it does look like building the different cuda versions will be necessary, or at least convenient, here.

I've taken the recipes for the official nvidia/cuda stack and overlaid them on the rocker images here: https://github.com/rocker-org/ml/tree/master/cuda

I've then put the machine learning side of things on top of this as a separate file here: https://github.com/rocker-org/ml/tree/master/ml (currently only for CUDA 9.0, but I noticed that the tf-nightly-gpu builds of TensorFlow are now built against CUDA 10.0, so I hope to support that soon too).

So rocker/ml (when it builds later tonight) is now my working candidate for a GPU-enabled image. Clearly things are still a bit in flux here, but I think we're at least moving in a good direction; more feedback and testing are always welcome.

At least in my tests, I'm able to build the current rocker/ml image (based on rocker/cuda:9.0) and run the MNIST example on my NVIDIA GPU machine.

MarkEdmondson1234 commented 5 years ago

This will be awesome, thanks. My motivation is to be able to work through the Deep Learning with R book using a GCP deeplearning GPU VM. I hope to add a template to googleComputeEngineR so folks can get started via:

library(googleComputeEngineR) # assume auto-auth, project settings etc

vm <- gce_vm(template = "gpu-ml-rstudio",
             name = "deeplearning-ml",
             username = "gpu", password = "gpu")
MarkEdmondson1234 commented 5 years ago

I tried it again this morning with the new image, and I think it worked :D

nvidia-docker run -d -p 80:8787 -e USER=gpu -e PASSWORD=gpu --name ml rocker/ml
> library(keras)
> mnist <- dataset_mnist()
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
> train_images <- mnist$train$x
> train_labels <- mnist$train$y
> test_images <- mnist$test$x
> test_labels <- mnist$test$y
> 
> network <- keras_model_sequential() %>% 
+     layer_dense(units = 512, activation = "relu", input_shape = c(28*28)) %>% 
+     layer_dense(units = 10, activation = "softmax")
> 
> network %>% compile(
+     optimizer = "rmsprop",
+     loss = "categorical_crossentropy",
+     metrics = c("accuracy")
+ )
> 
> train_images <- array_reshape(train_images, c(60000, 28*28))
> train_images <- train_images / 255
> 
> test_images <- array_reshape(test_images, c(10000, 28*28))
> test_images <- test_images / 255
> 
> train_labels <- to_categorical(train_labels)
> test_labels <- to_categorical(test_labels)
> 
> network %>% fit(train_images, train_labels, epochs = 5, batch_size = 128)
2019-02-13 13:50:36.047596: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-02-13 13:50:36.905061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-13 13:50:36.905392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135
pciBusID: 0000:00:04.0
totalMemory: 7.43GiB freeMemory: 7.31GiB
2019-02-13 13:50:36.905424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-13 13:50:37.288155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-13 13:50:37.288227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-13 13:50:37.288238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-13 13:50:37.288482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7055 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:04.0, compute capability: 6.1)
Epoch 1/5
60000/60000 [==============================] - 5s 78us/step - loss: 0.2563 - acc: 0.9259
Epoch 2/5
60000/60000 [==============================] - 2s 32us/step - loss: 0.1041 - acc: 0.9692
Epoch 3/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0679 - acc: 0.9795
Epoch 4/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0504 - acc: 0.9850
Epoch 5/5
60000/60000 [==============================] - 2s 31us/step - loss: 0.0381 - acc: 0.9889
> 
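For reference, here is the arithmetic behind this network. A dense layer has n_in * n_out weights plus n_out biases. With batch_size = 128, each epoch over the 60,000 training images takes ceil(60000 / 128) gradient steps. The sketch below just works out those numbers; in R, summary(network) should report the same per-layer parameter counts.

```python
import math

def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected (dense) layer."""
    return n_in * n_out + n_out

input_dim = 28 * 28                      # 784 pixels per image after array_reshape()
hidden = dense_params(input_dim, 512)    # layer_dense(units = 512, activation = "relu")
output = dense_params(512, 10)           # layer_dense(units = 10, activation = "softmax")
print(hidden, output, hidden + output)   # 401920 5130 407050

steps_per_epoch = math.ceil(60000 / 128)  # fit(..., batch_size = 128)
print(steps_per_epoch)                    # 469
```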
MarkEdmondson1234 commented 5 years ago

Just a heads-up: I'm still monkeying around with the rocker/ml image a bit, partly so I can get a sensible tag scheme that supports both different versions of CUDA and different versions of R. There will still be a rocker/ml:latest that does something reasonable though (e.g. probably CUDA 9.0 and the latest R for now). Thoughts on a sane way to do this are welcome.

Lots of moving parts on this one, as I see it:

Whilst it's nice to have xgboost/h2o in there for the future, they are very heavy (40+ minutes to build), and I appreciated having only TensorFlow/Keras in one image.

Also, the TensorFlow build printed this message about the CPU:

2019-02-13 13:50:36.047596: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

It's perhaps safe to ignore, but it did tempt me to rebuild with support for those CPU features as well; at the very least it's going to trigger questions.
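The log line means the CPU advertises AVX2, AVX512F and FMA, but the prebuilt wheel was not compiled to use them. The hypothetical helper below is a quick way to see which of those flags a Linux x86 host exposes. It reads the first "flags" line of /proc/cpuinfo and is only a sketch, not a TensorFlow API.

```python
def supported_cpu_features(cpuinfo_text, wanted=("avx2", "avx512f", "fma")):
    """Return the entries of `wanted` that appear on the first x86 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return [feature for feature in wanted if feature in flags]
    return []  # no 'flags' line, e.g. a non-x86 CPU

try:
    with open("/proc/cpuinfo") as fh:  # Linux only
        print(supported_cpu_features(fh.read()))
except OSError:
    print("/proc/cpuinfo not available on this system")
```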

But for me, the most flexible option, with a sensible default, would be:

In all that, I don't think R versions will matter most unless they're really bleeding edge: in my experience most R packages keep working across R version updates, whereas TF/Python breaks quickly, so I would inspect those versions more closely.

As a suggestion then:

Then we can add plumber ourselves; it's not too much hassle.

cboettig commented 5 years ago

@MarkEdmondson1234 thanks much for this, it is a huge help to bounce ideas off you here.

I like the three levels you outline here. For the base image names, I'm tempted to simplify them to: rocker/cuda, rocker/tf, and rocker/ml, even if on the GitHub side all the Dockerfiles live in rocker-org/ml.

For the Python version, I believe I have everything on python3; python2.7 is not even installed on the system.

Re the TensorFlow CPU message, I see that too; I believe that's up to the team that builds the tensorflow pip package / wheel thingy. (I looked at installing TF from source when we were trying to build on CUDA 9.2, and it looks hairy, particularly because the build system is super-interactive, so it's hard to see how to automate it.) Let's just assume they know what they are doing.

For the tags, I'm still struggling. I like the notion of gpu/cpu tags; I agree that access to a CPU machine-learning stack that can bypass the NVIDIA stuff and be much lighter is a big win.

With regard to version-type tags, the whole rest of the R versioned stack pins versions by R's release dates -- e.g. you get the versions of pandoc, RStudio, and all R packages that were current when said R version was last current. We've tried to promote the notion that a user can do something like rocker/verse:3.5.1 or rocker/binder:3.5.1 and know that stuff that once worked on that image has a very, very high probability of still working a year from now. It's not obvious how this promise translates to something like rocker/tf:1.12.0-gpu. I suppose that always means R 3.5.2? Also, is locking the TensorFlow version like this an implicit promise to lock the CUDA version (i.e. TensorFlow 1.12.0 also means CUDA 9.0)?

For the base rocker/ml image, I think it would still be easier to just have it include h2o, keras, and xgboost out of the box. In terms of space, all of these images would be much smaller if I get xgboost multi-GPU support to install from pip instead of from source; then I can drop some 2 GB of NVIDIA devel libs. I think adding too many options there makes it harder for new users to know where to start and is more for us to maintain, and an experienced user will always be able to build a smaller image, custom-fit to their own needs, than we will ever be able to provide. The version issue raises its head here too -- how do I specify which version of tensorflow this is getting? Or to put it another way -- a year from now, what tag will I use to rebuild the rocker/ml image with the environment that rocker/ml is currently creating today?

Thanks again, really like your ideas on this and appreciate your feedback!

noamross commented 5 years ago

I think we should stick with tags being R version numbers, though of course only for a limited set of more recent R versions. Then we have a single CUDA/TensorFlow stack per major CUDA version, namely the latest that can be built against it, e.g.,

Practically, the keras R package version available at a given R release date will determine which R versions will be available in this stack. So, rocker/tf-gpu9:3.5.1 will have the version of keras last available for 3.5.1, and the last version of tensorflow that the keras package supported. There will not be a rocker/tf-gpu10:3.5.1, because as of the end of 3.5.1 there wasn't a CUDA 10 version of TensorFlow supported by the R keras package, etc.

We can of course do some documentation on how to customize your stack.

noamross commented 5 years ago

all of these images would be much smaller if I get xgboost multi-GPU support to install from pip instead of from source

Could we try some multi-stage build magic here? Though having those libs might be good for users who want to install other software from source, like mxnet.

cboettig commented 5 years ago

Thanks @noamross! After waffling and thinking it over, I agree about putting the gpu/cpu part in the image name instead of the image tag: rocker/ml-gpu is more consistent with our existing use of version tags in this stack. In general, overloading all of this onto tags seems more common, but at least on the package side, Python distinguishes GPU versions with the -gpu suffix (as in tensorflow-gpu), so that's consistent.

I am also tempted to just stick with one CUDA version per R version. Again looking to the precedent on the Python side, tensorflow-gpu versions 1.5 - 1.12 (current) are all on CUDA 9; CUDA 10 is only available in the nightlies (i.e. we could put it in on our devel tag). tensorflow-gpu v1.0 to 1.4 were CUDA 8, with about a year between the CUDA bumps. This also avoids having to create entirely new docker repos on hub to accommodate CUDA releases; instead we can just update the tags.
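The tensorflow-gpu to CUDA correspondence described here can be written down as a small lookup. This hypothetical sketch encodes only the ranges stated in this comment: 1.0 to 1.4 on CUDA 8, 1.5 to 1.12 on CUDA 9, and the nightlies on CUDA 10, as of early 2019. It is not an official compatibility table.

```python
def cuda_major_for_tensorflow_gpu(version):
    """CUDA major version for a tensorflow-gpu release, per the ranges in this thread."""
    if version == "nightly":
        return 10  # tf-nightly-gpu builds had moved to CUDA 10
    major, minor = (int(part) for part in version.split(".")[:2])
    if major == 1 and 0 <= minor <= 4:
        return 8
    if major == 1 and 5 <= minor <= 12:
        return 9
    raise ValueError(f"no CUDA mapping recorded for tensorflow-gpu {version}")

for v in ("1.4.1", "1.12.0", "nightly"):
    print(v, cuda_major_for_tensorflow_gpu(v))
```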

Regarding the multistage builds, yes, I think that's possible, but I also think I can already get away with a binary python wheel for xgboost and drop the cuda devel dependencies. I've now separated out those devel libs into a separate Dockerfile and build tensorflow directly on cuda-base.

So, the current directory structure looks like:

├── cuda
│   ├── base
│   │   └── Dockerfile
│   └── devel
│       └── Dockerfile
├── ml
│   ├── cpu
│   │   └── Dockerfile
│   └── gpu
│       └── Dockerfile
├── tf
│   ├── cpu
│   │   └── Dockerfile
│   └── gpu
│       └── Dockerfile
├── README.md
├── LICENSE
└── Makefile

Also, how do folks feel about going with rocker/tensorflow instead of rocker/tf ?

So, my new proposed image stack would be

with tags devel, latest == 3.5.2 on all images.

latest/3.5.2 would use CUDA 9, and devel would use CUDA 10 (if I can even get CUDA 10 to build on Debian stretch instead of Ubuntu 18.04...)
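Here is a hypothetical sketch of how the proposed scheme resolves. GPU support lives in the image name (the -gpu suffix), and the tag selects the R release. latest is an alias for 3.5.2 on CUDA 9, and devel tracks CUDA 10. The dictionaries below just encode this comment; they are not read from Docker Hub or from any rocker tooling.

```python
TAG_ALIASES = {"latest": "3.5.2"}                 # latest and 3.5.2 are the same image
CUDA_MAJOR_FOR_TAG = {"3.5.2": 9, "devel": 10}    # assumption: encodes this proposal only

def resolve_image(name, tag="latest"):
    """Return (canonical tag, CUDA major or None) for a rocker/* image reference."""
    canonical = TAG_ALIASES.get(tag, tag)
    if canonical not in CUDA_MAJOR_FOR_TAG:
        raise ValueError(f"unknown tag {tag!r}")
    # GPU support is encoded in the image name, not in the tag.
    cuda = CUDA_MAJOR_FOR_TAG[canonical] if name.endswith("-gpu") else None
    return canonical, cuda

print(resolve_image("rocker/ml-gpu"))               # ('3.5.2', 9)
print(resolve_image("rocker/ml-gpu", "devel"))      # ('devel', 10)
print(resolve_image("rocker/tensorflow", "3.5.2"))  # ('3.5.2', None)
```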

noamross commented 5 years ago

I can already get away with a binary python wheel for xgboost and drop the cuda devel dependencies.

Does the xgboost R package work with the GPU this way? I'll test but I don't think so.

noamross commented 5 years ago

rocker/tensorflow works for me

cboettig commented 5 years ago

Just following up that the images described above should all be built now.

Note that in the rocker/ml-gpu image, xgboost is now built using multi-stage builds to pull in the CUDA dev libs. Images can be requested using either the latest or the 3.5.2 tag (which give the identical image). The images need more widespread testing though! Thanks!

MarkEdmondson1234 commented 5 years ago

Looks great, will start putting them through their paces.

MarkEdmondson1234 commented 5 years ago

I've got a template together, so hopefully some other folks will start testing the images as well https://cloudyr.github.io/googleComputeEngineR/articles/gpu.html

cboettig commented 5 years ago

@MarkEdmondson1234 nice, thanks! I need to update the READMEs in this repo to give some better documentation on getting started (and better acknowledge the contributions from you, @noamross and others!), so reminding myself to link to that as well. thanks!

MarkEdmondson1234 commented 5 years ago

Another note for any documentation: having this image now means you can train R models serverlessly on GPU-accelerated instances via Cloud ML, which is super: https://cloud.google.com/ml-engine/docs/using-containers - the demo at the end of this video shows using R in containers to train and test: https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be and the repo with the code is here: https://github.com/gmikels/google-cloud-R-examples

cboettig commented 5 years ago

Thanks @MarkEdmondson1234 , that's really cool!

eitsupi commented 2 years ago

I think this issue has been resolved, so I will close it.