BStudent commented 6 years ago

Overview: The Current State

The Ubuntu CUDA 9.1 deb packages appear to be incomplete: libraries, headers, tools, and samples all seem to be missing. On closer inspection, some components (most critically, the CUDA samples) are indeed absent, but most of the remaining issues arise because expected symbolic links are not created and expected paths and environment variables are not set. Much of this is attributable to not following NVIDIA's best practices for configuration, which are intended to allow side-by-side installs of different CUDA versions. Inspection of earlier packages (e.g. 8, 7) indicates that they are all made from the same template and are subject to the same issues.
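To make the "missing symlinks, paths and environment variables" point concrete, here is a minimal sketch of a layout check. It is an assumption-laden illustration, not part of any package: the function name check_cuda_layout is hypothetical, the bin/lib64/include layout and the /usr/local/cuda location are the conventional NVIDIA defaults rather than anything stated in this thread, and LD_LIBRARY_PATH is only one of several ways the libraries can be exposed.

```python
import os
from pathlib import Path


def check_cuda_layout(cuda_root, env):
    """Return a list of human-readable problems with a CUDA install.

    cuda_root: directory (or symlink) expected to hold the toolkit.
               Assumed layout: bin/, lib64/ and include/ directly beneath it.
    env:       mapping of environment variables, e.g. os.environ.
    """
    root = Path(cuda_root)

    if not root.exists():
        # Covers both a missing directory and a dangling symlink.
        return [f"{root} does not exist (missing symlink?)"]

    problems = []
    for sub in ("bin", "lib64", "include"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {root / sub}")

    # PATH should contain the toolkit's bin directory so nvcc is found.
    path_entries = env.get("PATH", "").split(os.pathsep)
    if str(root / "bin") not in path_entries:
        problems.append(f"{root / 'bin'} is not on PATH")

    # Assumption: shared libraries are exposed through LD_LIBRARY_PATH.
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if str(root / "lib64") not in lib_entries:
        problems.append(f"{root / 'lib64'} is not on LD_LIBRARY_PATH")

    return problems


if __name__ == "__main__":
    # /usr/local/cuda is an assumed default location, adjust as needed.
    for problem in check_cuda_layout("/usr/local/cuda", os.environ):
        print(problem)
```

An empty result only means the basic layout looks sane; it says nothing about whether individual libraries or headers are complete.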
About two weeks after the 18.04 release, NVIDIA pushed CUDA 9.2.88, which is overall more compatible with 18.04, although its documentation only explicitly states support through release 17.10.
I tested CUDA 9.2, installed from nvidia deb packages, with an ubuntu/ppa-packaged nvidia graphics driver for 396.26 that was temporarily available in a maintainer's ppa. I have now set up a mirror of that installer in a separate ppa, detailed in a section below, so that others can reproduce it.
The 396.26 install and the subsequent CUDA install each required some minor tweaking by hand in synaptic, but once installed they had excellent stability and performance. In particular, the extensive "CUDA samples" Makefile, which is a 10-15 minute build on an Intel i9 X-class processor with 128GB memory, ran to completion without error, and the compute-intensive examples performed well.
It is notable that CUDA-based developers use the installation and make of these sample files as a de facto regression test for correct installation of CUDA and its interaction with other subsystems such as OpenGL.
Moreover, these sample files are not included in the debian-based CUDA packages. They can be downloaded separately from other CUDA code, installed, and built individually or as a group via make: I downloaded the 9.1 version of CUDA samples sources after installing the 9.1 debian-based packages and found that the install process itself identified many apparently missing libraries and misconfigured paths. Running the makefile for CUDA samples against the debian-based CUDA install identified further misconfiguration, and ultimately crashed before completion.
Again, this is the de facto regression test used by CUDA devs. It's good to have one.
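As a rough sketch of how the samples build could be wrapped as a pass/fail check: the helper below runs a build command and reports whether it exited cleanly, along with the tail of its output. The name run_build and the 20-line tail are arbitrary choices for illustration, and the samples directory in the usage comment is a placeholder, not a path taken from this thread.

```python
import subprocess


def run_build(command, cwd=None):
    """Run a build command; return (succeeded, last lines of its output).

    stdout and stderr are merged so compiler errors show up in the tail.
    """
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    tail = result.stdout.splitlines()[-20:]
    return result.returncode == 0, tail


# Hypothetical usage, with samples_dir pointing at an unpacked samples tree:
#   ok, tail = run_build(["make", "-k"], cwd=samples_dir)
# "make -k" keeps going after a failed target but still exits non-zero,
# so one run reports every broken sample instead of stopping at the first.
```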

mmstick commented 6 years ago

Good news! I now finally have a TF package which can be used to build C, C++, and Python projects out of the box. I may start the process of having our build server attempt to build and push the package to our proprietary repo. If I don't do it today, I'll try it over Sat / Sun.

Some notes:

Unsure if I should set the PYTHONPATH for TF globally for the entire environment on install, or just have those who install it set it themselves when they're working on a TF project.
C++ likewise only requires that you specify where the TensorFlowCC config is for CMake.
Best of all, no Bazel is required for anything!

C++ API

This is the CMakeLists.txt that shows how to set up a C++ TF project.

cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
list(APPEND CMAKE_PREFIX_PATH "/usr/lib/tensorflow/lib/cmake")

find_package(TensorflowCC COMPONENTS Shared)
find_package(CUDA)

add_executable(example example.cpp)

target_link_libraries(example TensorflowCC::Shared)

if(CUDA_FOUND)
  target_link_libraries(example ${CUDA_LIBRARIES})
endif()

The list(APPEND CMAKE_PREFIX_PATH "/usr/lib/tensorflow/lib/cmake") line is the critical part: it tells CMake where the config files for TensorFlowCC are located, and those files in turn point to the locations of the header files & lib.
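If find_package fails, it can help to confirm that a config file actually exists under that prefix. The sketch below is a simplified stand-in, not CMake's real search procedure: it walks the whole tree looking for the two file names CMake's config mode accepts, <Package>Config.cmake or <lowercase-package>-config.cmake. The helper name find_cmake_config is made up for illustration.

```python
import os


def find_cmake_config(prefix, package):
    """Return paths under `prefix` that look like a config file for `package`.

    Walking the whole tree is broader than what CMake itself searches,
    so treat a hit as a hint rather than proof that find_package will work.
    """
    wanted = {package + "Config.cmake", package.lower() + "-config.cmake"}
    matches = []
    for dirpath, _dirnames, filenames in os.walk(prefix):
        for name in filenames:
            if name in wanted:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Prefix and package name as used in the CMakeLists.txt above.
    for path in find_cmake_config("/usr/lib/tensorflow/lib/cmake", "TensorflowCC"):
        print(path)
```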

I'm able to get it to compile this example.cpp file, which seems like a sufficient usage of various parts of the lib?

#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} });
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} });
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));
  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}
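The expected value in the comment, [19; -3], can be checked by hand without TensorFlow. This is only plain-Python arithmetic mirroring the example's A and b; the matmul and transpose helpers are written here for illustration and are not TensorFlow calls.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


A = [[3.0, 2.0], [-1.0, 0.0]]
b = [[3.0, 5.0]]  # 1x2 row vector, as in the C++ example

# v = A b^T, the same product MatMul computes with TransposeB(true):
#   3*3 + 2*5 = 19 and -1*3 + 0*5 = -3
v = matmul(A, transpose(b))
print(v)  # [[19.0], [-3.0]]
```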

mmstick commented 6 years ago

Python API

Given the following example:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

You may run a project that uses the Python API by setting your PYTHONPATH when invoking it, like so:

env PYTHONPATH=/usr/lib/tensorflow/lib/python3.6:$PYTHONPATH \
    python3 example.py
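To check that a given directory really provides the module before relying on PYTHONPATH, something like the following sketch may help. It asks Python's standard path finder to look in one directory only; module_location is a hypothetical helper name.

```python
import importlib.machinery


def module_location(name, search_dir):
    """Return the file `name` would be imported from if only `search_dir`
    were searched, or None if it is not found there."""
    spec = importlib.machinery.PathFinder.find_spec(name, [search_dir])
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Directory taken from the PYTHONPATH example above.
    print(module_location("tensorflow", "/usr/lib/tensorflow/lib/python3.6"))
```

A None result means the module is not under that directory, so setting PYTHONPATH to it would not help.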

C API

Given the following example.c file:

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
    printf("TF version: %s\n", TF_Version());
    return 0;
}

You may compile it with the following:

export LD_LIBRARY_PATH="/usr/lib/tensorflow/lib:$LD_LIBRARY_PATH"
gcc -I/usr/lib/tensorflow/include/ \
    -L/usr/lib/tensorflow/lib example.c -ltensorflow \
    -o example
./example
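The same TF_Version() call can also be made from Python through ctypes, which is a quick way to see whether the shared library loads at all. This is a sketch under assumptions: the file name libtensorflow.so is inferred from the -ltensorflow flag and is not confirmed by this thread, and tf_version is a made-up helper name.

```python
import ctypes


def tf_version(library_path):
    """Return the version string reported by the TensorFlow C library,
    or None if the shared library cannot be loaded."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return None
    # C signature: const char* TF_Version(void)
    lib.TF_Version.restype = ctypes.c_char_p
    return lib.TF_Version().decode()


if __name__ == "__main__":
    # Assumed file name and location, adjust to the installed package.
    print(tf_version("/usr/lib/tensorflow/lib/libtensorflow.so"))
```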

BStudent commented 6 years ago

I think if that C++ snippet runs, then the system has been properly built and installed: it is actually building and running a graph. Also, by default when tf.Session() is invoked on a CUDA-enabled machine, you should see a diagnostic that enumerates all the CUDA GPUs with relevant stats like their available memory and their connection status with respect to the bus and each other. I'm away and not set up to check it until Monday morning, but the Python version of the C++ example should be line-for-line functionally identical. As a stub, this linear regression example from "Hands-On Machine Learning with Scikit-Learn and TensorFlow" by A. Géron, Chapter 9, "Up and Running with TensorFlow", should suffice:

# Example borrowed from A. Geron:
# https://github.com/ageron/handson-ml/blob/master/09_up_and_running_with_tensorflow.ipynb

import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_california_housing

# Run with the same pseudorandom initial values, for test consistency:
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

print("regression intercept and coefficients:")
print(theta_value)
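The theta line is the normal equation, theta = (X^T X)^-1 X^T y, with a bias column prepended to X. A minimal sketch of the same formula for a single feature, in plain Python with a hand-written 2x2 inverse, shows what it computes; fit_line and the toy data are invented for illustration and are unrelated to the housing dataset.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x via the normal
    equation theta = (X^T X)^-1 X^T y, where each row of X is [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; need two distinct x values")

    # Multiply the explicit 2x2 inverse of X^T X by X^T y.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Points on the line y = 1 + 2x recover intercept 1 and slope 2.
print(fit_line([0, 1, 2], [1, 3, 5]))  # (1.0, 2.0)
```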

The C code should be similar, but I don't want to make hollow conjectures on how C syntax wraps the API without some review.

Addendum: Here's what the output for the Python code looks like running from a cold start using the anaconda.org / anaconda build of tensorflow-gpu 1.8 (note the system messages that precede program output):

>>> with tf.Session() as sess:
...     theta_value = theta.eval()
...
2018-07-28 23:24:25.634369: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-28 23:24:25.887624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 9.88GiB
2018-07-28 23:24:26.032934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:06:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.192015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:09:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.347489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:0a:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.353985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-07-28 23:24:27.506531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-28 23:24:27.506568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 2 3
2018-07-28 23:24:27.506578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y Y Y
2018-07-28 23:24:27.506585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N Y Y
2018-07-28 23:24:27.506592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2:   Y Y N Y
2018-07-28 23:24:27.506599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3:   Y Y Y N
2018-07-28 23:24:27.507623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9557 MB memory) -> physical GPU (device: 0,name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-07-28 23:24:27.652616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10411 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-07-28 23:24:27.810685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10411 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-07-28 23:24:27.967955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10411 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
>>> print("regression intercept and coefficients:")
regression intercept and coefficients:
>>> print(theta_value)
[[-3.7465141e+01]
 [ 4.3573415e-01]
 [ 9.3382923e-03]
 [-1.0662201e-01]
 [ 6.4410698e-01]
 [-4.2513184e-06]
 [-3.7732250e-03]
 [-4.2664889e-01]
 [-4.4051403e-01]]
>>>
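If this output were captured to a file, the GPU enumeration could be checked automatically. The sketch below simply looks for the "Found device N with properties" lines shown above; that wording is specific to this TensorFlow build's log format and may differ in other versions, and found_gpu_ids is a hypothetical helper name.

```python
import re

# Matches the "Found device N with properties" lines in the log above.
FOUND_DEVICE = re.compile(r"Found device (\d+) with properties")


def found_gpu_ids(log_text):
    """Return the sorted device indices announced in the startup log."""
    return sorted({int(m.group(1)) for m in FOUND_DEVICE.finditer(log_text)})


sample = (
    "gpu_device.cc:1356] Found device 0 with properties:\n"
    "name: GeForce GTX 1080 Ti major: 6 minor: 1\n"
    "gpu_device.cc:1356] Found device 1 with properties:\n"
)
print(found_gpu_ids(sample))  # [0, 1]
```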

mmstick commented 6 years ago

Going to close this now that we have packaging. There's also soon going to be system76-cuda-latest, tensorflow-cuda-latest, and tensorflow-cpu-latest metapackages, as well as a tensorflow-1.9-cpu package. Will also look into packaging other frameworks, such as PyTorch.

BStudent commented 6 years ago

Congratulations! This is a big deal.

mmstick commented 6 years ago

The repo is about to be updated within the hour.

BStudent commented 6 years ago

From a clean install of PopOS, straight to your desktop (if you follow the instructions).

system76 / docs

PopOS 18.04: CUDA Toolkit 9.2 needs packaging; Should use "CUDA samples" Installation and Make as regression tests; Install Instructions for 9.1 and 9.2 on 18.04 are included; Evidence shows alternate CUDA packaging would be easier for all. #84
