system76 / docs

System76 support documentation site
https://support.system76.com

PopOS 18.04: CUDA Toolkit 9.2 needs packaging; Should use "CUDA samples" Installation and Make as regression tests; Install Instructions for 9.1 and 9.2 on 18.04 are included; Evidence shows alternate CUDA packaging would be easier for all. #84

Closed BStudent closed 6 years ago

BStudent commented 6 years ago

Overview: The Current State

image

mmstick commented 6 years ago

Good news! I now finally have a TF package which can be used to build C, C++, and Python projects out of the box. I may start the process of having our build server attempt to build and push the package to our proprietary repo. If I don't do it today, I'll try it over Sat / Sun.

Some notes:

C++ API

This is the CMakeLists.txt that shows how to set up a C++ TF project.

cmake_minimum_required(VERSION 3.3 FATAL_ERROR)
project(example)

# Tell CMake where the packaged TensorflowCC config files live.
list(APPEND CMAKE_PREFIX_PATH "/usr/lib/tensorflow/lib/cmake")

find_package(TensorflowCC REQUIRED COMPONENTS Shared)
find_package(CUDA)

add_executable(example example.cpp)

target_link_libraries(example TensorflowCC::Shared)

# Link the CUDA libraries only when a CUDA toolkit is found.
if(CUDA_FOUND)
  target_link_libraries(example ${CUDA_LIBRARIES})
endif()

The list(APPEND CMAKE_PREFIX_PATH "/usr/lib/tensorflow/lib/cmake") line is the critical part: it tells CMake where to find the TensorflowCC package files, which in turn point to the locations of the header files and the library.

I'm able to compile this example.cpp file with it, which seems to exercise enough parts of the lib to be a sufficient test?

#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} });
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} });
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));
  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}

mmstick commented 6 years ago

Python API

Given the following example:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

You can run a project that uses the Python API by setting PYTHONPATH when invoking it, like so:

env PYTHONPATH=/usr/lib/tensorflow/lib/python3.6:$PYTHONPATH \
    python3 example.py
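
As an alternative to setting PYTHONPATH on the command line, the same directory can be added from inside a script. This is only a minimal sketch, assuming the package installs its Python modules under /usr/lib/tensorflow/lib/python3.6 as in the command above; the helper names add_to_sys_path and module_available are made up for illustration:

```python
import importlib.util
import sys

# Path taken from the command above; adjust it for your install.
TF_SITE_DIR = "/usr/lib/tensorflow/lib/python3.6"


def add_to_sys_path(path):
    """Prepend path to sys.path unless it is already there; return True if added."""
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True


def module_available(name):
    """Return True if a top-level module called name can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    add_to_sys_path(TF_SITE_DIR)
    print("tensorflow importable:", module_available("tensorflow"))
```

This only checks that the module can be located; it does not import TensorFlow or touch the GPU, so it is a cheap first step before running example.py.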

C API

Given the following example.c file:

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
    printf("TF version: %s\n", TF_Version());
    return 0;
}

You may compile it with the following:

export LD_LIBRARY_PATH="/usr/lib/tensorflow/lib:$LD_LIBRARY_PATH"
gcc -I/usr/lib/tensorflow/include/ \
    -L/usr/lib/tensorflow/lib example.c -ltensorflow \
    -o example
./example
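
The same smoke test can probably be driven from Python without compiling anything, by loading the shared library with ctypes and calling TF_Version directly. This is a sketch, assuming the library file is named libtensorflow.so inside the directory passed to -L above; the thread only gives the directory, so the file name is an assumption, and tf_c_version is a hypothetical helper name:

```python
import ctypes

# Assumed location and file name of the shared library; adjust for your install.
DEFAULT_LIB = "/usr/lib/tensorflow/lib/libtensorflow.so"


def tf_c_version(lib_path=DEFAULT_LIB):
    """Return the string reported by TF_Version(), or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    # TF_Version takes no arguments and returns a const char*.
    lib.TF_Version.argtypes = []
    lib.TF_Version.restype = ctypes.c_char_p
    return lib.TF_Version().decode("utf-8")


if __name__ == "__main__":
    print("TF version:", tf_c_version())
```

A result of None means the library was not found at that path, which is the same failure the gcc command would report as a missing -ltensorflow.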
BStudent commented 6 years ago

I think that if the C++ snippet runs, the system has been properly built and installed: it is actually building and running a graph. Also, by default, when tf.Session() is invoked on a CUDA-enabled machine, you should see a diagnostic that enumerates all the CUDA GPUs with relevant stats such as their available memory and their connection status with respect to the bus and each other.

I'm away and not set up to check it until Monday morning, but the Python version of the C++ example should be line-for-line functionally identical. As a stub, this linear regression example from "Hands-On Machine Learning with Scikit-Learn and TensorFlow" by A. Géron, Chapter 9, "Up and Running with TensorFlow", should suffice:

# Example borrowed from A. Géron:
# https://github.com/ageron/handson-ml/blob/master/09_up_and_running_with_tensorflow.ipynb

import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_california_housing

# Run with the same pseudorandom initial values, for test consistency:
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

print("regression intercept and coefficients:")     
print(theta_value)
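
The theta line above evaluates the normal equation, theta = (X^T X)^-1 X^T y. To sanity-check what a correct result looks like without TensorFlow or a GPU, here is a minimal pure-Python sketch for the one-feature case. The function name and the toy data are illustrative and are not part of the original example:

```python
def fit_line_normal_equation(xs, ys):
    """Solve theta = (X^T X)^-1 X^T y for rows X = [1, x]; return (intercept, slope)."""
    n = float(len(xs))
    sx = float(sum(xs))
    sxx = float(sum(x * x for x in xs))
    sy = float(sum(ys))
    sxy = float(sum(x * y for x, y in zip(xs, ys)))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; need at least two distinct x values")
    # Apply the closed-form 2x2 inverse to X^T y.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


if __name__ == "__main__":
    # Points that lie exactly on the line y = 1 + 2x.
    print(fit_line_normal_equation([0, 1, 2, 3], [1, 3, 5, 7]))
```

For the toy data the fit recovers an intercept of 1 and a slope of 2. The housing example does the same computation with nine columns instead of two, which is why theta_value has nine rows.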

The C code should be similar, but I don't want to make hollow conjectures on how C syntax wraps the API without some review.

Addendum: Here's what the output of the Python code looks like when run from a cold start using the anaconda.org / anaconda build of tensorflow-gpu 1.8 (note the system messages that precede the program output):

>>> with tf.Session() as sess:
...     theta_value = theta.eval()
...
2018-07-28 23:24:25.634369: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-28 23:24:25.887624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 9.88GiB
2018-07-28 23:24:26.032934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:06:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.192015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:09:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.347489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:0a:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-07-28 23:24:26.353985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-07-28 23:24:27.506531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-28 23:24:27.506568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 2 3
2018-07-28 23:24:27.506578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y Y Y
2018-07-28 23:24:27.506585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N Y Y
2018-07-28 23:24:27.506592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2:   Y Y N Y
2018-07-28 23:24:27.506599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3:   Y Y Y N
2018-07-28 23:24:27.507623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9557 MB memory) -> physical GPU (device: 0,name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-07-28 23:24:27.652616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10411 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-07-28 23:24:27.810685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10411 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-07-28 23:24:27.967955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10411 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
>>> print("regression intercept and coefficients:")
regression intercept and coefficients:
>>> print(theta_value)
[[-3.7465141e+01]
 [ 4.3573415e-01]
 [ 9.3382923e-03]
 [-1.0662201e-01]
 [ 6.4410698e-01]
 [-4.2513184e-06]
 [-3.7732250e-03]
 [-4.2664889e-01]
 [-4.4051403e-01]]
>>>
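
If this is to serve as a regression test, the GPU enumeration in the log above could be checked mechanically. Below is a minimal sketch that pulls the device ids and memory sizes out of captured log text. The regular expressions are written against the tensorflow-gpu 1.8 messages shown above and are an assumption for any other version; summarize_gpu_log is a hypothetical name:

```python
import re

# Patterns written against the TF 1.8 log lines shown above (an assumption elsewhere).
FOUND_RE = re.compile(r"Found device (\d+) with properties")
CREATED_RE = re.compile(
    r"Created TensorFlow device \(\S*device:GPU:(\d+) with (\d+) MB memory\)"
)


def summarize_gpu_log(log_text):
    """Return (found_ids, {gpu_id: memory_mb}) parsed from TensorFlow startup log text."""
    found = [int(m.group(1)) for m in FOUND_RE.finditer(log_text)]
    created = {
        int(m.group(1)): int(m.group(2)) for m in CREATED_RE.finditer(log_text)
    }
    return found, created


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = (
        "2018-07-28 23:24:25.887624: I tensorflow/core/common_runtime/gpu/"
        "gpu_device.cc:1356] Found device 0 with properties:\n"
        "2018-07-28 23:24:27.507623: I tensorflow/core/common_runtime/gpu/"
        "gpu_device.cc:1053] Created TensorFlow device "
        "(/job:localhost/replica:0/task:0/device:GPU:0 with 9557 MB memory) "
        "-> physical GPU\n"
    )
    print(summarize_gpu_log(sample))
```

On the four-GPU machine above, a test would then assert that found is [0, 1, 2, 3] and that each of those ids also appears in created.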
mmstick commented 6 years ago

Going to close this now that we have packaging. There will also soon be system76-cuda-latest, tensorflow-cuda-latest, and tensorflow-cpu-latest metapackages, as well as a tensorflow-1.9-cpu package. Will also look into packaging other frameworks, such as PyTorch.

BStudent commented 6 years ago

Congratulations! This is a big deal.

mmstick commented 6 years ago

The repo will be updated within the hour.

BStudent commented 6 years ago

From a clean install of PopOS, straight to your desktop (if you follow the instructions):

image