lukeiwanski / tensorflow

OpenCL support for TensorFlow via SYCL

TensorFlow does not see all available GPUs in my system #252

Open lu4 opened 6 years ago

lu4 commented 6 years ago

System information

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Here's the info from the environment capture script:

== cat /etc/issue ===============================================
Linux custom 4.16.0-rc6-smos+ #1 SMP Wed Mar 21 13:23:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="16.04.3 LTS (Xenial Xerus)"
VERSION_ID="16.04"
VERSION_CODENAME=xenial

== are we in docker =============================================
No

== compiler =====================================================
c++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

== uname -a =====================================================
Linux custom 4.16.0-rc6-smos+ #1 SMP Wed Mar 21 13:23:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

== check pips ===================================================
numpy               1.14.5
protobuf            3.6.0
tensorflow          1.8.0rc1

== check for virtualenv =========================================
False

== tensorflow import ============================================
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named tensorflow

== env ==========================================================
LD_LIBRARY_PATH /usr/local/lib:/usr/local/computecpp/lib:/usr/local/lib:/usr/local/computecpp/lib:
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

== cuda libs  ===================================================
/usr/local/lib/libcudart.so.9.0.103

== tensorflow import ============================================
tf.VERSION = 1.8.0-rc1
tf.GIT_VERSION = b'ComputeCpp-v0.6.0-4212-gb29ac8a'
tf.COMPILER_VERSION = b'ComputeCpp-v0.6.0-4212-gb29ac8a'
Sanity check: array([1], dtype=int32)

Describe the problem

TensorFlow built on top of SYCL does not list or use all of the GPUs available in the system. I'm using the following commands to get the list of devices:

(please note that TensorFlow's inline log reports 8 devices, but the resulting variable contains just two entries: one CPU and one GPU, exposed under the "/device:SYCL:0" name)

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2018-07-21 14:21:08.328612: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-07-21 14:21:09.308907: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-07-21 14:21:09.308981: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309001: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 1, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309019: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 2, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309034: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 3, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309052: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 4, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309068: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 5, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309085: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 6, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
2018-07-21 14:21:09.309101: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 7, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 911408516298923653
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 161138719697210983
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]

I confirm that all devices are functional and available to OpenCL (they are visible to clinfo) and are usable through a third-party package (ArrayFire). I also confirm that SYCL itself sees all available devices; to test that, I updated the SYCL 'custom-device-selector' example to the following code:

/***************************************************************************
 *
 *  Copyright (C) 2016 Codeplay Software Limited
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  For your convenience, a copy of the License has been included in this
 *  repository.
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 *  Codeplay's ComputeCpp SDK
 *
 *  custom-device-selector.cpp
 *
 *  Description:
 *    Sample code that shows how to write a custom device selector in SYCL.
 *
 **************************************************************************/

#include <CL/sycl.hpp>
#include <iostream>

using namespace cl::sycl;
using namespace std;

/* Classes can inherit from the device_selector class to allow users
 * to dictate the criteria for choosing a device from those that might be
 * present on a system. This example looks for a device with SPIR support
 * and prefers GPUs over CPUs. */
class custom_selector : public device_selector {
 public:
  custom_selector() : device_selector() {}

  /* The selection is performed via the () operator in the base
   * selector class. This method will be called once per device in each
   * platform. Note that all platforms are evaluated whenever there is
   * a device selection. */
  int operator()(const device& device) const override {
    cout << device.get_info<cl::sycl::info::device::vendor>() << ": " << device.get_info<cl::sycl::info::device::name>() << std::endl;
    cout << '\t' << "max_work_group_size : " << device.get_info<cl::sycl::info::device::max_work_group_size>() << std::endl;
//    cout << '\t' << "max_work_item_sizes : " << device.get_info<cl::sycl::info::device::max_work_item_sizes>() << std::endl;
    cout << '\t' << "max_compute_units   : " << device.get_info<cl::sycl::info::device::max_compute_units>() << std::endl;
    cout << '\t' << "local_mem_size      : " << device.get_info<cl::sycl::info::device::local_mem_size>() << std::endl;
    cout << '\t' << "max_mem_alloc_size  : " << device.get_info<cl::sycl::info::device::max_mem_alloc_size>() << std::endl;
    cout << '\t' << "profile             : " << device.get_info<cl::sycl::info::device::profile>() << std::endl;
    cout << "----------------------------------------------------------------------------------------------" <<  std::endl << std::endl << std::endl;

    /* We only give a valid score to devices that support SPIR. */
    if (device.has_extension(cl::sycl::string_class("cl_khr_spir"))) {
      if (device.get_info<info::device::device_type>() ==
          info::device_type::cpu) {
        return 50;
      }

      if (device.get_info<info::device::device_type>() ==
          info::device_type::gpu) {
        return 100;
      }
    }
    /* Devices with a negative score will never be chosen. */
    return -1;
  }
};

int main() {
  const int dataSize = 64;
  int ret = -1;
  float data[dataSize] = {0.f};

  range<1> dataRange(dataSize);
  buffer<float, 1> buf(data, dataRange);

  /* We create an object of custom_selector type and use it
   * like any other selector. */
  custom_selector selector;
  queue myQueue(selector);

  myQueue.submit([&](handler& cgh) {
    auto ptr = buf.get_access<access::mode::read_write>(cgh);

    cgh.parallel_for<class example_kernel>(dataRange, [=](item<1> item) {
      size_t idx = item.get_linear_id();
      ptr[idx] = static_cast<float>(idx);
    });
  });

  /* A host accessor can be used to force an update from the device to the
   * host, allowing the data to be checked. */
  accessor<float, 1, access::mode::read_write, access::target::host_buffer>
      hostPtr(buf);

  if (hostPtr[10] == 10.0f) {
    ret = 0;
  }

  return ret;
}

mirh commented 6 years ago

Try ComputeCpp 0.9.0 for starters?

lu4 commented 6 years ago

Sorry, I didn't understand the suggestion... I was using ComputeCpp-v0.6.0-4212-gb29ac8a, but ComputeCpp itself works fine; it looks as if TF is the buggy part...

Rbiessy commented 6 years ago

@lu4 as @mirh suggested, compiling with our latest ComputeCpp version will let you use a more recent version of TF. Could you try downloading ComputeCpp CE 0.9.1? To compile it you will need the latest commit of the eigen_sycl branch here: https://github.com/codeplaysoftware/tensorflow/tree/eigen_sycl

lu4 commented 6 years ago

Oh, I see, thanks, trying...

lu4 commented 6 years ago

vagrant@ubuntu-xenial:~/Project/tensorflow_eigen$ bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
..................
INFO: SHA256 (https://github.com/KhronosGroup/OpenCL-Headers/archive/f039db6764d52388658ef15c30b2237bbda49803.tar.gz) = a29e3e67beef1ad0ea6b0afd44b4b2c0e6054d1f9d68fdbd0c4ce434e59533e0
ERROR: /home/vagrant/.cache/bazel/_bazel_vagrant/e647697a348b187726950a371af92dd1/external/jpeg/BUILD:126:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted:
/home/vagrant/.cache/bazel/_bazel_vagrant/e647697a348b187726950a371af92dd1/external/jpeg/BUILD:126:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
INFO: Elapsed time: 16.227s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (132 packages loaded)
    currently loading: tensorflow/core/kernels

lu4 commented 6 years ago

It looks as if the build system is trying to build for the ARM architecture; I have no clue why...

Rbiessy commented 6 years ago

Ha, this is a known issue with TF 1.6 and recent versions of Bazel. You have to use Bazel 0.11.1 for our current version of TF. Make sure to manually remove the Bazel cache before compiling again.

lu4 commented 6 years ago

Thanks, trying...

lu4 commented 6 years ago

@Rbiessy @mirh OK guys, I've compiled TF as mentioned above for both the eigen and lukeiwanski repos, i.e. using ComputeCpp CE 0.9.1, but the resulting TF build still reports b'ComputeCpp-v0.6.0-4212-gb29ac8a' and 1.8.0-rc1. In addition to that, it sees just one card.

mirh commented 6 years ago

On a night in Europe, hardly, I think.

Anyway, for the life of me, your dev environment seems really weird. Can't you clean it up or try it on another system?

And you are trying to build this, right? https://github.com/lukeiwanski/tensorflow/archive/dev/amd_gpu.zip

rodburns commented 6 years ago

Can you post the output of the "computecpp_info" tool located in the "bin" folder of the ComputeCpp release you are using?

lu4 commented 6 years ago

Hi, here is the output:

$ /usr/local/computecpp/bin/computecpp_info
********************************************************************************

ComputeCpp Info (CE 0.9.1)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************

Device Info:

Discovered 8 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 5:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 6:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 7:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.9.1/platform-support-notes

********************************************************************************

lu4 commented 6 years ago

@mirh I was able to compile TF using the provided archive, but it still shows just one GPU in TF.

lu4 commented 6 years ago

Guys, I was wondering if you provide paid support? I need to get TF working with all the devices in my machine. The issue is highly critical for me and I'm willing to pay a couple of hundred bucks to get the ball rolling. Is that possible somehow?

lukeiwanski commented 6 years ago

@lu4 thanks for the report. That is some interesting rig you have there.

So far our focus has been on supporting systems with only one device - like a single GPU - and on combinations of devices such as a CPU with one GPU and one other accelerator.

It is quite complex to add support for multiple GPUs - nevertheless, I believe we should do this.

This task will most likely take some time - have you tried HIP?

As for the paid support, can you email me directly regarding that?

jwlawson commented 6 years ago

@lu4 I have absolutely no idea if this will work, but when you create a TensorFlow session, try setting the SYCL device count in the session config options:

import tensorflow as tf
with tf.Session(config=tf.ConfigProto(device_count={'SYCL': 8})) as sess:
  print(sess.list_devices())

Even if this does allow TF to see all your devices, I don't know whether it will automatically schedule compute across all of them. It would be very interesting to hear the results of this.
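
If it does list all eight, a quick follow-up check would be pinning one op to each device explicitly. Something like this might work (just a sketch - it assumes the devices register as /device:SYCL:0 through /device:SYCL:7 and that the ops used have SYCL kernels):

import tensorflow as tf

config = tf.ConfigProto(device_count={'SYCL': 8})
with tf.Session(config=config) as sess:
  ops = []
  # Pin one small matmul to each SYCL device so placement is explicit.
  for i in range(8):
    with tf.device('/device:SYCL:%d' % i):
      a = tf.random_normal([256, 256])
      ops.append(tf.matmul(a, a))
  # Running the ops forces TF to actually place them; a failure here would
  # point at a device that is listed but not usable.
  sess.run(ops)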

lu4 commented 6 years ago

@jwlawson your trick worked; I was able to access all the GPUs in my system. However, it turns out that not everything works smoothly: for example, eager execution is not able to take advantage of all the cards (it may also be due to misconfiguration); for some reason it just binds to gpu:0 and does not want to use anything else. I'm continuing to investigate and will report if I find anything useful.
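
For reference, the placement I've been testing in eager mode looks roughly like this (just a sketch of the pattern, using the TF 1.8 tf.enable_eager_execution API; whether tf.device honours SYCL device names under eager execution is exactly the part that seems broken):

import tensorflow as tf

tf.enable_eager_execution()

# Explicitly request the second GPU; in practice the work still lands on gpu:0.
with tf.device('/device:SYCL:1'):
  a = tf.random_normal([1024, 1024])
  b = tf.matmul(a, a)

print(b.device)  # shows which device actually holds the result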

lu4 commented 6 years ago

@lukeiwanski I've sent an email to you (using your GitHub email, luke@codeplay.com), JFYI

lukeiwanski commented 6 years ago

@lu4 yes, the email is correct... however, I cannot find any email from you :(