Open lu4 opened 6 years ago
Try computecpp 0.9.0 for starters?
Sorry, I didn't understand the question... I was using ComputeCpp-v0.6.0-4212-gb29ac8a, but ComputeCpp-v0.6.0-4212-gb29ac8a itself is working fine; it looks as if TF is buggy...
@lu4 as @mirh suggested, compiling with our latest ComputeCpp version will let you use a more recent version of TF. Could you try and download ComputeCpp CE 0.9.1? To compile you will need to use the latest commit of the eigen_sycl branch here: https://github.com/codeplaysoftware/tensorflow/tree/eigen_sycl
Oh, I see, thanks, trying...
vagrant@ubuntu-xenial:~/Project/tensorflow_eigen$ bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
..................
INFO: SHA256 (https://github.com/KhronosGroup/OpenCL-Headers/archive/f039db6764d52388658ef15c30b2237bbda49803.tar.gz) = a29e3e67beef1ad0ea6b0afd44b4b2c0e6054d1f9d68fdbd0c4ce434e59533e0
ERROR: /home/vagrant/.cache/bazel/_bazel_vagrant/e647697a348b187726950a371af92dd1/external/jpeg/BUILD:126:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted:
/home/vagrant/.cache/bazel/_bazel_vagrant/e647697a348b187726950a371af92dd1/external/jpeg/BUILD:126:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
INFO: Elapsed time: 16.227s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (132 packages loaded)
currently loading: tensorflow/core/kernels
It looks as if the build system is trying to target the ARM architecture (armeabi-v7a) as well as k8; I have no clue why...
Ha, this is a known issue with TF 1.6 and recent versions of Bazel. You have to use Bazel 0.11.1 for our current version of TF. Make sure to manually remove the Bazel cache before compiling again.
Thanks, trying...
@Rbiessy @mirh OK guys, I've compiled TF as described above from both the eigen and lukeiwanski repos, using ComputeCpp CE 0.9.1, but the resulting TF build still reports b'ComputeCpp-v0.6.0-4212-gb29ac8a' 1.8.0-rc1. In addition, it sees just one card.
On a night in europe, hardly I think.
Anyway, for the love of me, your dev environment seems really weird. Can't you clean it, or try on another system?
And you are trying to build this, right? https://github.com/lukeiwanski/tensorflow/archive/dev/amd_gpu.zip
Can you post the output of the "computecpp_info" tool located in the "bin" folder of the ComputeCpp release you are using?
Hi, here is the output:
$ /usr/local/computecpp/bin/computecpp_info
********************************************************************************
ComputeCpp Info (CE 0.9.1)
SYCL 1.2.1 revision 3
********************************************************************************
Toolchain information:
GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 8 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 5:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 6:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 7:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.9.1/platform-support-notes
********************************************************************************
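Since the thread turns on which ComputeCpp release is installed and how many devices it discovers, a small parser for this report can make the comparison with TensorFlow less error-prone. The following is only a sketch: the function name `summarize_computecpp_info` is hypothetical, and it assumes the report keeps the "ComputeCpp Info (...)" and "CL_DEVICE_NAME : ..." line shapes shown above.

```python
import re


def summarize_computecpp_info(text):
    """Return (release, device_names) parsed from computecpp_info output.

    Hypothetical helper: it relies only on the line shapes visible in the
    report above, not on any documented output format.
    """
    release_match = re.search(r"ComputeCpp Info \((.+?)\)", text)
    release = release_match.group(1) if release_match else None
    names = [n.strip() for n in re.findall(r"CL_DEVICE_NAME\s*:\s*(.+)", text)]
    return release, names


# Shortened sample in the same shape as the report above.
sample = """\
ComputeCpp Info (CE 0.9.1)
Device 0:
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
Device 1:
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
"""

release, names = summarize_computecpp_info(sample)
print(release, len(names))

# To run against the real tool, the text could come from, for example:
#   subprocess.check_output(["/usr/local/computecpp/bin/computecpp_info"]).decode()
```

If the release printed here differs from the one TensorFlow reports, that would suggest the build picked up a different ComputeCpp installation than the one on the path.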
@mirh I was able to compile TF using the provided archive, but it still shows just one GPU in TF.
Guys, I was wondering if you provide paid support? I need to get TF working with all the devices in my machine. The issue is highly critical for me, and I'm willing to pay a couple of hundred bucks to get the ball rolling. Is that possible somehow?
@lu4 thanks for the report. It is some interesting rig you have there.
So far our focus has been on systems with a single device of each kind, such as one GPU, or a combination like a CPU with one GPU and one other accelerator.
Adding support for multiple GPUs is quite complex. Nevertheless, I believe we should do it.
This task most likely will take some time - have you tried HiP?
As for the paid support, can you email me directly about that?
@lu4 I have absolutely no idea if this will work, but when you create a tensorflow session try setting the SYCL device count in the session config options:
import tensorflow as tf
with tf.Session(config=tf.ConfigProto(device_count={'SYCL': 8})) as sess:
    print(sess.list_devices())
Even if this does allow TF to see all your devices I don't know if it will automatically schedule compute across all of them. It would be very interesting to hear the results of this.
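To check the result of the snippet above without reading the device list by eye, the names can be filtered for SYCL entries. This is a minimal sketch: `sycl_device_names` is a hypothetical helper, and the sample names are illustrative only, written in the full-name form that TensorFlow 1.x typically reports, not taken from an actual run on this machine.

```python
def sycl_device_names(device_names):
    """Keep only the device names that refer to a SYCL device."""
    return [name for name in device_names if "/device:SYCL:" in name]


# With TensorFlow 1.x the names would come from the session, e.g.:
#   with tf.Session(config=tf.ConfigProto(device_count={'SYCL': 8})) as sess:
#       names = [d.name for d in sess.list_devices()]
# The list below is an illustrative stand-in, not real output.
sample_names = [
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:SYCL:0",
    "/job:localhost/replica:0/task:0/device:SYCL:1",
]

found = sycl_device_names(sample_names)
print(len(found), "SYCL device(s) visible")
```

Comparing this count with the number of devices OpenCL reports shows quickly whether the `device_count` setting had any effect.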
@jwlawson your trick worked: I was able to access all the GPUs in my system. Not everything works smoothly, though. For example, eager execution is not able to take advantage of all the cards (this may also be due to misconfiguration); for some reason it just binds to gpu:0 and does not want to use anything else. I'm continuing to investigate and will report back if I find anything useful.
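One way to avoid relying on automatic scheduling is to pin work to devices explicitly in graph mode. The sketch below only computes a round-robin assignment of device strings; `round_robin_devices` is a hypothetical helper, the "/device:SYCL:N" naming follows the name quoted in this issue, and whether the SYCL build actually honors such placement across all cards is untested here.

```python
def round_robin_devices(num_shards, num_devices, device_type="SYCL"):
    """Assign each shard of work a device string, cycling through devices."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return ["/device:%s:%d" % (device_type, i % num_devices)
            for i in range(num_shards)]


placement = round_robin_devices(num_shards=10, num_devices=8)
for shard, device in enumerate(placement):
    print(shard, device)

# In TensorFlow 1.x graph mode, each shard's ops would then be built under
# its assigned device (an assumption about usage, not verified on SYCL):
#   with tf.device(placement[shard]):
#       ...build the ops for this shard...
```

With 10 shards and 8 devices, the last two shards wrap around to the first two devices.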
@lukeiwanski I've sent an email to you (used your github email luke@codeplay.com), JFYI
@lu4 yes the email is correct.. however, I cannot find any email from you :(
System information
Here's info from environment capture script:
Describe the problem
TensorFlow built on top of SYCL refuses to list and use all available GPUs in the system. I'm using the following commands to get the list of devices:
(please note that TensorFlow's in-line log presents 8 devices, but the actual resulting variable contains just two CPU and one GPU available through "/device:SYCL:0" name)
I confirm that all devices are functional and available to OpenCL (visible to clinfo) and are operable through another third-party package (ArrayFire). I also confirm that SYCL itself sees all available devices; to test this, I updated SYCL's 'custom-device-selector' example to the following code: