[pypi/tensorflow-rocm] doesn't detect GPU #1

Disty0 commented 4 years ago

As the title says, tensorflow-rocm works, but only on the CPU. The HIP examples work fine on the GPU.

Here is the rocminfo output:

``` rocminfo ROCk module is loaded Able to open /dev/kfd read-write ===================== HSA System Attributes ===================== Runtime Version: 1.1 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE ========== HSA Agents ========== ******* Agent 1 ******* Name: AMD Ryzen 5 1600X Six-Core Processor Uuid: CPU-XX Marketing Name: AMD Ryzen 5 1600X Six-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 0 BDFID: 0 Internal Node ID: 0 Compute Unit: 12 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 16406764(0xfa58ec) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 16406764(0xfa58ec) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info: N/A ******* Agent 2 ******* Name: gfx900 Uuid: GPU-021500232d0241a4 Marketing Name: Vega 10 XL/XT [Radeon RX Vega 56/64] Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 4096(0x1000) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 1 Device Type: GPU Cache Info: L1: 16(0x10) KB Chip ID: 26751(0x687f) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1590 BDFID: 10240 Internal Node ID: 1 Compute Unit: 56 SIMDs per CU: 4 Shader Engines: 4 Shader Arrs. per Eng.: 1 WatchPts on Addr. 
Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: FALSE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 40(0x28) Max Work-item Per CU: 2560(0xa00) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 8372224(0x7fc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx900 Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 *** Done *** ```

Here is the clinfo output:

``` clinfo Number of platforms 1 Platform Name AMD Accelerated Parallel Processing Platform Vendor Advanced Micro Devices, Inc. Platform Version OpenCL 2.0 AMD-APP.dbg (3137.0) Platform Profile FULL_PROFILE Platform Extensions cl_khr_icd cl_amd_event_callback Platform Extensions function suffix AMD Platform Name AMD Accelerated Parallel Processing Number of devices 1 Device Name gfx900 Device Vendor Advanced Micro Devices, Inc. Device Vendor ID 0x1002 Device Version OpenCL 2.0 Driver Version 3137.0 (HSA1.1,LC) Device OpenCL C Version OpenCL C 2.0 Device Type GPU Device Board Name (AMD) Vega 10 XL/XT [Radeon RX Vega 56/64] Device Topology (AMD) PCI-E, 28:00.0 Device Profile FULL_PROFILE Device Available Yes Compiler Available Yes Linker Available Yes Max compute units 56 SIMD per compute unit (AMD) 4 SIMD width (AMD) 16 SIMD instruction width (AMD) 1 Max clock frequency 1590MHz Graphics IP (AMD) 9.0 Device Partition (core) Max number of sub-devices 56 Supported partition types None Supported affinity domains (n/a) Max work item dimensions 3 Max work item sizes 1024x1024x1024 Max work group size 256 Preferred work group size (AMD) 256 Max work group size (AMD) 1024 Preferred work group size multiple 64 Wavefront width (AMD) 64 Preferred / native vector sizes char 4 / 4 short 2 / 2 int 1 / 1 long 1 / 1 half 1 / 1 (cl_khr_fp16) float 1 / 1 double 1 / 1 (cl_khr_fp64) Half-precision Floating-point support (cl_khr_fp16) Denormals No Infinity and NANs No Round to nearest No Round to zero No Round to infinity No IEEE754-2008 fused multiply-add No Support is emulated in software No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations Yes Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round 
to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Address bits 64, Little-Endian Global memory size 8573157376 (7.984GiB) Global free memory (AMD) 8372224 (7.984GiB) Global memory channels (AMD) 64 Global memory banks per channel (AMD) 4 Global memory bank width (AMD) 256 bytes Error Correction support No Max memory allocation 7287183769 (6.787GiB) Unified memory for Host and Device No Shared Virtual Memory (SVM) capabilities (core) Coarse-grained buffer sharing Yes Fine-grained buffer sharing Yes Fine-grained system sharing No Atomics No Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Preferred alignment for atomics SVM 0 bytes Global 0 bytes Local 0 bytes Max size for global variable 7287183769 (6.787GiB) Preferred total size of global vars 8573157376 (7.984GiB) Global Memory cache type Read/Write Global Memory cache size 16384 (16KiB) Global Memory cache line size 64 bytes Image support Yes Max number of samplers per kernel 26751 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Base address alignment for 2D image buffers 256 bytes Pitch alignment for 2D image buffers 256 pixels Max 2D image size 16384x16384 pixels Max 3D image size 2048x2048x2048 pixels Max number of read image args 128 Max number of write image args 8 Max number of read/write image args 64 Max number of pipe args 16 Max active pipe reservations 16 Max pipe packet size 2992216473 (2.787GiB) Local memory type Local Local memory size 65536 (64KiB) Local memory syze per CU (AMD) 65536 (64KiB) Local memory banks (AMD) 32 Max number of constant args 8 Max constant buffer size 7287183769 (6.787GiB) Preferred constant buffer size (AMD) 16384 (16KiB) Max size of kernel argument 1024 Queue properties (on host) Out-of-order execution No Profiling Yes Queue properties (on device) Out-of-order execution Yes Profiling Yes Preferred size 262144 (256KiB) Max size 8388608 (8MiB) Max queues 
on device 1 Max events on device 1024 Prefer user sync for interop Yes Number of P2P devices (AMD) 0 P2P devices (AMD) Profiling timer resolution 1ns Profiling timer offset since Epoch (AMD) 0ns (Thu Jan 1 02:00:00 1970) Execution capabilities Run OpenCL kernels Yes Run native kernels No Thread trace supported (AMD) No Number of async queues (AMD) 8 Max real-time compute queues (AMD) 8 Max real-time compute units (AMD) 56 printf() buffer size 4194304 (4MiB) Built-in kernels (n/a) Device Extensions cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform clCreateContext(NULL, ...) [default] No platform clCreateContext(NULL, ...) [other] Success [AMD] clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1) Platform Name AMD Accelerated Parallel Processing Device Name gfx900 clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1) Platform Name AMD Accelerated Parallel Processing Device Name gfx900 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1) Platform Name AMD Accelerated Parallel Processing Device Name gfx900 ```

Here is a quick search of the packages I installed (mostly from arch4edu):

```
yay -Qs rocm
local/hip-rocclr 3.5.0-4
    Heterogeneous Interface for Portability ROCm
local/hipblas 3.5.0-1
    ROCm BLAS marshalling library
local/hsa-ext-rocr 3.5.1-1
    ROCm Platform Runtime: Closed source components
local/hsa-rocr 3.5.0-1
    ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime
local/rccl 3.5.0-2
    ROCm Communication Collectives Library
local/rocalution 3.5.0-1
    Next generation library for iterative sparse solvers for ROCm platform
local/rocblas 3.5.0-1
    Next generation BLAS implementation for ROCm platform
local/rocfft 3.5.0-1
    Next generation FFT implementation for ROCm
local/rocm-clang-ocl 3.5.0-2
    OpenCL compilation with clang compiler.
local/rocm-cmake 3.5.0-1
    CMake modules for common build tasks needed for the ROCm software stack
local/rocm-dbgapi 3.5.0-2
    Support library necessary for a debugger of AMD's GPUs
local/rocm-debug-agent 3.5.0-2
    ROCr Debug Agent Library
local/rocm-dev 3.5.0-1
    ROCm Dev - Metapackage for the ROCm Development Stack
local/rocm-device-libs 3.5.0-1
    Radeon Open Compute - device libs
local/rocm-dkms 3.5.0-1
    ROCm - Open Soruce Platform for HPC and Ultrascale GPU Computing
local/rocm-gdb 3.5.0-2
    ROCm source-level debugger for Linux, based on GDB
local/rocm-libs 3.5.0-1
    ROCm Libs - Libraries utilizing HPC and Ultrascale GPU Computing of ROCm
local/rocm-opencl-runtime 3.5.0-1
    Radeon Open Compute - OpenCL runtime
local/rocm-smi 3.5.0-1
    Utility to manage and monitor AMDGPU / ROCm systems
local/rocm-smi-lib64 3.5.0-2
    ROCm SMI LIB
local/rocm-utils 3.5.0-1
    ROCm Platform Runtime: Utils
local/rocminfo 3.5.0-1
    ROCm info tools - rocm_agent_enumerator
local/rocrand 3.5.0-1
    Pseudo-random and quasi-random number generator on ROCm
local/rocsolver 3.5.0-2
    Subset of LAPACK functionality on the ROCm platform
local/rocsparse 3.5.0-1
    BLAS for sparse computation on top of ROCm
local/rocthrust 3.5.0-2
    Port of the Thrust parallel algorithm library atop HIP/ROCm
local/roctracer 3.5.0-2
    ROCm Tracer Callback/Activity Library for Performance tracing AMD GPU's
```

```
yay -Qs hip
local/gcc-libs 10.1.0-2
    Runtime libraries shipped by GCC
local/hip-amdgpu-pro 19.30_934563-1 (Radeon_Software_for_Linux)
    HIP-CLANG runtime. HIP-CLANG allows developers to convert CUDA code to common C++
local/hip-rocclr 3.5.0-4
    Heterogeneous Interface for Portability ROCm
local/hipblas 3.5.0-1
    ROCm BLAS marshalling library
local/hipcub 3.5.0-2
    Header-only library on top of rocPRIM or CUB
local/hipsparse 3.5.0-3
    rocSPARSE marshalling library.
local/lib32-gcc-libs 10.1.0-2 (multilib-devel)
    32-bit runtime libraries shipped by GCC
local/rocprim 3.5.0-2
    Header-only library providing HIP parallel primitives
local/rocthrust 3.5.0-2
    Port of the Thrust parallel algorithm library atop HIP/ROCm
```

```
yay -Qs opencl
local/clinfo 2.2.18.04.06-2
    Simple OpenCL application that enumerates all available platform and device properties
local/miopen-opencl 3.5.0-3
    AMD's Machine Intelligence Library (OpenCL backend)
local/miopengemm 1.1.6-2
    An OpenCL GEMM kernel generator
local/ocl-icd 2.2.12-4
    OpenCL ICD Bindings
local/opencl-headers 2:2.2.20170516-3
    OpenCL (Open Computing Language) header files
local/rocm-clang-ocl 3.5.0-2
    OpenCL compilation with clang compiler.
local/rocm-opencl-runtime 3.5.0-1
    Radeon Open Compute - OpenCL runtime
```

Nine-H commented 4 years ago

Is tensorflow-rocm still current? The patches should have been upstreamed last year: https://medium.com/tensorflow/community-supported-amd-rocm-build-for-tensorflow-e8e9ac258369

I've been trying to figure this out on Ubuntu: https://ninethehacker.xyz/journal/tensorflow-rocm-pop-os

fernandoblalves commented 4 years ago

It says in this repo's README that tensorflow currently does not detect ROCm. I have a similar setup to @Disty0 and I can't use tensorflow with ROCm either, although I tested another Python project (numba, I think) and it detects everything.

Also, I would like to contribute to this project.

acxz commented 4 years ago

@Disty0 can you post the exact error you are receiving, or a backtrace, to help debug?
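One way to gather that information is to ask TensorFlow directly which devices it sees. This is a minimal sketch, assuming TensorFlow 2.1 or newer (where `tf.config.list_physical_devices` is available); the helper name `visible_gpus` is made up for illustration and is not part of any package discussed here.

```python
import importlib.util


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None when TensorFlow is not installed at all, and an empty
    list when it is installed but only the CPU is detected.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the check above runs first

    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed in this environment")
    elif not gpus:
        print("tensorflow is installed but sees no GPU (CPU only)")
    else:
        print("tensorflow sees:", ", ".join(gpus))
```

An empty list would match the "works, but only on the CPU" symptom from the first post; any startup log lines printed before the result are worth including in a report as well.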

@Nine-H From what I understand, tensorflow-rocm on pypi is the package maintained by AMD ROCm themselves. Here is the source for it: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream Basically, AMD pushes their ROCm changes out to that pypi package as well as upstreaming their code to the official tensorflow repo. The linked repo is still very active, with the pypi package last updated on June 4 to coincide with the TF 2.2.0 release.

@fernandoblalves we welcome all contributions. Right now we really want to get the python-pytorch-rocm and python-tensorflow-rocm PKGBUILDs working.

Overall tho, just take a look at our issues, pick one that you can reproduce, and start debugging/hacking away.

fernandoblalves commented 4 years ago

@acxz I've been tinkering with this repo when I have some time and managed to advance the build, but now I'm getting stuck with this error:

ERROR: /home/fernando/.cache/bazel/_bazel_fernando/c7b7a3ee105d2ad7219f67f83b1c994b/external/upb/BUILD:57:11: C++ compilation of rule '@upb//:upb' failed (Exit 1)
In file included from /usr/include/string.h:495,
                 from external/upb/upb/upb.h:16,
                 from external/upb/upb/upb.c:2:
In function 'strncpy',
    inlined from 'upb_status_seterrmsg' at external/upb/upb/upb.c:40:3:
/usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 127 equals destination size [-Werror=stringop-truncation]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
ERROR: /home/fernando/tensorflow-rocm/src/tensorflow-2.2.0-rocm/tensorflow/python/tools/BUILD:313:10 C++ compilation of rule '@upb//:upb' failed (Exit 1)

I have no clue how to fix this. I found this issue in their repository, which seems to be the same thing, so it should already be fixed. Any tips?

acxz commented 4 years ago

> managed to advance the build

@fernandoblalves great job on getting that far! Feel free to make a PR with your current changes; it is forward progress.

> already be fixed

As for the error, I am sure you have read the convo in the issue thread you linked, but they clearly state that they have not fixed it yet. Please reread it if you still think it "should already be fixed."

> Any tips?

You can try to move the upstream solution along by patching grpc as mentioned in the issue. That would be the proper way to do things. I would suggest that you also take a look at how Arch packages tensorflow; I feel like they must have had to figure out a way around this error. The PKGBUILD might hold some clues: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/tensorflow#n94 In fact, just glancing at it, compiling with gcc-9 should do the trick (which is what our current PKGBUILD is doing anyway). I realize this could be what you meant by "should already be fixed".

Maybe we can start by confirming that you are using gcc-9 to compile.
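A quick way to confirm which compiler is picked up is to parse the first line of `gcc --version`. This is a minimal sketch; the helper names `gcc_major` and `installed_gcc_major` are hypothetical, and it only inspects whichever `gcc` binary is first on `PATH`, which is not necessarily the one the PKGBUILD or bazel ends up using.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Extract the major version from text such as 'gcc (GCC) 9.3.0'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def installed_gcc_major(binary="gcc"):
    """Return the major version of `binary`, or None if it is not on PATH."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return gcc_major(lines[0]) if lines else None


if __name__ == "__main__":
    for name in ("gcc", "gcc-9"):
        print(name, "->", installed_gcc_major(name))
```

If the default `gcc` reports 10 and the build is not explicitly pointed at `gcc-9`, that would be consistent with the `-Werror=stringop-truncation` failure shown above.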

acxz commented 4 years ago

Also, this issue thread is for the pypi installation of tensorflow-rocm. Since you are compiling from source, please open a new issue for it.

acxz commented 3 years ago

@jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?

Disty0 commented 3 years ago

When I opened this issue it wasn't giving any errors, but it didn't try to use the GPU. I tried reinstalling tensorflow-rocm with yay, but the build failed. When I run the same Python code with tensorflow-rocm from pypi, it now fails with a Segmentation fault (core dumped) error. (The same code works on Ubuntu.)

Log of the python code with tensorflow-rocm pypi:

```
2020-08-07 19:10:22.172454: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-08-07 19:10:22.319362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:28:00.0 name: Radeon RX Vega ROCm AMD GPU ISA: gfx900
coreClock: 1.59GHz coreCount: 56 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
2020-08-07 19:10:22.350096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-08-07 19:10:22.355511: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-08-07 19:10:22.383576: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-08-07 19:10:22.386827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-08-07 19:10:22.386945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
File exists: 0
File exists: 1
File exists: 2
File exists: 3
File exists: 4
File exists: 5
File exists: 6
File exists: 7
File exists: 8
File exists: 9
File exists: 10
File exists: 11
File exists: 12
File exists: 13
File exists: 14
File exists: 15
File exists: 16
File exists: 17
File exists: 18
File exists: 19
File exists: 20
Final File: 20
2020-08-07 19:10:22.772536: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-08-07 19:10:22.789535: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3899750000 Hz
2020-08-07 19:10:22.790656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a836e5b140 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-07 19:10:22.790673: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-07 19:10:22.791906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:28:00.0 name: Radeon RX Vega ROCm AMD GPU ISA: gfx900
coreClock: 1.59GHz coreCount: 56 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
2020-08-07 19:10:22.791951: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-08-07 19:10:22.791962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-08-07 19:10:22.791973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-08-07 19:10:22.791983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-08-07 19:10:22.792039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-07 19:10:24.200297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-07 19:10:24.200326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-07 19:10:24.200330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-07 19:10:24.753628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7264 MB memory) -> physical GPU (device: 0, name: Radeon RX Vega, pci bus id: 0000:28:00.0)
2020-08-07 19:10:24.763172: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a837d21790 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2020-08-07 19:10:24.763192: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Radeon RX Vega, AMDGPU ISA version: gfx900
Segmentation fault (core dumped)
```
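A segmentation fault like the one at the end of this log leaves no Python traceback by default. As a minimal sketch of how one might narrow down where it happens, the standard-library `faulthandler` module can be enabled before TensorFlow is imported; the helper name `enable_crash_traces` and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traces(log_path="tf_crash.log"):
    """Write the Python stack of every thread to log_path if the
    process receives a fatal signal such as SIGSEGV."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this handle alive: faulthandler writes to the
    # underlying file descriptor when the crash happens.
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_traces()
    # Import TensorFlow and run the failing code after this point, e.g.:
    # import tensorflow as tf
    # ... the script that ends in "Segmentation fault (core dumped)" ...
```

This only records the Python-level frames that were active at the time of the crash; it does not produce a native backtrace of the ROCm libraries themselves.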

Log of building tensorflow-rocm with yay:

``` disty:~ $ yay -S tensorflow-rocm :: There are 2 providers available for tensorflow-rocm::: Repository AUR 1) tensorflow-rocm 2) tensorflow-opt-rocm Enter a number (default=1): :: Checking for conflicts... :: Checking for inner conflicts... [Aur:1] tensorflow-rocm-2.3.0-2 :: Downloaded PKGBUILD (1/1): tensorflow-rocm 1 tensorflow-rocm (Build Files Exist) ==> Diffs to show? ==> [N]one [A]ll [Ab]ort [I]nstalled [No]tInstalled or (1 2 3, 1-3, ^4) ==> :: (1/1) Parsing SRCINFO: tensorflow-rocm ==> Making package: tensorflow-rocm 2.3.0-2 (Fri 07 Aug 2020 19:06:14 +03) ==> Retrieving sources... -> Downloading tensorflow-rocm-2.3.0.tar.gz... % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 129 100 129 0 0 242 0 --:--:-- --:--:-- --:--:-- 242 100 44.3M 100 44.3M 0 0 1061k 0 0:00:42 0:00:42 --:--:-- 977k -> Downloading numpy1.20.patch... % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 1878 100 1878 0 0 3840 0 --:--:-- --:--:-- --:--:-- 3832 -> Found build-against-actual-mkl.patch -> Found fix_hip_hcc_path.patch ==> Validating source files with sha512sums... tensorflow-rocm-2.3.0.tar.gz ... Passed numpy1.20.patch ... Passed build-against-actual-mkl.patch ... Passed fix_hip_hcc_path.patch ... Passed ==> Making package: tensorflow-rocm 2.3.0-2 (Fri 07 Aug 2020 19:06:59 +03) ==> Checking runtime dependencies... ==> Checking buildtime dependencies... ==> Retrieving sources... -> Found tensorflow-rocm-2.3.0.tar.gz -> Found numpy1.20.patch -> Found build-against-actual-mkl.patch -> Found fix_hip_hcc_path.patch ==> Validating source files with sha512sums... tensorflow-rocm-2.3.0.tar.gz ... Passed numpy1.20.patch ... Passed build-against-actual-mkl.patch ... Passed fix_hip_hcc_path.patch ... Passed ==> Removing existing $srcdir/ directory... ==> Extracting sources... -> Extracting tensorflow-rocm-2.3.0.tar.gz with bsdtar ==> Starting prepare()... 
patching file third_party/mkl/build_defs.bzl Hunk #1 succeeded at 125 (offset 1 line). patching file third_party/mkl/mkl.BUILD patching file tensorflow/python/lib/core/bfloat16.cc patching file third_party/gpus/rocm_configure.bzl /tmp/aurbuilder/.cache/yay/tensorflow-rocm/PKGBUILD: line 94: /opt/cuda/bin/nvcc: No such file or directory sed: can't read /usr/include/cudnn_version.h: No such file or directory ==> Sources are ready. ==> Making package: tensorflow-rocm 2.3.0-2 (Fri 07 Aug 2020 19:07:05 +03) ==> Checking runtime dependencies... ==> Checking buildtime dependencies... ==> WARNING: Using existing $srcdir/ tree ==> Starting build()... Building with rocm and without non-x86-64 optimizations You have bazel 3.4.1- (@non-git) installed. Please specify the location of python. [Default is /sbin/python3]: Found possible Python library paths: /usr/lib/python3.8/site-packages Please input the desired Python library path to use. Default is [/usr/lib/python3.8/site-packages] Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: No OpenCL SYCL support will be enabled for TensorFlow. Do you wish to download a fresh release of clang? (Experimental) [y/N]: Clang will not be downloaded. Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: Not configuring the WORKSPACE for Android builds. Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details. --config=mkl # Build with MKL support. --config=monolithic # Config for mostly static monolithic build. --config=ngraph # Build with Intel nGraph support. --config=numa # Build with NUMA support. --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects. --config=v2 # Build TensorFlow 2.x instead of 1.x. Preconfigured Bazel build configs to DISABLE default on features: --config=noaws # Disable AWS S3 filesystem support. --config=nogcp # Disable GCP support. 
--config=nohdfs # Disable HDFS support. --config=nonccl # Disable NVIDIA NCCL support. Configuration finished Starting local Bazel server and connecting to it... INFO: Options provided by the client: Inherited 'common' options: --isatty=1 --terminal_columns=158 INFO: Reading rc options for 'build' from /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: 'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=v2 INFO: Reading rc options for 'build' from /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.tf_configure.bazelrc: 'build' options: --action_env PYTHON_BIN_PATH=/sbin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3.8/site-packages --python_path=/sbin/python3 --config=xla --config=rocm --action_env TF_CONFIGURE_IOS=0 INFO: Found applicable config definition build:v2 in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition build:xla in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --action_env=TF_ENABLE_XLA=1 --define=with_xla_support=true INFO: Found applicable config definition build:rocm in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: 
--crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm=true --define=using_rocm_hipcc=true --action_env TF_NEED_ROCM=1 INFO: Found applicable config definition build:mkl in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=build_with_mkl_dnn_v1_only=true -c opt INFO: Found applicable config definition build:linux in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels INFO: Found applicable config definition build:dynamic_kernels in file /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS WARNING: /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:1749:11: in linkstatic attribute of cc_library rule //tensorflow/core:lib_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation WARNING: /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:2161:16: in linkstatic attribute of cc_library rule //tensorflow/core:framework_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'tf_cuda_library', the error might have been caused by the macro implementation WARNING: /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:1774:11: in linkstatic attribute of cc_library rule //tensorflow/core:lib_headers_for_pybind: setting 'linkstatic=1' is recommended if there are no object files. 
Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation WARNING: /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/python/BUILD:4662:11: in py_library rule //tensorflow/python:standard_ops: target '//tensorflow/python:standard_ops' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`. WARNING: /tmp/aurbuilder/.cache/yay/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/python/BUILD:115:11: in py_library rule //tensorflow/python:no_contrib: target '//tensorflow/python:no_contrib' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`. INFO: Analyzed 4 targets (387 packages loaded, 32449 targets configured). INFO: Found 4 targets... ERROR: /tmp/aurbuilder/.cache/bazel/_bazel_aurbuilder/531d086864c6ac1cdf7095afd34e563d/external/mkl_linux/BUILD:19:11: @mkl_linux//:mkl_libs_linux: missing input file 'external/mkl_linux/lib/libmkl_rt.so', owner: '@mkl_linux//:lib/libmkl_rt.so' ERROR: /tmp/aurbuilder/.cache/bazel/_bazel_aurbuilder/531d086864c6ac1cdf7095afd34e563d/external/mkl_linux/BUILD:19:11 1 input file(s) do not exist INFO: Elapsed time: 10.567s, Critical Path: 0.02s INFO: 0 processes. FAILED: Build did NOT complete successfully ==> ERROR: A failure occurred in build(). Aborting... error making: %!s(func() string=0x55d065f31bb0) ```

acxz commented 3 years ago

@Disty0 this issue is only for the pypi package. Can you post your problems with the yay installation in a separate issue?

jtiemer commented 3 years ago

> @jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?

yes, not a problem. I hope to be able to give you some feedback this weekend.

Disty0 commented 3 years ago

I tried with kernel 5.4.55-1-lts and it works. I think this is caused by the same problem as rocm-arch/rocm-arch/issues/269
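When comparing setups, it helps to report the running kernel alongside the known-good one. This is a minimal sketch using only the standard library; the helper name `kernel_version` is hypothetical, and the only kernel this thread confirms as working is 5.4.55-1-lts.

```python
import platform
import re


def kernel_version(release):
    """Parse a release string such as '5.4.55-1-lts' into (major, minor, patch).

    Missing components are reported as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


if __name__ == "__main__":
    running = platform.release()
    print("running kernel:", running, kernel_version(running))
    print("known-good kernel from this thread:", kernel_version("5.4.55-1-lts"))
```

Including both values in a bug report makes it easier to tell whether a failure is tied to the kernel rather than to the tensorflow-rocm package.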

acxz commented 3 years ago

Nice find, @Disty0! That is interesting tho.

acxz commented 3 years ago

Since we do not control the pypi release of tensorflow-rocm, and we now have a workaround for using pypi/tensorflow-rocm on Arch Linux, I am going to close this issue.

In terms of resolving this on the latest kernel, we need to fix the issue linked right above and/or report this issue to the rocm tensorflow repo (https://github.com/ROCmSoftwarePlatform/tensorflow-upstream). There is nothing that we can do in this repo about it.

jtiemer commented 3 years ago

> @jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?

Just tried to compile. It didn't build because miopen_plugin is missing dependency declarations:

```
ERROR: /home/USER/Build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/stream_executor/rocm/BUILD:217:11: undeclared inclusion(s) in rule '//tensorflow/stream_executor/rocm:miopen_plugin':
this rule is missing dependency declarations for the following files included by 'tensorflow/stream_executor/rocm/rocm_dnn.cc':
  'bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/miopen/miopen.h'
  'bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/miopen/config.h'
  'bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/miopen/export.h'
INFO: Elapsed time: 161,927s, Critical Path: 45,25s
INFO: 225 processes: 225 local.
FAILED: Build did NOT complete successfully
```

acxz commented 3 years ago

@jtiemer can you post the steps you used to compile? Do note that this is the issue thread for the pypi release, i.e. pip install tensorflow-rocm

jtiemer commented 3 years ago

Oops. Yeah, I read over the pypi part. I can't install it due to a version conflict, as my requirements come from community and pypi wants partly newer stuff. Trying to resolve that by getting everything via pip.
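One way to get everything via pip without touching the packages installed from community is an isolated virtual environment. This is a minimal sketch using the standard-library `venv` module; the helper name `make_env` and the directory name are hypothetical, and nothing in this thread confirms that it resolves the conflict.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create an isolated virtual environment and return the path its pip
    executable is expected to have on Linux (only present when with_pip is True)."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "bin" / "pip"


# Example usage (creates ./tf-rocm-env next to the script):
#   pip_path = make_env("tf-rocm-env")
#   then, from a shell:  tf-rocm-env/bin/pip install tensorflow-rocm
```

Packages installed this way live only inside the environment directory, so the versions pip pulls in do not replace the system-wide Python packages.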
