Closed Disty0 closed 3 years ago
Is tensorflow-rocm still current? The patches should have been upstreamed last year: https://medium.com/tensorflow/community-supported-amd-rocm-build-for-tensorflow-e8e9ac258369
I've been trying to figure this out on Ubuntu: https://ninethehacker.xyz/journal/tensorflow-rocm-pop-os
It says on this repo's readme that currently tensorflow does not detect rocm. I have a similar setup to @Disty0 and I can't use tensorflow with rocm either, although I tested another python project (numba I think) and it detects everything.
Also, I would like to contribute to this project.
@Disty0 can you post the exact error you are receiving or some backtrace to help debug.
@Nine-H From what I understand, tensorflow-rocm on pypi is the package maintained by AMD ROCm themselves. Here is the source for it: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
Basically AMD will push their ROCm changes out to that pypi package as well as upstreaming their code to the official tensorflow repo. The repo linked is still very active with the pypi package last updated on June 4 to coincide with TF 2.2.0 release.
@fernandoblalves we welcome all contributions. Right now we really want to get the python-pytorch-rocm and python-tensorflow-rocm PKGBUILDs working.
Overall tho, just take a look at our issues and pick one that you can reproduce and start debugging/hacking away.
@acxz I've been tinkering with this repo when I have some time and managed to advance the build, but now I'm getting stuck with this error:
```
ERROR: /home/fernando/.cache/bazel/_bazel_fernando/c7b7a3ee105d2ad7219f67f83b1c994b/external/upb/BUILD:57:11: C++ compilation of rule '@upb//:upb' failed (Exit 1)
In file included from /usr/include/string.h:495,
                 from external/upb/upb/upb.h:16,
                 from external/upb/upb/upb.c:2:
In function 'strncpy',
    inlined from 'upb_status_seterrmsg' at external/upb/upb/upb.c:40:3:
/usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 127 equals destination size [-Werror=stringop-truncation]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
ERROR: /home/fernando/tensorflow-rocm/src/tensorflow-2.2.0-rocm/tensorflow/python/tools/BUILD:313:10 C++ compilation of rule '@upb//:upb' failed (Exit 1)
```
I have no clue how to fix this. I found an issue in their repository that seems to be the same thing, so it should already be fixed. Any tips?
> managed to advance the build

@fernandoblalves great job on getting that far! Feel free to make a PR with your current changes; it is forward progress.
> already be fixed

As for the error I am sure you have read the convo at the issue thread you linked, but they clearly state that they have not fixed it yet. Please reread it if you still think it "should already be fixed."
> Any tips?

You can try to move the upstream solution along by patching grpc as mentioned in the issue. That would be the proper way to do things. I would also suggest taking a look at how Arch packages tensorflow; I feel like they must have had to figure out a way around this error, and the PKGBUILD might hold some clues. In fact, just glancing at it, compiling with gcc-9 should do the trick (which is what our current PKGBUILD is doing anyway): https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/tensorflow#n94 I realize this could be what you meant by "should already be fixed".
Maybe we can start by confirming that you are using gcc-9 to compile.
Also this issue thread is for the pypi installation of tensorflow-rocm. Since you are compiling from source please open a new issue for it.
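As a rough way to confirm which GCC major version a build would pick up, here is a minimal Python sketch. It assumes the compilers are reachable on `PATH` under the names `gcc` and `gcc-9`; both names and the helper functions are illustrative, not something the PKGBUILD defines.

```python
import re
import shutil
import subprocess


def parse_gcc_major(version_output):
    """Extract the major version from the first line of `gcc --version` output."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    return int(match.group(1)) if match else None


def gcc_major(binary="gcc"):
    """Return the major version of `binary`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    # "gcc-9" is an assumed binary name; adjust it to whatever your system installs.
    for name in ("gcc", "gcc-9"):
        print(name, "->", gcc_major(name))
```

If the plain `gcc` entry reports 10 while the build is expected to use 9, the `-Werror=stringop-truncation` failure above is a plausible symptom, though this check alone does not prove it.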
@jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?
When I opened this issue it wasn't giving any errors, but it didn't try to use the GPU. I tried reinstalling tensorflow-rocm with yay but it failed. When I try to run the same Python code with tensorflow-rocm from pypi, it now gives a Segmentation fault (core dumped) error. (The same code works on Ubuntu.)
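For anyone trying to reproduce the CPU-only behaviour, a minimal sketch of a check is below. It uses `tf.config.list_physical_devices`, which is available in TensorFlow 2.1 and later; the `gpu_report` helper is a hypothetical name. Note that a segmentation fault during import cannot be caught from Python, so in that case the script will simply crash.

```python
def gpu_report():
    """Return a short summary of the GPUs TensorFlow can see."""
    try:
        import tensorflow as tf
    except Exception as exc:  # a broken install can raise more than ImportError
        return f"tensorflow is not importable: {exc}"

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return f"TensorFlow {tf.__version__} sees no GPU; ops will run on the CPU"
    names = ", ".join(device.name for device in gpus)
    return f"TensorFlow {tf.__version__} sees {len(gpus)} GPU(s): {names}"


if __name__ == "__main__":
    print(gpu_report())
```

Posting the printed line together with the kernel version makes it easier to compare setups across distributions.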
@Disty0 this issue is only for the pypi package. Can you post your issues with the yay installation in a separate issue?
> @jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?

yes, not a problem. I hope to be able to give you some feedback this weekend.
I tried with kernel 5.4.55-1-lts and it works. I think this is caused by the same problem as rocm-arch/rocm-arch/issues/269
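To make kernel reports comparable, here is a small sketch that prints the running kernel and whether it is in the same 5.4 series as the kernel reported to work above. The thread does not say which newer kernels are affected, so the script only reports and does not judge; `KNOWN_GOOD` and the helper name are assumptions made for illustration.

```python
import platform
import re

# The only data point from this thread: 5.4.55-1-lts was reported to work.
KNOWN_GOOD = (5, 4, 55)


def parse_kernel_release(release):
    """Turn a release string such as '5.4.55-1-lts' into a (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


if __name__ == "__main__":
    release = platform.release()
    version = parse_kernel_release(release)
    print(f"running kernel: {release}")
    if version is not None and version[:2] == KNOWN_GOOD[:2]:
        print("same 5.4 series as the kernel reported to work in this thread")
    else:
        print("different series from 5.4.55-1-lts; this thread does not say whether it is affected")
```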
Nice find, @Disty0! That is interesting tho.
Since we do not control the pypi release of tensorflow-rocm, and we now have a workaround for the issues of using pypi/tensorflow-rocm on Arch Linux, I am going to close this issue.
In terms of resolving this on the latest kernel, we need to fix the issue linked right above and/or report this issue to the rocm tensorflow repo (https://github.com/ROCmSoftwarePlatform/tensorflow-upstream). There is nothing that we can do in this repo about it.
> @jtiemer can you do me a favor and see if you can try to reproduce this issue and give an error message or bug report?

Just tried to compile. Didn't build due to
@jtiemer can you post the steps you used to compile? Do note that this is the issue thread for the pypi release, i.e. pip install tensorflow-rocm
Oops. Yeah, I read over the pypi part. I can't install this due to a version conflict, as my requirements are from community and pypi wants partly newer stuff. Trying to resolve that by getting everything via pip.
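When chasing a version conflict like this, it can help to list what the current Python environment actually has installed. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+). The package names in the example list are guesses for illustration only; the thread does not say which requirements conflicted.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list; replace with the requirements pip complains about.
    for name, version in installed_versions(
        ["tensorflow-rocm", "numpy", "h5py", "protobuf"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this once with the system interpreter and once inside a virtual environment shows whether the versions come from community packages or from pip.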
As the title says, tensorflow-rocm works, but only with the CPU. The HIP examples work fine with the GPU.
Here is the rocminfo:
```
rocminfo
ROCk module is loaded
Able to open /dev/kfd read-write
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name: AMD Ryzen 5 1600X Six-Core Processor
  Uuid: CPU-XX
  Marketing Name: AMD Ryzen 5 1600X Six-Core Processor
  Vendor Name: CPU
  Feature: None specified
  Profile: FULL_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 0(0x0)
  Queue Min Size: 0(0x0)
  Queue Max Size: 0(0x0)
  Queue Type: MULTI
  Node: 0
  Device Type: CPU
  Cache Info:
    L1: 32768(0x8000) KB
  Chip ID: 0(0x0)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 0
  BDFID: 0
  Internal Node ID: 0
  Compute Unit: 12
  SIMDs per CU: 0
  Shader Engines: 0
  Shader Arrs. per Eng.: 0
  WatchPts on Addr. Ranges:1
  Features: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size: 16406764(0xfa58ec) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 2
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 16406764(0xfa58ec) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
  ISA Info:
    N/A
*******
Agent 2
*******
  Name: gfx900
  Uuid: GPU-021500232d0241a4
  Marketing Name: Vega 10 XL/XT [Radeon RX Vega 56/64]
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 4096(0x1000)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
    L1: 16(0x10) KB
  Chip ID: 26751(0x687f)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 1590
  BDFID: 10240
  Internal Node ID: 1
  Compute Unit: 56
  SIMDs per CU: 4
  Shader Engines: 4
  Shader Arrs. per Eng.: 1
  WatchPts on Addr. Ranges:4
  Features: KERNEL_DISPATCH
  Fast F16 Operation: FALSE
  Wavefront Size: 64(0x40)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension:
    x 1024(0x400)
    y 1024(0x400)
    z 1024(0x400)
  Max Waves Per CU: 40(0x28)
  Max Work-item Per CU: 2560(0xa00)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x 4294967295(0xffffffff)
    y 4294967295(0xffffffff)
    z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 8372224(0x7fc000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2
      Segment: GROUP
      Size: 64(0x40) KB
      Allocatable: FALSE
      Alloc Granule: 0KB
      Alloc Alignment: 0KB
      Accessible by all: FALSE
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx900
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
*** Done ***
```
Here is the clinfo:
```
clinfo
Number of platforms    1
  Platform Name    AMD Accelerated Parallel Processing
  Platform Vendor    Advanced Micro Devices, Inc.
  Platform Version    OpenCL 2.0 AMD-APP.dbg (3137.0)
  Platform Profile    FULL_PROFILE
  Platform Extensions    cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix    AMD

  Platform Name    AMD Accelerated Parallel Processing
Number of devices    1
  Device Name    gfx900
  Device Vendor    Advanced Micro Devices, Inc.
  Device Vendor ID    0x1002
  Device Version    OpenCL 2.0
  Driver Version    3137.0 (HSA1.1,LC)
  Device OpenCL C Version    OpenCL C 2.0
  Device Type    GPU
  Device Board Name (AMD)    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology (AMD)    PCI-E, 28:00.0
  Device Profile    FULL_PROFILE
  Device Available    Yes
  Compiler Available    Yes
  Linker Available    Yes
  Max compute units    56
  SIMD per compute unit (AMD)    4
  SIMD width (AMD)    16
  SIMD instruction width (AMD)    1
  Max clock frequency    1590MHz
  Graphics IP (AMD)    9.0
  Device Partition    (core)
    Max number of sub-devices    56
    Supported partition types    None
    Supported affinity domains    (n/a)
  Max work item dimensions    3
  Max work item sizes    1024x1024x1024
  Max work group size    256
  Preferred work group size (AMD)    256
  Max work group size (AMD)    1024
  Preferred work group size multiple    64
  Wavefront width (AMD)    64
  Preferred / native vector sizes
    char    4 / 4
    short    2 / 2
    int    1 / 1
    long    1 / 1
    half    1 / 1 (cl_khr_fp16)
    float    1 / 1
    double    1 / 1 (cl_khr_fp64)
  Half-precision Floating-point support    (cl_khr_fp16)
    Denormals    No
    Infinity and NANs    No
    Round to nearest    No
    Round to zero    No
    Round to infinity    No
    IEEE754-2008 fused multiply-add    No
    Support is emulated in software    No
  Single-precision Floating-point support    (core)
    Denormals    Yes
    Infinity and NANs    Yes
    Round to nearest    Yes
    Round to zero    Yes
    Round to infinity    Yes
    IEEE754-2008 fused multiply-add    Yes
    Support is emulated in software    No
    Correctly-rounded divide and sqrt operations    Yes
  Double-precision Floating-point support    (cl_khr_fp64)
    Denormals    Yes
    Infinity and NANs    Yes
    Round to nearest    Yes
    Round to zero    Yes
    Round to infinity    Yes
    IEEE754-2008 fused multiply-add    Yes
    Support is emulated in software    No
  Address bits    64, Little-Endian
  Global memory size    8573157376 (7.984GiB)
  Global free memory (AMD)    8372224 (7.984GiB)
  Global memory channels (AMD)    64
  Global memory banks per channel (AMD)    4
  Global memory bank width (AMD)    256 bytes
  Error Correction support    No
  Max memory allocation    7287183769 (6.787GiB)
  Unified memory for Host and Device    No
  Shared Virtual Memory (SVM) capabilities    (core)
    Coarse-grained buffer sharing    Yes
    Fine-grained buffer sharing    Yes
    Fine-grained system sharing    No
    Atomics    No
  Minimum alignment for any data type    128 bytes
  Alignment of base address    1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM    0 bytes
    Global    0 bytes
    Local    0 bytes
  Max size for global variable    7287183769 (6.787GiB)
  Preferred total size of global vars    8573157376 (7.984GiB)
  Global Memory cache type    Read/Write
  Global Memory cache size    16384 (16KiB)
  Global Memory cache line size    64 bytes
  Image support    Yes
    Max number of samplers per kernel    26751
    Max size for 1D images from buffer    65536 pixels
    Max 1D or 2D image array size    2048 images
    Base address alignment for 2D image buffers    256 bytes
    Pitch alignment for 2D image buffers    256 pixels
    Max 2D image size    16384x16384 pixels
    Max 3D image size    2048x2048x2048 pixels
    Max number of read image args    128
    Max number of write image args    8
    Max number of read/write image args    64
  Max number of pipe args    16
  Max active pipe reservations    16
  Max pipe packet size    2992216473 (2.787GiB)
  Local memory type    Local
  Local memory size    65536 (64KiB)
  Local memory syze per CU (AMD)    65536 (64KiB)
  Local memory banks (AMD)    32
  Max number of constant args    8
  Max constant buffer size    7287183769 (6.787GiB)
  Preferred constant buffer size (AMD)    16384 (16KiB)
  Max size of kernel argument    1024
  Queue properties (on host)
    Out-of-order execution    No
    Profiling    Yes
  Queue properties (on device)
    Out-of-order execution    Yes
    Profiling    Yes
    Preferred size    262144 (256KiB)
    Max size    8388608 (8MiB)
  Max queues on device    1
  Max events on device    1024
  Prefer user sync for interop    Yes
  Number of P2P devices (AMD)    0
  P2P devices (AMD)
```
Here is the quick search of the packages I installed (mostly from arch4edu):
```
yay -Qs rocm
local/hip-rocclr 3.5.0-4
    Heterogeneous Interface for Portability ROCm
local/hipblas 3.5.0-1
    ROCm BLAS marshalling library
local/hsa-ext-rocr 3.5.1-1
    ROCm Platform Runtime: Closed source components
local/hsa-rocr 3.5.0-1
    ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime
local/rccl 3.5.0-2
    ROCm Communication Collectives Library
local/rocalution 3.5.0-1
    Next generation library for iterative sparse solvers for ROCm platform
local/rocblas 3.5.0-1
    Next generation BLAS implementation for ROCm platform
local/rocfft 3.5.0-1
    Next generation FFT implementation for ROCm
local/rocm-clang-ocl 3.5.0-2
    OpenCL compilation with clang compiler.
local/rocm-cmake 3.5.0-1
    CMake modules for common build tasks needed for the ROCm software stack
local/rocm-dbgapi 3.5.0-2
    Support library necessary for a debugger of AMD's GPUs
local/rocm-debug-agent 3.5.0-2
    ROCr Debug Agent Library
local/rocm-dev 3.5.0-1
    ROCm Dev - Metapackage for the ROCm Development Stack
local/rocm-device-libs 3.5.0-1
    Radeon Open Compute - device libs
local/rocm-dkms 3.5.0-1
    ROCm - Open Soruce Platform for HPC and Ultrascale GPU Computing
local/rocm-gdb 3.5.0-2
    ROCm source-level debugger for Linux, based on GDB
local/rocm-libs 3.5.0-1
    ROCm Libs - Libraries utilizing HPC and Ultrascale GPU Computing of ROCm
local/rocm-opencl-runtime 3.5.0-1
    Radeon Open Compute - OpenCL runtime
local/rocm-smi 3.5.0-1
    Utility to manage and monitor AMDGPU / ROCm systems
local/rocm-smi-lib64 3.5.0-2
    ROCm SMI LIB
local/rocm-utils 3.5.0-1
    ROCm Platform Runtime: Utils
local/rocminfo 3.5.0-1
    ROCm info tools - rocm_agent_enumerator
local/rocrand 3.5.0-1
    Pseudo-random and quasi-random number generator on ROCm
local/rocsolver 3.5.0-2
    Subset of LAPACK functionality on the ROCm platform
local/rocsparse 3.5.0-1
    BLAS for sparse computation on top of ROCm
local/rocthrust 3.5.0-2
    Port of the Thrust parallel algorithm library atop HIP/ROCm
local/roctracer 3.5.0-2
    ROCm Tracer Callback/Activity Library for Performance tracing AMD GPU's
```
```
yay -Qs hip
local/gcc-libs 10.1.0-2
    Runtime libraries shipped by GCC
local/hip-amdgpu-pro 19.30_934563-1 (Radeon_Software_for_Linux)
    HIP-CLANG runtime. HIP-CLANG allows developers to convert CUDA code to common C++
local/hip-rocclr 3.5.0-4
    Heterogeneous Interface for Portability ROCm
local/hipblas 3.5.0-1
    ROCm BLAS marshalling library
local/hipcub 3.5.0-2
    Header-only library on top of rocPRIM or CUB
local/hipsparse 3.5.0-3
    rocSPARSE marshalling library.
local/lib32-gcc-libs 10.1.0-2 (multilib-devel)
    32-bit runtime libraries shipped by GCC
local/rocprim 3.5.0-2
    Header-only library providing HIP parallel primitives
local/rocthrust 3.5.0-2
    Port of the Thrust parallel algorithm library atop HIP/ROCm
```
```
yay -Qs opencl
local/clinfo 2.2.18.04.06-2
    Simple OpenCL application that enumerates all available platform and device properties
local/miopen-opencl 3.5.0-3
    AMD's Machine Intelligence Library (OpenCL backend)
local/miopengemm 1.1.6-2
    An OpenCL GEMM kernel generator
local/ocl-icd 2.2.12-4
    OpenCL ICD Bindings
local/opencl-headers 2:2.2.20170516-3
    OpenCL (Open Computing Language) header files
local/rocm-clang-ocl 3.5.0-2
    OpenCL compilation with clang compiler.
local/rocm-opencl-runtime 3.5.0-1
    Radeon Open Compute - OpenCL runtime
```