numba / roctools

Tools for using AMD ROCm with Numba
BSD 2-Clause "Simplified" License

Problems with latest rocm environment #5

Open philolt opened 5 years ago

philolt commented 5 years ago

Hello, I tried unsuccessfully to run some of the examples in a ROCm 2.9 environment. I just saw that the repository hasn't been updated since last year. Is it correct that the current state of Numba is incompatible with the latest ROCm stack? Which recent version of ROCm is compatible with Numba? numba -s shows:

System info:
--------------------------------------------------------------------------------
__Time Stamp__
2019-10-06 09:57:48.722056

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : znver1
CPU count                                     : 32
CFS restrictions                              : None
CPU Features                                  : 
64bit adx aes avx avx2 bmi bmi2 clflushopt clzero cmov cx16 f16c fma fsgsbase
lzcnt mmx movbe mwaitx pclmul popcnt prfchw rdrnd rdseed sahf sha sse sse2 sse3
sse4.1 sse4.2 sse4a ssse3 xsave xsavec xsaveopt xsaves

__OS Information__
Platform                                      : Linux-5.0.0-31-generic-x86_64-with-debian-stretch-sid
Release                                       : 5.0.0-31-generic
System Name                                   : Linux
Version                                       : #33~18.04.1-Ubuntu SMP Tue Oct 1 10:20:39 UTC 2019
OS specific info                              : debianstretch/sid
glibc info                                    : glibc 2.10

__Python Information__
Python Compiler                               : GCC 7.3.0
Python Implementation                         : CPython
Python Version                                : 3.7.4
Python Locale                                 : en_US UTF-8

__LLVM information__
LLVM version                                  : 8.0.0

__CUDA Information__
CUDA driver library cannot be found or no CUDA enabled devices are present.
Error class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>

__ROC Information__
ROC available                                 : True
Available Toolchains                          : librocmlite library, ROC command line tools

Found 3 HSA Agents:
Agent id  : 0
    vendor: CPU
      name: AMD Ryzen Threadripper 2950X 16-Core Processor
      type: CPU

Agent id  : 1
    vendor: AMD
      name: gfx906
      type: GPU

Agent id  : 2
    vendor: AMD
      name: gfx906
      type: GPU

Found 2 discrete GPU(s)                       : gfx906, gfx906

__SVML Information__
SVML state, config.USING_SVML                 : False
SVML library found and loaded                 : False
llvmlite using SVML patched LLVM              : True
SVML operational                              : False

__Threading Layer Information__
TBB Threading layer available                 : False
+--> Disabled due to                          : Unknown import problem.
OpenMP Threading layer available              : True
Workqueue Threading layer available           : True

__Numba Environment Variable Information__
None set.

__Conda Information__
conda_build_version                           : 3.18.8
conda_env_version                             : 4.7.10
platform                                      : linux-64
python_version                                : 3.7.3.final.0
root_writable                                 : True

__Current Conda Env__
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.8.28                     0  
certifi                   2019.9.11                py37_0  
intel-openmp              2019.4                      243  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
llvmlite                  0.30.0rc1        py37hf484d3e_0    numba
mkl                       2019.4                      243  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.0.14           py37ha843d7b_0  
mkl_random                1.1.0            py37hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
numba                     0.46.0rc1       np116py37hf484d3e_0    numba
numpy                     1.16.4           py37h7e9f1db_0  
numpy-base                1.16.4           py37hde5b4d6_0  
openssl                   1.1.1d               h7b6447c_2  
pip                       19.2.3                   py37_0  
python                    3.7.4                h265db76_1  
readline                  7.0                  h7b6447c_5  
roctools                  0.0.0                hf484d3e_1    numba
setuptools                41.2.0                   py37_0  
six                       1.12.0                   py37_0  
sqlite                    3.30.0               h7b6447c_0  
tk                        8.6.8                hbc83047_0  
wheel                     0.33.6                   py37_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

Running the Matrix Multiplication example from https://numba.pydata.org/numba-doc/dev/roc/examples.html#matrix-multiplication results in:

# numba test.py 
warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

warning: Linking two modules of different data layouts: '' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-A5' whereas '<string>' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5'

warning: Linking two modules of different target triples: ' is 'amdgcn-amd-amdhsa-amdgizcl' whereas '<string>' is 'amdgcn--amdhsa'

'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Attempting to emit S_LOAD_DWORDX2_IMM_si instruction but the Feature_isGCN predicate(s) are not met
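
For anyone reading the warnings above, the two data layouts and the two target triples can be compared mechanically rather than by eye. This is only a minimal sketch and not part of Numba or roctools: the helper names `layout_diff` and `split_triple` are hypothetical, and the strings are assumed to be copied verbatim from the linker warnings.

```python
# Strings copied from the linker warnings above (assumed verbatim).
LAYOUT_A = (
    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-"
    "i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
    "v512:512-v1024:1024-v2048:2048-n32:64-A5"
)
LAYOUT_B = (
    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-"
    "i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-"
    "v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
)
TRIPLE_A = "amdgcn-amd-amdhsa-amdgizcl"
TRIPLE_B = "amdgcn--amdhsa"


def layout_diff(a, b):
    """Hypothetical helper: return the '-'-separated fields unique to each layout."""
    fields_a, fields_b = a.split("-"), b.split("-")
    only_a = [f for f in fields_a if f not in fields_b]
    only_b = [f for f in fields_b if f not in fields_a]
    return only_a, only_b


def split_triple(triple):
    """Hypothetical helper: split an LLVM triple into arch, vendor, os, environment."""
    parts = triple.split("-", 3)
    parts += [""] * (4 - len(parts))  # pad missing trailing components
    return dict(zip(("arch", "vendor", "os", "environment"), parts))


if __name__ == "__main__":
    print(layout_diff(LAYOUT_A, LAYOUT_B))
    for triple in (TRIPLE_A, TRIPLE_B):
        print(split_triple(triple))
```

On these strings the layouts differ only in the `S32` field, which in LLVM's data layout syntax sets the natural stack alignment to 32 bits, and the triples differ in the vendor and environment components. Whether either mismatch is related to the final LLVM ERROR is not something this comparison can show.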

rocminfo shows:

=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen Threadripper 2950X 16-Core Processor
  Marketing Name:          AMD Ryzen Threadripper 2950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65879904(0x3ed3f60) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65879904(0x3ed3f60) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx906                             
  Marketing Name:          Vega 20                            
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26287(0x66af)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1801                               
  BDFID:                   2560                               
  Internal Node ID:        1                                  
  Compute Unit:            60                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx906                             
  Marketing Name:          Vega 20                            
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26287(0x66af)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1801                               
  BDFID:                   17152                              
  Internal Node ID:        2                                  
  Compute Unit:            60                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

bawaji94 commented 4 years ago

Having the same issue

kburns commented 4 years ago

Same issue here with ROCm 3.5.1 and numba 0.50.1.

philolt commented 4 years ago

It seems this repo is not actively maintained. I switched to ROCm TensorFlow for my projects to put my Vega VIIs to use. I couldn't get Numba working with the latest ROCm stacks. Too bad, because I think it's far more intuitive to parallelize workloads in Python with Numba. If someone has a working config, please post it here.