preda / gpuowl

GPU Mersenne primality test.
GNU General Public License v3.0
127 stars 35 forks source link

Gpuowl cannot run with ROCm 4.1.0/Navi 10 (Radeon RX 5700 XT) #223

Closed valeriob01 closed 3 years ago

valeriob01 commented 3 years ago

Relevant rocminfo output:

*******                  
Agent 5                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    4                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   2560                               
  Internal Node ID:        4                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
# /opt/rocm/bin/rocm-smi 

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    33.0c  7.0W    800Mhz  100Mhz  21.96%  auto  188.0W    0%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================
# ./gpuowl -h
2021-04-09 04:33:54 GpuOwl VERSION v7.2-69-g23c14a1-dirty

-dir <folder>      : specify local work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-pool <dir>        : specify a directory with the shared (pooled) worktodo.txt and results.txt
                     Multiple GpuOwl instances, each in its own directory, can share a pool of assignments and report
                     the results back to the common pool.
-uid <unique_id>   : specifies to use the GPU with the given unique_id (only on ROCm/Linux)
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <spec>        : specify FFT e.g.: 1152K, 5M, 5.5M, 256:10:1K
-block <value>     : PRP error-check block size. Must divide 10'000.
-log <step>        : log every <step> iterations. Multiple of 10'000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound
-B2                : P-1 B2 bound
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-verify <file>     : verify PRP-proof contained in <file>
-proof <power>     : By default a proof of power 8 is generated, using 3GB of temporary disk space for a 100M exponent.
                     A lower power reduces disk space requirements but increases the verification cost.
                     A proof of power 9 uses 6GB of disk space for a 100M exponent and enables faster verification.
-autoverify <power> : Self-verify proofs generated with at least this power. Default 9.
-tmpDir <dir>      : specify a folder with plenty of disk space where temporary proof checkpoints will be stored.
-results <file>    : name of results file, default 'results.txt'
-iters <N>         : run next PRP test for <N> iterations and exit. Multiple of 10000.
-maxAlloc <size>   : limit GPU memory usage to size, which is a value with suffix M for MB and G for GB.
                     e.g. -maxAlloc 2048M or -maxAlloc 3.5G
-save <N>          : specify the number of savefiles to keep (default 12).
-noclean           : do not delete data after the test is complete.
-from <iteration>  : start at the given iteration instead of the most recent saved iteration
-yield             : enable work-around for Nvidia GPUs busy wait. Do not use on AMD GPUs!
-nospin            : disable progress spinner
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning)
-unsafeMath        : use OpenCL -cl-unsafe-math-optimizations (use at your own risk)
-binary <file>     : specify a file containing the compiled kernels binary
-device <N>        : select a specific device:
2021-04-09 04:33:54 Exception gpu_error:  clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:70 getDeviceIDs
2021-04-09 04:33:54 Bye
valeriob01 commented 3 years ago

Going to test with amdgpu-pro. Will report result here.

valeriob01 commented 3 years ago

gpuowl -h

2021-04-09 09:51:08 GpuOwl VERSION v7.2-69-g23c14a1-dirty ... -device : select a specific device: 0 : gfx1010-Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] AMD ... 2021-04-09 09:51:09 Exiting because "help" 2021-04-09 09:51:09 Bye

valeriob01 commented 3 years ago

Ok. Got the board to work.