krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

Results for AMD Radeon Pro VII (connected via Thunderbolt 3 as an eGPU) on Windows 11 #109

Open moyang opened 1 year ago

moyang commented 1 year ago

AMD Radeon Pro VII (Vega 20) in a Node Titan Thunderbolt 3 eGPU enclosure (Intel JHL7420 controller)

Platform: AMD Accelerated Parallel Processing
  Device: gfx906
    Driver version  : 3516.0 (PAL,HSAIL) (Win64)
    Compute units   : 60
    Clock frequency : 1700 MHz

    Global memory bandwidth (GBPS)
      float   : 796.86
      float2  : 827.75
      float4  : 822.74
      float8  : 793.45
      float16 : 660.62

    Single-precision compute (GFLOPS)
      float   : 12798.81
      float2  : 12707.00
      float4  : 12880.65
      float8  : 12783.70
      float16 : 12607.74

    Half-precision compute (GFLOPS)
      half   : 8636.53
      half2  : 25210.47
      half4  : 24664.82
      half8  : 23910.08
      half16 : 22193.79

    Double-precision compute (GFLOPS)
      double   : 6455.92
      double2  : 6407.21
      double4  : 6417.81
      double8  : 6388.41
      double16 : 6272.83

    Integer compute (GIOPS)
      int   : 4274.61
      int2  : 4189.85
      int4  : 4215.42
      int8  : 4187.77
      int16 : 4194.81

    Integer compute Fast 24bit (GIOPS)
      int   : 12203.69
      int2  : 11403.85
      int4  : 11338.95
      int8  : 11021.97
      int16 : 10849.96

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 31.31
      enqueueReadBuffer               : 31.82
      enqueueWriteBuffer non-blocking : 31.15
      enqueueReadBuffer non-blocking  : 31.92
      enqueueMapBuffer(for read)      : 810371.12
        memcpy from mapped ptr        : 31.92
      enqueueUnmap(after write)       : 42949672.00
        memcpy to mapped ptr          : 31.59

    Kernel launch latency : 52.06 us
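
As a rough sanity check on the compute figures above, the sketch below estimates the theoretical peak from the reported compute-unit count and clock, then compares it with the best measured value in each precision. It rests on assumptions that are not part of the clpeak output: 64 lanes per compute unit (typical for GCN/Vega), one fused multiply-add counted as 2 FLOPs, packed half precision at twice the single-precision rate, and double precision at half of it. The names `peak_gflops`, `BEST_MEASURED`, and `efficiency` are made up for this illustration.

```python
# Hedged sketch: theoretical peak vs. clpeak measurements for this device.
# Values marked "assumption" do not come from the clpeak log.

COMPUTE_UNITS = 60     # "Compute units" line in the log
CLOCK_MHZ = 1700       # "Clock frequency" line in the log
LANES_PER_CU = 64      # assumption: GCN/Vega compute-unit width
FLOPS_PER_FMA = 2      # one fused multiply-add counted as two FLOPs


def peak_gflops(rate_vs_fp32: float) -> float:
    """Theoretical peak in GFLOPS, scaled relative to the FP32 rate."""
    fp32 = COMPUTE_UNITS * LANES_PER_CU * FLOPS_PER_FMA * CLOCK_MHZ / 1000
    return fp32 * rate_vs_fp32


# Best measured GFLOPS per precision (copied from the log) paired with the
# assumed throughput ratio relative to FP32.
BEST_MEASURED = {
    "single": (12880.65, 1.0),  # float4
    "half": (25210.47, 2.0),    # half2, assumption: packed FP16 at 2x
    "double": (6455.92, 0.5),   # double, assumption: FP64 at 1/2 rate
}


def efficiency(precision: str) -> float:
    """Measured throughput as a fraction of the estimated peak."""
    measured, rate = BEST_MEASURED[precision]
    return measured / peak_gflops(rate)


if __name__ == "__main__":
    for name in BEST_MEASURED:
        print(f"{name:>6}: {efficiency(name):.1%} of estimated peak")
```

Under these assumptions the measured compute numbers land within a few percent of the estimated peak, which suggests the Thunderbolt link does not limit the on-device compute kernels. The estimate says nothing about the transfer bandwidth section, which depends on the host link rather than on the GPU itself.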