krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

Result for Moore Threads S80 on Ubuntu 20.04 #112

Open · KatyushaScarlet opened this issue 7 months ago

KatyushaScarlet commented 7 months ago

clpeak version: 1.1.2

Platform: Moore Threads OpenCL
  Device: MUSA GEN1-104
    Driver version  : 20230926_develop-36-g6d1e11a670da-dirty release (Linux x64)
    Compute units   : 32
    Clock frequency : 1800 MHz

    Global memory bandwidth (GBPS)
      float   : 269.70
      float2  : 373.22
      float4  : 381.20
      float8  : 389.35
      float16 : 397.03

    Single-precision compute (GFLOPS)
      float   : 14190.62
      float2  : 13320.74
      float4  : 13418.10
      float8  : 13379.27
      float16 : 13307.41

    Half-precision compute (GFLOPS)
      half   : 13300.11
      half2  : 13353.18
      half4  : 13422.47
      half8  : 13452.69
      half16 : 13320.31

    Double-precision compute (GFLOPS)
      double   : 35.60
      double2  : 30.08
      double4  : 22.52
      double8  : 13.69
      double16 : 6.99

    Integer compute (GIOPS)
      int   : 2095.66
      int2  : 2091.45
      int4  : 2094.30
      int8  : 2095.51
      int16 : 2096.50

    Integer compute Fast 24bit (GIOPS)
      int   : 2095.58
      int2  : 2092.48
      int4  : 2094.36
      int8  : 2094.98
      int16 : 2097.07

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.52
      enqueueReadBuffer               : 4.25
      enqueueWriteBuffer non-blocking : 5.61
      enqueueReadBuffer non-blocking  : 4.15
      enqueueMapBuffer(for read)      : 5263.44
        memcpy from mapped ptr        : 0.02
      enqueueUnmap(after write)       : 6515.42
        memcpy to mapped ptr          : 5.60

    Kernel launch latency : 31.03 us
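
For context on the compute figures, here is a back-of-the-envelope check that relates them to the reported 32 compute units and 1800 MHz clock. It is a sketch under a stated assumption, not vendor data: the lane count per compute unit (`ASSUMED_LANES_PER_CU = 128`) and the "one fused multiply-add per lane per cycle" model are hypothetical, and the helper names `theoretical_peak_gflops` and `ops_per_cu_per_cycle` are illustrative only.

```python
"""Back-of-the-envelope check of the clpeak compute figures above.

ASSUMPTION (hypothetical, not from the report): each of the 32 compute
units has 128 lanes, and each lane retires one fused multiply-add
(2 operations) per clock cycle.
"""

# Figures copied from the clpeak output for MUSA GEN1-104.
COMPUTE_UNITS = 32
CLOCK_GHZ = 1.8  # 1800 MHz

# Scalar-type results; "int" is in GIOPS, the others in GFLOPS.
MEASURED = {
    "float": 14190.62,
    "half": 13300.11,
    "double": 35.60,
    "int": 2095.66,
}

ASSUMED_LANES_PER_CU = 128  # hypothetical lane count, not reported by clpeak
OPS_PER_FMA = 2             # one multiply + one add


def theoretical_peak_gflops(cus: int, lanes_per_cu: int, clock_ghz: float,
                            ops_per_lane_per_cycle: int = OPS_PER_FMA) -> float:
    """Peak throughput if every lane issues one FMA per cycle."""
    return cus * lanes_per_cu * ops_per_lane_per_cycle * clock_ghz


def ops_per_cu_per_cycle(gops: float, cus: int, clock_ghz: float) -> float:
    """Measured operations retired per compute unit per clock cycle."""
    return gops / (cus * clock_ghz)


peak = theoretical_peak_gflops(COMPUTE_UNITS, ASSUMED_LANES_PER_CU, CLOCK_GHZ)
fp32 = MEASURED["float"]

print(f"Assumed FP32 peak:         {peak:.1f} GFLOPS")
print(f"Measured / assumed peak:   {fp32 / peak:.1%}")
print(f"FP32 ops per CU per cycle: "
      f"{ops_per_cu_per_cycle(fp32, COMPUTE_UNITS, CLOCK_GHZ):.1f}")
for kind in ("half", "double", "int"):
    # Ratio > 1 means this type is slower than float; ~1 means no speed-up.
    print(f"float / {kind:<6} throughput: {fp32 / MEASURED[kind]:.1f}x")
```

Under that assumption the measured 14190.62 GFLOPS is about 96% of peak (roughly 246 FP32 operations per compute unit per cycle). The ratios make the pattern in this run explicit: FP64 reaches roughly 1/400 of FP32, 32-bit integer roughly 1/7, and half precision is no faster than single precision. Applying the same comparison to the transfer section, `memcpy from mapped ptr` (0.02 GBPS) is about 280× slower than `memcpy to mapped ptr` (5.60 GBPS).
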

krrishnarraj commented 6 months ago

Impressive. Raise a PR with these details.