krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

Results for NVIDIA GeForce RTX 4090 (at 248 W power limit) on Windows 11 #106

Open moyang opened 1 year ago

moyang commented 1 year ago
  Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 4090
    Driver version  : 531.61 (Win64)
    Compute units   : 128
    Clock frequency : 2520 MHz

    Global memory bandwidth (GBPS)
      float   : 866.65
      float2  : 888.99
      float4  : 909.81
      float8  : 920.69
      float16 : 921.32

    Single-precision compute (GFLOPS)
      float   : 71356.09
      float2  : 75607.30
      float4  : 76967.14
      float8  : 71584.66
      float16 : 70986.91

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 1289.16
      double2  : 1310.90
      double4  : 1364.63
      double8  : 1311.27
      double16 : 1356.09

    Integer compute (GIOPS)
      int   : 40810.12
      int2  : 35957.76
      int4  : 35848.03
      int8  : 35623.48
      int16 : 35670.32

    Integer compute Fast 24bit (GIOPS)
      int   : 36497.91
      int2  : 35032.96
      int4  : 35321.97
      int8  : 35034.14
      int16 : 35219.38

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 20.93
      enqueueReadBuffer               : 20.06
      enqueueWriteBuffer non-blocking : 20.93
      enqueueReadBuffer non-blocking  : 20.06
      enqueueMapBuffer(for read)      : 10.78
        memcpy from mapped ptr        : 28.55
      enqueueUnmap(after write)       : 26.87
        memcpy to mapped ptr          : 28.09

    Kernel launch latency : 8.36 us
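
For context, the single-precision figures above can be compared with a rough theoretical peak. The sketch below is only an illustration: the compute-unit count, clock frequency, and measured values are copied from the log, while the figure of 128 FP32 lanes per compute unit and 2 FLOPs per lane per clock (one fused multiply-add) is an assumption that clpeak does not report. The reported clock is also the advertised maximum, so under a 248 W power limit the sustained clock may well be lower.

```python
# Rough theoretical FP32 peak versus the clpeak measurements above.
# From the log: compute units, clock frequency, measured GFLOPS.
# Assumed (not reported by clpeak): 128 FP32 lanes per compute unit,
# each retiring one fused multiply-add (2 FLOPs) per clock.

COMPUTE_UNITS = 128            # from the log
CLOCK_MHZ = 2520               # from the log (reported maximum, not sustained)
FP32_LANES_PER_CU = 128        # assumption
FLOPS_PER_LANE_PER_CLOCK = 2   # assumption: one FMA per clock


def theoretical_gflops(compute_units, lanes_per_cu, clock_mhz, flops_per_clock):
    """Return the peak in GFLOPS: units * lanes * FLOPs/clock * GHz."""
    return compute_units * lanes_per_cu * flops_per_clock * clock_mhz / 1000.0


# Single-precision results copied from the log (GFLOPS).
measured = {
    "float": 71356.09,
    "float2": 75607.30,
    "float4": 76967.14,
    "float8": 71584.66,
    "float16": 70986.91,
}

peak = theoretical_gflops(
    COMPUTE_UNITS, FP32_LANES_PER_CU, CLOCK_MHZ, FLOPS_PER_LANE_PER_CLOCK
)
best_width, best_value = max(measured.items(), key=lambda kv: kv[1])
efficiency = best_value / peak

print(f"theoretical peak : {peak:.2f} GFLOPS")
print(f"best measured    : {best_value:.2f} GFLOPS ({best_width})")
print(f"fraction of peak : {efficiency:.1%}")
```

Under these assumptions the best measured width (float4) lands at roughly 93% of the computed peak. A different lane count or sustained clock would change that ratio.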