ysh329 / OpenCL-101

Learn OpenCL step by step.

[Tool Survey] Arm Streamline Performance Analyzer & Arm Mali GPU datasheet #38

Open ysh329 opened 3 years ago

ysh329 commented 3 years ago

You can also apply for a free trial (Try For Free). This survey focuses on obtaining OpenCL performance data on Mali GPUs.

Introduction to Streamline

Streamline helps you optimize software for devices that use Arm processors.

Evaluate where the software in your system spends most of its time by capturing a performance profile of your application running on a target device. Quickly determine whether your performance bottleneck relates to the CPU processing or GPU rendering using interactive charts and comprehensive data visualizations.

For CPU bottlenecks, use the native profiling functionality to locate specific problem areas in your application code. Investigate how processes, threads, and functions behave, from high-level views, right down to line-by-line source code analysis. The basic profile is based on regular sampling of the PC (Program Counter) of the running threads, allowing identification of the hotspots in the running application. Hardware performance counters that are provided by the target processors can supplement this analysis. These counters enable hotspot analysis to include knowledge of hardware events such as cache misses and branch mispredictions.
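
As a rough illustration of the sampling idea described above (not Streamline's actual implementation), the sketch below attributes sampled PC values to functions and ranks them by share of samples. The symbol table, addresses, and function names are all made up for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start_address, function_name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "load_input"),
    (0x2000, "convolve"),
    (0x3000, "write_output"),
]


def attribute_samples(pc_samples, symbols=SYMBOLS):
    """Count how many sampled PC values fall inside each function."""
    starts = [start for start, _ in symbols]
    hits = Counter()
    for pc in pc_samples:
        idx = bisect_right(starts, pc) - 1  # last symbol starting at or before pc
        name = symbols[idx][1] if idx >= 0 else "<unknown>"
        hits[name] += 1
    return hits


def hotspots(pc_samples, symbols=SYMBOLS):
    """Return (function, share of samples) pairs, hottest first."""
    hits = attribute_samples(pc_samples, symbols)
    total = sum(hits.values())
    return [(name, count / total) for name, count in hits.most_common()]


# Made-up samples: most of them land inside convolve.
samples = [0x2010, 0x2044, 0x1008, 0x20F0, 0x2ABC, 0x1410, 0x2004, 0x0800]
for name, share in hotspots(samples):
    print(f"{name:<14}{share:6.1%}")
```

The function with the largest share of samples is the first place to look; hardware counters such as cache misses would add a second dimension to the same per-function breakdown.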

For GPU bottlenecks, use performance data from the Arm Mali GPU driver and hardware performance counters to explore the rendering workload efficiency. Visualize the workload breakdown, pipeline loading, and execution characteristics to quickly identify where to apply rendering optimizations.

With Streamline, you can:

ysh329 commented 3 years ago

OpenCL

Investigate OpenCL

Use Streamline in OpenCL Timeline Mode to investigate OpenCL applications. Visualize which kernel is running on the GPU, for how long, and any dependencies that will cause kernels to stall.

image

ysh329 commented 3 years ago

Mali GPU counters

To help you interpret the charts in Streamline, see detailed descriptions of all the performance counters available for each Mali GPU.

Arm Mali GPUs implement a comprehensive range of performance counters that enable you to closely monitor GPU activity as your application runs. Arm Streamline visualizes performance counter activity in a series of charts, to help you identify the cause of heavy rendering loads or workload inefficiencies that cause poor GPU performance.

image

Understanding the workload breakdown, pipeline loading, and execution characteristics of your application can help you decide where to apply rendering optimizations. The following guides describe all the performance counters available for each Mali GPU, with some advice about how to interpret the value.

ysh329 commented 3 years ago

Arm Mali GPU datasheet

See the features and capabilities of Arm Mali GPUs. This reference sheet covers the range from the Midgard-based Mali-T720 to the Valhall-based Mali-G78.

The API Support, Core Features and Microarchitecture Features tables cover which GPUs support which technologies. For more on given technologies see links below.

The Core Config table details the specs of the chips, rather than just whether features are available. For each GPU it lists threads per warp, total threads, per-clock throughput (operations, texels, and so on), and cache sizes. Note that the tile write rate on Arm chips covers both the fragments written into the tile and the pixels written back out of the tile. Thread count is the total shader core hardware capacity; note that for OpenGL ES only 128 threads are exposed.

For Texturing, to work out cycles/sample for filters more complicated than bilinear, apply the multipliers in the tables on top of the bilinear performance to get the required filter. For example, a simple trilinear will be 2 x 1 cycles/sample on a Mali-G72, and 2 x 0.25 cycles/sample on a Mali-G77. To add in 4x anisotropic filtering, multiply by a further 4x. Note that anisotropic filter scaling is the worst-case number; it will usually be less than this.
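
The arithmetic above can be written out as a small helper. This is only a sketch: it uses the two bilinear figures quoted in the text (1 cycle/sample on Mali-G72, 0.25 on Mali-G77) and the 2x trilinear multiplier from the example. The function and constant names are my own, and the anisotropic result is the worst case.

```python
# Bilinear cost in cycles per sample, taken from the two examples in the text.
BILINEAR_CYCLES_PER_SAMPLE = {"Mali-G72": 1.0, "Mali-G77": 0.25}

# Trilinear is 2 x bilinear in the example above.
TRILINEAR_FACTOR = 2


def cycles_per_sample(gpu, trilinear=False, max_anisotropy=1):
    """Worst-case texturing cost for one sample, in cycles."""
    cost = BILINEAR_CYCLES_PER_SAMPLE[gpu]
    if trilinear:
        cost *= TRILINEAR_FACTOR
    # Anisotropic scaling is an upper bound; real cost is usually lower.
    cost *= max_anisotropy
    return cost


for gpu in BILINEAR_CYCLES_PER_SAMPLE:
    tri = cycles_per_sample(gpu, trilinear=True)
    tri_aniso = cycles_per_sample(gpu, trilinear=True, max_anisotropy=4)
    print(f"{gpu}: trilinear {tri} cycles/sample, with 4x aniso up to {tri_aniso}")
```

Other GPUs can be added to the dictionary from the Texturing table of the datasheet.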

Finally, the architecture-specific tables give thread counts and registers for the chips. For more on the generations of Arm architectures see links below. For a general picture of Arm architectures see: https://developer.arm.com/architectures/media-architectures/gpu-architecture

image

image

image

image

Core config

image

Texturing

image

ISA Config

image

ysh329 commented 3 years ago

image

FaiScofield commented 1 year ago

May I ask: is this OpenCL Timeline Mode only supported by an OpenCL library built with the DDK provided by Arm? And is that DDK only available to vendors licensed by Arm?

Ybuzhenzhuo commented 1 month ago

My GPU is a Mali-G610, and I have updated the GPU driver using the Streamline plugin in DS-5. OpenCL cannot be displayed, apparently due to the way gatord's config is set. Which Mali GPUs support displaying OpenCL? Is it the Mali-G710?

Ybuzhenzhuo commented 1 month ago

How can I get the OpenCL timeline to display? With my Mali-G610, it does not show up in DS-5.

ysh329 commented 3 weeks ago

Perhaps your phone does not have the required permission, but I am not sure about that.

Ybuzhenzhuo commented 3 weeks ago

Excuse me, my development platform is not a mobile phone; it is an ARM platform. May I ask whether, on a phone, a configuration file with the program name and a .config extension is required to enable OpenCL mode after setup?

ysh329 commented 2 weeks ago

  • Can you use the clinfo command on your board? Does it show any information?

Ybuzhenzhuo commented 2 weeks ago

  • Can you use the clinfo command on your board? Does it show any information?

Thank you, but the clinfo command may not be available on the ARM platform. I think our development environments are different.

Ybuzhenzhuo commented 2 weeks ago

My development platform is the RK3588, which I usually connect to through a serial port or SSH. The RK3588's GPU is a Mali-G610, and the board runs a Linux-like operating system rather than the Android operating system used on mobile devices.

ysh329 commented 2 weeks ago

@Ybuzhenzhuo Can you find a libopencl.so driver library under these directories?

Besides, is there any information from RK? I think other customers have reported the same issue with this board. If you can contact RK, you may get an answer sooner.
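
If clinfo is not available on the board, a small script can do a similar check. This is a minimal sketch, assuming Python 3 is installed on the target. The search directories below are only guesses at common locations, not a confirmed list for the RK3588, and the helper names are my own. The only OpenCL call used is clGetPlatformIDs from the standard OpenCL API.

```python
import ctypes
import fnmatch
import os

# Guessed search roots (an assumption, adjust for your board image).
CANDIDATE_DIRS = ["/usr/lib", "/usr/local/lib", "/vendor/lib64", "/system/vendor/lib64"]


def find_opencl_libs(dirs=CANDIDATE_DIRS):
    """Return paths whose file name looks like libOpenCL.so* (case-insensitive)."""
    found = []
    for root_dir in dirs:
        # os.walk silently yields nothing for directories that do not exist.
        for folder, _subdirs, files in os.walk(root_dir):
            for name in files:
                if fnmatch.fnmatch(name.lower(), "libopencl.so*"):
                    found.append(os.path.join(folder, name))
    return sorted(found)


def count_platforms(lib_path):
    """Load the library and ask clGetPlatformIDs how many platforms exist.

    Returns None if the library cannot be loaded or the call fails.
    """
    try:
        lib = ctypes.CDLL(lib_path)
        query = lib.clGetPlatformIDs
    except (OSError, AttributeError):
        return None
    query.restype = ctypes.c_int
    query.argtypes = [ctypes.c_uint, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint)]
    num = ctypes.c_uint(0)
    status = query(0, None, ctypes.byref(num))
    return num.value if status == 0 else None  # 0 is CL_SUCCESS


if __name__ == "__main__":
    for path in find_opencl_libs():
        print(path, "platforms:", count_platforms(path))
```

If no library is found, or the platform count comes back as None or 0, the OpenCL driver itself is probably the problem rather than Streamline.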

Ybuzhenzhuo commented 2 weeks ago

Thank you for your suggestion. I am currently learning to use perf, and I plan to send an email to RK to ask about this officially.