Open ysh329 opened 3 years ago
Investigate OpenCL
Use Streamline in OpenCL Timeline Mode to investigate OpenCL applications. Visualise which kernel is running on the GPU, for how long, and any dependencies that will cause kernels to stall.
To help you interpret the charts in Streamline, see detailed descriptions of all the performance counters available for each Mali GPU.
Arm Mali GPUs implement a comprehensive range of performance counters that enable you to closely monitor GPU activity as your application runs. Arm Streamline visualizes performance counter activity in a series of charts, to help you identify the cause of heavy rendering loads or workload inefficiencies that cause poor GPU performance.
Understanding the workload breakdown, pipeline loading, and execution characteristics of your application can help you decide where to apply rendering optimizations. The following guides describe all the performance counters available for each Mali GPU, with some advice about how to interpret the value.
See the different features and capabilities of Arm Mali GPUs from the Midgard-based Mali-T720, to the Valhall-based Mali-G78.
This reference sheet covers GPUs from the Midgard-based Mali-T720 through to Valhall GPUs, up to the Mali-G78.
The API Support, Core Features, and Microarchitecture Features tables cover which GPUs support which technologies. For more on a given technology, see the links below.
The Core Config table details the specs of the chips, rather than just whether features are available. As such for each GPU it has threads in a warp, total threads, and operations/texels etc per clock cycle, as well as cache sizes. Note that for tile write rate on Arm chips this is both fragments written into the tile and the pixels written back out of the tile. Thread count is the total shader core hardware capacity; note that for OpenGL ES only 128 threads are exposed.
For Texturing, to work out cycles/sample for more complicated filters than bilinear, simply apply the multiplications in the tables on top of the bilinear performance to combine to the required filter. For example, a simple trilinear will be 2 x 1 cycles/sample on a Mali-G72, and 2 x 0.25 cycles/sample on a Mali-G77. To add in 4x anisotropic filtering, multiply by a further 4x. Note that anisotropic filter scaling is the worst-case number, it will usually be less than this.
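The multiplication rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the bilinear costs (1 cycle/sample for Mali-G72, 0.25 for Mali-G77) are inferred from the trilinear figures quoted above, and the function and dictionary names are hypothetical, not part of any Arm tool.

```python
# Bilinear cost in cycles per sample, inferred from the figures quoted above.
# Other GPUs would need values from the reference sheet itself.
BILINEAR_CYCLES = {"Mali-G72": 1.0, "Mali-G77": 0.25}


def cycles_per_sample(gpu, trilinear=False, max_aniso=1):
    """Worst-case filtering cost for one texture sample (hypothetical helper)."""
    cost = BILINEAR_CYCLES[gpu]
    if trilinear:
        cost *= 2          # trilinear is 2x the bilinear cost
    cost *= max_aniso      # e.g. 4 for 4x anisotropic filtering (worst case)
    return cost


print(cycles_per_sample("Mali-G72", trilinear=True))               # 2.0
print(cycles_per_sample("Mali-G77", trilinear=True))               # 0.5
print(cycles_per_sample("Mali-G77", trilinear=True, max_aniso=4))  # 2.0
```

Because the anisotropic multiplier is a worst case, the last figure is an upper bound rather than a typical cost.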
Finally, the architecture-specific tables give thread counts and registers for the chips. For more on the generations of Arm architectures see links below. For a general picture of Arm architectures see: https://developer.arm.com/architectures/media-architectures/gpu-architecture
May I ask whether this OpenCL Timeline Mode is only supported by an OpenCL library built with the DDK tools provided by Arm? Is this DDK only available to vendors licensed by Arm?
My GPU is a Mali-G610, and I have updated the GPU driver using the Streamline plugin in DS-5. OpenCL cannot be displayed because of the way gatord's config is set. Which Mali GPUs support displaying OpenCL? Is it the Mali-G710?
How do I display OpenCL? My Mali-G610 cannot be displayed in DS-5.
Perhaps your phone did not have permission, and I am not sure about that.
Excuse me, my development platform is not a mobile phone; it is an Arm board. May I ask whether a configuration file named after the program (program name.config) is required to enable OpenCL mode on the phone after setting it up?
Can you use the clinfo command on your board? Does it give any information?
Thank you, but the clinfo command may not be available on the Arm platform. I think our development environments are different.
My development platform is the RK3588, which I usually connect to through a serial port or SSH. The RK3588's GPU is a Mali-G610, and the board runs a Linux-like operating system rather than Android as on mobile devices.
@Ybuzhenzhuo Can you find any driver such as libopencl.so under these directories?
Besides, is there any information from RK? I think they have had the same issue from other customers of this board. If you can get in touch with RK, you may get information sooner.
Thank you for your suggestion. I am currently learning to use perf, and I am preparing to send an email to RK to ask them officially.
You can also apply for a trial (Try For Free). This investigation focuses on obtaining OpenCL performance data on Mali GPUs.
Introduction to Streamline
Streamline helps you optimize software for devices that use Arm processors.
Evaluate where the software in your system spends most of its time by capturing a performance profile of your application running on a target device. Quickly determine whether your performance bottleneck relates to the CPU processing or GPU rendering using interactive charts and comprehensive data visualizations.
For CPU bottlenecks, use the native profiling functionality to locate specific problem areas in your application code. Investigate how processes, threads, and functions behave, from high-level views, right down to line-by-line source code analysis. The basic profile is based on regular sampling of the PC (Program Counter) of the running threads, allowing identification of the hotspots in the running application. Hardware performance counters that are provided by the target processors can supplement this analysis. These counters enable hotspot analysis to include knowledge of hardware events such as cache misses and branch mispredictions.
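As a rough illustration of how PC sampling leads to hotspots, the sketch below ranks functions by the share of samples that landed in them. It assumes the sampled program counters have already been resolved to function names; the sample data and the `hotspots` helper are hypothetical and do not reflect how Streamline itself is implemented.

```python
from collections import Counter


def hotspots(samples, top=3):
    """Rank functions by the fraction of PC samples attributed to them."""
    counts = Counter(samples)
    total = len(samples)
    return [(name, count / total) for name, count in counts.most_common(top)]


# Hypothetical samples: one entry per timer tick, naming the function
# that contained the sampled program counter.
samples = ["decode"] * 6 + ["blur"] * 3 + ["main"] * 1

for name, share in hotspots(samples):
    print(f"{name}: {share:.0%}")
```

Functions that appear in more samples are where the thread spent more of its time, which is the basis of the hotspot view.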
For GPU bottlenecks, use performance data from the Arm Mali GPU driver and hardware performance counters to explore the rendering workload efficiency. Visualize the workload breakdown, pipeline loading, and execution characteristics to quickly identify where to apply rendering optimizations.
With Streamline, you can: