te42kyfo / gpu-benches

collection of benchmarks to measure basic GPU capabilities
GNU General Public License v3.0

can not run on gfx1100 --> rx7900 #4

Open chanzhennan opened 10 months ago

chanzhennan commented 10 months ago

The gpu-cache, gpu-metrics, gpu-stream, and gpu-strides tests pass, but the gpu-l2-cache and gpu-latency tests fail:

/opt/rocm/bin/hipcc -std=c++20 -I/opt/rocm/include/rocprofiler/ -I/opt/rocm/hsa/include/hsa -L/opt/rocm/rocprofiler/lib -lrocprofiler64 -lrocprofiler64v2 -lhsa-runtime64 -lrocm_smi64 -ldl main.hip -o demo
./demo
gpu_count 1
Agent 0
data set exec time spread Eff. bw
gpu_count 1
Agent 0
measureMetricStop: no kernel kaunch was intercepted
make: *** [Makefile:25: test] Segmentation fault (core dumped)

te42kyfo commented 10 months ago

Thank you for your feedback. I have so far only tried gfx90a and gfx1030 targets, as these are the ones I have available.

The error is somewhere in the performance counter collection, which is of course highly device-specific. This data is not necessary for the benchmark; it just provides some further insight. A quick fix is to remove all lines that call "measureXXXBytesStart/Stop".

I will try to add a flag to the code that skips the metric measurement, since this functionality has only really been tested on a few devices.
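
A minimal sketch of what such a skip flag could look like. Everything named here is an assumption for illustration and does not exist in the repository: the environment variable GPU_BENCHES_NO_METRICS and the helpers metricsEnabled, maybeMeasureStart, and maybeMeasureStop. The real counter calls are passed in as callbacks, and the stop call is assumed to return a number:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical switch: metrics stay enabled unless the (assumed) variable
// GPU_BENCHES_NO_METRICS is set to something other than "" or "0".
inline bool metricsEnabled(const char* value) {
    return value == nullptr || value[0] == '\0' || std::strcmp(value, "0") == 0;
}

// Convenience overload that reads the assumed variable from the environment.
inline bool metricsEnabled() {
    return metricsEnabled(std::getenv("GPU_BENCHES_NO_METRICS"));
}

// Runs the device-specific counter setup only when metrics are enabled.
// `start` stands in for one of the measureXXXBytesStart calls.
inline void maybeMeasureStart(bool enabled, const std::function<void()>& start) {
    assert(start && "start callback must be set");
    if (enabled) start();
}

// Returns the counter value, or -1.0 as a "not measured" marker when skipped.
// `stop` stands in for one of the measureXXXBytesStop calls.
inline double maybeMeasureStop(bool enabled, const std::function<double()>& stop) {
    assert(stop && "stop callback must be set");
    return enabled ? stop() : -1.0;
}
```

With a guard like this, the benchmark loop would evaluate metricsEnabled() once and pass the result to both helpers, so an untested device never reaches the counter code and the timing results are still printed.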

By the way, I would be very much interested in the results that you get.

te42kyfo commented 10 months ago

I tested this on a machine with an RX6900XT. When I use your build command line, it fails for me with the same error. If I use the one from the Makefile, it works. Note the difference:

Yours:
hipcc -std=c++20 -I/opt/rocm/include/rocprofiler/ -I/opt/rocm/hsa/include/hsa -L/opt/rocm/rocprofiler/lib -lrocprofiler64 -lrocprofiler64v2 -lhsa-runtime64 -lrocm_smi64 -ldl main.hip -o demo
Makefile:
hipcc -std=c++20 -I/opt/rocm/include/rocprofiler/ -I/opt/rocm/hsa/include/hsa -L/opt/rocm/rocprofiler/lib -lrocprofiler64 -lhsa-runtime64 -ldl -o hip-l2-cache main.hip

There is an additional "-lrocprofiler64v2" in your command line. Removing it made it work for me. However, some of the metric names may be different for gfx1100, so it might still not work there.
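
For comparing two build command lines like the ones above, a small token diff can help. This is only an illustrative sketch; the helper names tokens and onlyIn are made up here, and the split is on whitespace without any quoting support:

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a command line on whitespace into a sorted set of unique tokens.
std::set<std::string> tokens(const std::string& cmd) {
    std::istringstream in(cmd);
    std::set<std::string> out;
    for (std::string t; in >> t;) out.insert(t);
    return out;
}

// Returns the tokens present in `a` but missing from `b`, in sorted order.
std::vector<std::string> onlyIn(const std::string& a, const std::string& b) {
    const std::set<std::string> ta = tokens(a);
    const std::set<std::string> tb = tokens(b);
    std::vector<std::string> out;
    for (const std::string& t : ta) {
        if (tb.count(t) == 0) out.push_back(t);
    }
    return out;
}
```

Applied to the two commands quoted above, onlyIn(yours, makefile) yields -lrocm_smi64, -lrocprofiler64v2, and the output name demo. So besides "-lrocprofiler64v2", the failing command also links "-lrocm_smi64".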

te42kyfo commented 10 months ago

Can you please verify that this actually fixes your problem? Also, as I said, I would be interested in your results.