The BusSpeedDownload bandwidth result is too high.
I reviewed the code and found the cause:
the "nopinned" option is off by default, so pinned is true and CL_MEM_ALLOC_HOST_PTR is used when creating the cl buffer.
This causes a problem: the measured time is not the time to transfer data from host to device, but the time to copy from one device memory region to another.
So the result is not PCIe bandwidth but graphics DDR bandwidth.
As I understand the OpenCL 2.0 spec, using CL_MEM_ALLOC_HOST_PTR returns a buffer that is already on the device and mapped to the host.
So if we want to measure PCIe bandwidth, "nopinned" must be set; changing the code to "bool pinned = false;" makes the test work as expected.
src/opencl/level0/BusSpeedDownload.cpp:
bool pinned = !op.getOptionBool("nopinned");
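One quick way to check whether a reported number can be PCIe bandwidth at all is to compare it with the theoretical peak of the link. A minimal C++ sketch, assuming a PCIe 3.0 or later link (8 GT/s per lane for Gen3, 16 GT/s for Gen4, both with 128b/130b encoding); the helper names are hypothetical and are not part of SHOC:

```cpp
// Hypothetical helpers for sanity-checking a bandwidth result (not SHOC code).
// GB here means 1e9 bytes.

// Theoretical one-direction peak of a PCIe link in GB/s.
// Assumes Gen3 or later (128b/130b encoding); Gen1/Gen2 use 8b/10b instead.
// gtPerSec: per-lane transfer rate (8.0 for Gen3, 16.0 for Gen4).
double pcieTheoreticalGBps(double gtPerSec, int lanes) {
    const double encodingEfficiency = 128.0 / 130.0;
    return gtPerSec * encodingEfficiency * lanes / 8.0;  // bits -> bytes
}

// Bandwidth in GB/s from the number of bytes moved and the elapsed time in ms.
double measuredGBps(double bytes, double elapsedMs) {
    return bytes / (elapsedMs * 1e-3) / 1e9;
}

// A result above the link's theoretical peak cannot come from a
// host-to-device transfer over that link.
bool exceedsPcieLimit(double measured, double gtPerSec, int lanes) {
    return measured > pcieTheoreticalGBps(gtPerSec, lanes);
}
```

For example, with hypothetical numbers of 256 MiB transferred in 5 ms, measuredGBps gives about 53.7 GB/s, far above the roughly 15.75 GB/s peak of a PCIe 3.0 x16 link, so such a result cannot be host-to-device bandwidth over that link. The check ignores protocol overhead, so real PCIe throughput is lower still.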
I am using the OpenCL test mode:
./configure --with-opencl --without-cuda --prefix=$xxx
make install
./bin/shocdriver -opencl -benchmark BusSpeedDownload