krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0
396 stars, 111 forks

Using External Thunderbolt 3 GPU on Windows has suspiciously fast "Transfer bandwidth" #38

Open hjmallon opened 7 years ago

hjmallon commented 7 years ago

I am not sure whether this is an issue or me misusing the program. Please close if it isn't relevant.

I am using a Radeon RX 580 in a Sonnet Thunderbolt 3 chassis. When I build clpeak with the AMD APP SDK (which is very difficult to install because the installer seems to crash a lot), I get a "Transfer bandwidth" of ~8.5 GB/s. This is far more than is available over Thunderbolt 3.
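To make "far more than is available" concrete, here is a rough back-of-the-envelope check. It is only a sketch: it assumes the chassis tunnels PCIe 3.0 x4 over a 40 Gbit/s Thunderbolt 3 link, and the function and constant names are made up for illustration. Real throughput will be lower still because of protocol overhead.

```cpp
#include <cassert>
#include <cstdio>

// Raw PCIe 3.0 bandwidth in GB/s for a given lane count:
// 8 GT/s per lane, 128b/130b line encoding, 8 bits per byte.
// (Assumption: the eGPU chassis exposes a PCIe 3.0 x4 link.)
constexpr double pcie3_gbytes_per_s(int lanes) {
    return 8.0 * lanes * (128.0 / 130.0) / 8.0;
}

// Thunderbolt 3 nominal link rate: 40 Gbit/s = 5 GB/s for all traffic.
constexpr double kTb3LinkGBps = 40.0 / 8.0;

// True if a measured transfer figure is higher than the link could carry.
constexpr bool exceeds_link(double measured_gbps, double ceiling_gbps) {
    return measured_gbps > ceiling_gbps;
}

// Print the comparison for one measured value (in GB/s).
inline void report(double measured_gbps) {
    std::printf("measured %.2f GB/s, PCIe 3.0 x4 ceiling %.2f GB/s, "
                "TB3 link ceiling %.2f GB/s, suspicious: %s\n",
                measured_gbps, pcie3_gbytes_per_s(4), kTb3LinkGBps,
                exceeds_link(measured_gbps, kTb3LinkGBps) ? "yes" : "no");
}
```

Calling `report(8.5)` flags the reported figure as suspicious, since 8.5 GB/s is above even the nominal 5 GB/s link rate.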

Eventually I found the AMD OpenCL optimisation guide here (http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401315_92101), section "OpenCL Memory Object Properties". This states that a buffer created with CL_MEM_ALLOC_HOST_PTR is allocated as "Pinned host memory shared by all devices in context (unless only device in context is CPU; then, host memory)" rather than "Device memory". So the data is only going to local (host) memory, not to the GPU. When I removed the CL_MEM_ALLOC_HOST_PTR flag from the buffer creation call, I got a more realistic value.
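For anyone wanting to try the same change, the difference comes down to the flags passed to `clCreateBuffer`. The following is a minimal sketch, not clpeak's actual code: `transfer_buffer_flags` and `create_transfer_buffer` are hypothetical helper names, and the fallback definitions exist only so the snippet compiles on a machine without an OpenCL SDK (their values mirror the Khronos `cl.h` header).

```cpp
#include <cstddef>
#include <cstdint>

#if __has_include(<CL/cl.h>)
#include <CL/cl.h>
#define HAVE_OPENCL_HEADERS 1
#else
// Fallback so the sketch compiles without an OpenCL SDK installed.
// The values mirror the Khronos cl.h definitions.
typedef std::uint64_t cl_mem_flags;
#define CL_MEM_READ_WRITE     (1 << 0)
#define CL_MEM_ALLOC_HOST_PTR (1 << 4)
#endif

// Hypothetical helper: pick the flags for the buffer used in a transfer test.
// pinned_host = true  -> adds CL_MEM_ALLOC_HOST_PTR (pinned host memory on AMD,
//                        per the guide quoted above).
// pinned_host = false -> plain CL_MEM_READ_WRITE, i.e. no host-allocation flag.
inline cl_mem_flags transfer_buffer_flags(bool pinned_host) {
    cl_mem_flags flags = CL_MEM_READ_WRITE;
    if (pinned_host) {
        flags |= CL_MEM_ALLOC_HOST_PTR;
    }
    return flags;
}

#ifdef HAVE_OPENCL_HEADERS
// Hypothetical helper: create the test buffer with the chosen flags.
// host_ptr is nullptr because neither flag combination supplies host data.
inline cl_mem create_transfer_buffer(cl_context ctx, std::size_t bytes,
                                     bool pinned_host, cl_int* err) {
    return clCreateBuffer(ctx, transfer_buffer_flags(pinned_host), bytes,
                          nullptr, err);
}
#endif
```

Timing a blocking `clEnqueueWriteBuffer` into a buffer created with `pinned_host = false` should then give a figure that reflects the Thunderbolt link rather than host memory, at least on the AMD runtime described in the guide. Other vendors' runtimes may place buffers differently.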

acollaborator commented 1 year ago

I was wondering this too. Thanks for taking the time to figure this out and sharing a solution.