I am not sure whether this is an issue or me misusing the program. Please close if it isn't relevant.
I am using a Radeon RX 580 in a Sonnet Thunderbolt 3 chassis. When I build clpeak with the AMD APP SDK (which is very difficult to install because the installer seems to crash a lot), I get a transfer bandwidth of ~8.5 GB/s. This is far more than is available over Thunderbolt 3.
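As a rough sanity check on that claim, here is a minimal sketch of the arithmetic. It assumes the nominal Thunderbolt 3 link rate of 40 Gbit/s as an upper bound; the throughput actually available to PCIe traffic is lower, so the real gap is larger than shown. The variable names are only illustrative.

```python
# Nominal Thunderbolt 3 link rate (assumed upper bound; real PCIe throughput is lower).
TB3_LINK_GBIT_PER_S = 40.0

# Transfer bandwidth reported by clpeak in this setup.
measured_gb_per_s = 8.5

# Convert the link rate from gigabits to gigabytes per second.
link_ceiling_gb_per_s = TB3_LINK_GBIT_PER_S / 8.0

# True if the measured figure is higher than the link could ever carry.
exceeds_link = measured_gb_per_s > link_ceiling_gb_per_s

print(f"link ceiling: {link_ceiling_gb_per_s:.1f} GB/s")
print(f"measured:     {measured_gb_per_s:.1f} GB/s")
print(f"exceeds link: {exceeds_link}")
```

Even against the most generous ceiling, the measured figure cannot be traffic that crossed the Thunderbolt link.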
Eventually I found the AMD OpenCL optimisation guide (http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401315_92101), section "OpenCL Memory Object Properties". It states that a buffer created with CL_MEM_ALLOC_HOST_PTR is allocated as "Pinned host memory shared by all devices in context (unless only device in context is CPU; then, host memory)" rather than "Device memory". So the data presumably only goes to host memory rather than across the Thunderbolt link. When I removed the CL_MEM_ALLOC_HOST_PTR flag from the buffer creation call, I got a more realistic value.
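To make the difference concrete, here is a minimal sketch of the two buffer-creation variants. It is written with pyopencl for brevity and is not clpeak's own code, which is C++. The helper names are hypothetical, and CL_MEM_READ_WRITE as the base flag is an assumption; the numeric flag values are the ones defined in the OpenCL headers.

```python
# Flag values as defined in the OpenCL headers (cl.h).
CL_MEM_READ_WRITE = 1 << 0
CL_MEM_ALLOC_HOST_PTR = 1 << 4


def transfer_buffer_flags(use_alloc_host_ptr):
    """Hypothetical helper: build the cl_mem_flags bitfield for the test buffer."""
    flags = CL_MEM_READ_WRITE  # assumed base flag, not taken from clpeak
    if use_alloc_host_ptr:
        # Per the AMD guide, this requests pinned host memory, not device memory.
        flags |= CL_MEM_ALLOC_HOST_PTR
    return flags


def create_transfer_buffer(ctx, nbytes, use_alloc_host_ptr):
    """Hypothetical helper: create the buffer (needs pyopencl and an OpenCL runtime)."""
    import pyopencl as cl  # imported lazily so the flag logic runs without it

    return cl.Buffer(ctx, transfer_buffer_flags(use_alloc_host_ptr), size=nbytes)


pinned_flags = transfer_buffer_flags(True)    # variant that gave ~8.5 GB/s
device_flags = transfer_buffer_flags(False)   # variant that gave a realistic value

print(f"with CL_MEM_ALLOC_HOST_PTR:    {pinned_flags:#x}")
print(f"without CL_MEM_ALLOC_HOST_PTR: {device_flags:#x}")
```

The only difference between the two variants is the CL_MEM_ALLOC_HOST_PTR bit, which is what appears to decide whether the benchmark measures a host-side copy or a transfer to the GPU.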