Closed ssanchez11 closed 4 years ago
This change is likely necessary for other hardware platforms as well, and even for host-only implementations. Linux kernels have generally moved to backing uninitialized memory with a shared zero page. Older kernels (back in the 2.6 era) would instantiate memory on first read, i.e. page in real, unique pages. I don't know exactly when this change took place, but now a copy from uninitialized memory will likely not fault in any physical memory; instead it becomes a highly CPU-optimized read from a single page that sits entirely in L1 cache. This produces the very unrealistic results shown above. This is not just an OpenCL issue: it affects any copy from any uninitialized region on Linux. As such, ALL benchmarks that report bandwidth need to ensure that every READ operation is from fully initialized memory pages.
See this page for background: https://lwn.net/Articles/340370/
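A rough way to see the effect on a given machine is to time the same memcpy() with and without writing to the source first. This is only a sketch: `copy_bandwidth_gbps` is a hypothetical helper, not part of the benchmark, and it assumes that a large calloc() hands back untouched pages, which depends on the allocator and kernel.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: time one memcpy() of `bytes` bytes and return GB/s.
// If `initialize_source` is false, the source is left exactly as calloc()
// returned it (assumed here to be untouched, zero-page-backed memory).
double copy_bandwidth_gbps(std::size_t bytes, bool initialize_source) {
    assert(bytes > 0);
    unsigned char* src = static_cast<unsigned char*>(std::calloc(bytes, 1));
    unsigned char* dst = static_cast<unsigned char*>(std::malloc(bytes));
    if (src == nullptr || dst == nullptr) {
        std::free(src);
        std::free(dst);
        return -1.0;  // allocation failed
    }

    // Write to the destination first so its page faults are not timed.
    std::memset(dst, 1, bytes);

    // Optionally write to the source so it is backed by real, unique pages.
    if (initialize_source) {
        std::memset(src, 0x5A, bytes);
    }

    auto start = std::chrono::steady_clock::now();
    std::memcpy(dst, src, bytes);
    auto stop = std::chrono::steady_clock::now();

    // Read the result through a volatile so the copy is not optimized away.
    volatile unsigned char sink = dst[bytes - 1];
    (void)sink;

    double seconds = std::chrono::duration<double>(stop - start).count();
    std::free(src);
    std::free(dst);
    return (static_cast<double>(bytes) / 1e9) / seconds;
}
```

Calling `copy_bandwidth_gbps(size, false)` and `copy_bandwidth_gbps(size, true)` with a buffer much larger than the last-level cache gives the two numbers to compare. Whether the gap appears, and how large it is, depends on the kernel version, the allocator, and the compiler.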
Krishnaraj,
Could you please provide feedback or approve this merge request?
Thank you,
Sebastian
Yea. That makes sense. Thanks
…ment
When a host buffer is passed as the source to enqueueWriteBuffer(), OpenCL uses a memcpy() to copy it. Newly allocated memory points to zero pages, and physical memory is allocated only when the memory is first written to, so a memcpy() from an untouched buffer is unrealistically fast.
Therefore, initialize the host buffer to obtain accurate measurements with enqueueWriteBuffer().
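A minimal sketch of that initialization, assuming the host buffer comes from malloc(): `make_initialized_host_buffer`, `HostBuffer`, and the fill pattern are illustrative names and values, not taken from the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Releases malloc()'d memory when the owning pointer goes out of scope.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};
using HostBuffer = std::unique_ptr<unsigned char, FreeDeleter>;

// Hypothetical helper: allocate a host buffer and write to every byte, so
// each page is backed by physical memory before any bandwidth is measured.
HostBuffer make_initialized_host_buffer(std::size_t bytes,
                                        unsigned char pattern = 0xA5) {
    assert(bytes > 0);
    HostBuffer buf(static_cast<unsigned char*>(std::malloc(bytes)));
    if (buf) {
        std::memset(buf.get(), pattern, bytes);
    }
    return buf;  // empty if the allocation failed
}
```

The pointer from `buf.get()` would then be passed as the source argument of enqueueWriteBuffer(), with the fill done outside the timed region.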
Results on Intel hardware: