[Question] Can Shared and Host Buffers offer the same overall performance on Intel Integrated Graphics?

I am interested in analyzing the overall (end-to-end) performance of applications when using different types of buffer allocation. I wrote this blog post for reference:

https://jjfumero.github.io/posts/2022/05/overall-performance-of-unified-shared-memory-level-zero/

What I observed is that an application using host buffers performs the same as one using shared memory buffers. My understanding is that, with shared memory buffers, the GPU driver can migrate the buffers from the host to the device, whereas with host buffers the data stays in host memory and the device accesses it there every time a data item is required. I tested two scenarios: a) memory-bound and b) compute-bound. I was surprised to see that, even in the memory-bound case, the overall performance was very similar whether the buffers were allocated using host memory only or shared memory only. Is this behavior expected when running on Intel Integrated Graphics?
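For context, here is a minimal sketch of how the two buffer types are requested through the Level Zero API. It is not the benchmark code from the blog post. The function name `report_usm_allocations`, the 64-byte alignment, and the choice of the first driver and first device are assumptions made for illustration. The sketch only allocates one buffer of each type and prints what the driver reports about them, including whether the device is flagged as integrated. It does not measure performance.

```cpp
// Sketch only: allocate a host buffer and a shared buffer with Level Zero and
// print what the driver reports. Build (assumption): g++ usm.cpp -lze_loader
#include <cstddef>
#include <cstdint>
#include <cstdio>

#if __has_include(<level_zero/ze_api.h>)
#include <level_zero/ze_api.h>
#define HAVE_LEVEL_ZERO 1
#else
#define HAVE_LEVEL_ZERO 0  // headers missing: the function below becomes a stub
#endif

// Hypothetical helper. Returns 0 on success, 1 if Level Zero or a GPU is
// unavailable, 2 if one of the allocations fails.
int report_usm_allocations(std::size_t bytes) {
#if HAVE_LEVEL_ZERO
    if (zeInit(ZE_INIT_FLAG_GPU_ONLY) != ZE_RESULT_SUCCESS) return 1;

    // Assumption: the first driver and its first device are the integrated GPU.
    uint32_t driverCount = 1;
    ze_driver_handle_t driver = nullptr;
    if (zeDriverGet(&driverCount, &driver) != ZE_RESULT_SUCCESS || driverCount == 0) return 1;
    uint32_t deviceCount = 1;
    ze_device_handle_t device = nullptr;
    if (zeDeviceGet(driver, &deviceCount, &device) != ZE_RESULT_SUCCESS || deviceCount == 0) return 1;

    // The INTEGRATED flag tells us whether host and device share physical memory.
    ze_device_properties_t props = {};
    props.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
    if (zeDeviceGetProperties(device, &props) != ZE_RESULT_SUCCESS) return 1;
    bool integrated = (props.flags & ZE_DEVICE_PROPERTY_FLAG_INTEGRATED) != 0;

    ze_context_desc_t ctxDesc = {};
    ctxDesc.stype = ZE_STRUCTURE_TYPE_CONTEXT_DESC;
    ze_context_handle_t context = nullptr;
    if (zeContextCreate(driver, &ctxDesc, &context) != ZE_RESULT_SUCCESS) return 1;

    ze_host_mem_alloc_desc_t hostDesc = {};
    hostDesc.stype = ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC;
    ze_device_mem_alloc_desc_t devDesc = {};
    devDesc.stype = ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC;

    // Host buffer: stays in host memory, the device accesses it in place.
    void* hostBuf = nullptr;
    ze_result_t r1 = zeMemAllocHost(context, &hostDesc, bytes, 64, &hostBuf);
    // Shared buffer: the driver may migrate it between host and device.
    void* sharedBuf = nullptr;
    ze_result_t r2 = zeMemAllocShared(context, &devDesc, &hostDesc, bytes, 64,
                                      device, &sharedBuf);

    int status = 2;
    if (r1 == ZE_RESULT_SUCCESS && r2 == ZE_RESULT_SUCCESS) {
        // Ask the driver how it classifies each pointer (1 = host, 3 = shared).
        ze_memory_allocation_properties_t hostProps = {};
        hostProps.stype = ZE_STRUCTURE_TYPE_MEMORY_ALLOCATION_PROPERTIES;
        ze_memory_allocation_properties_t sharedProps = {};
        sharedProps.stype = ZE_STRUCTURE_TYPE_MEMORY_ALLOCATION_PROPERTIES;
        zeMemGetAllocProperties(context, hostBuf, &hostProps, nullptr);
        zeMemGetAllocProperties(context, sharedBuf, &sharedProps, nullptr);
        std::printf("device integrated: %s\n", integrated ? "yes" : "no");
        std::printf("host buffer type: %d, shared buffer type: %d\n",
                    static_cast<int>(hostProps.type), static_cast<int>(sharedProps.type));
        status = 0;
    }

    if (r1 == ZE_RESULT_SUCCESS) zeMemFree(context, hostBuf);
    if (r2 == ZE_RESULT_SUCCESS) zeMemFree(context, sharedBuf);
    zeContextDestroy(context);
    return status;
#else
    (void)bytes;
    std::puts("Level Zero headers not found; nothing to report.");
    return 1;
#endif
}
```

Calling `report_usm_allocations(1 << 20)` from `main` on a machine with the Level Zero loader installed should print whether the device is flagged as integrated, which is the property the question hinges on.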

If you want to reproduce all numbers, the whole application is available here: https://github.com/jjfumero/codeBlogArticles/tree/master/may2022/sharedMemoryEffect
