oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

[Question] Can shared and host buffers offer the same overall performance on Intel integrated graphics? #92

Open jjfumero opened 2 years ago

jjfumero commented 2 years ago

I am interested in analyzing the overall (end-to-end) performance of applications when using different types of buffer allocation. I wrote this blog post for reference:

https://jjfumero.github.io/posts/2022/05/overall-performance-of-unified-shared-memory-level-zero/

What I saw was that running an application with host buffers offers the same performance as running it with shared memory buffers. My understanding is that, with shared memory buffers, the GPU driver can migrate the buffers from the host to the device, whereas host memory is accessed from the device every time a data item is required. I tested two scenarios: a) memory-bound and b) compute-bound. I was surprised to see that, in the memory-bound case, the overall performance was very similar whether the buffers were allocated as host memory only or as shared memory only. Is this performance expected when running on Intel integrated graphics?

If you want to reproduce all numbers, the whole application is available here: https://github.com/jjfumero/codeBlogArticles/tree/master/may2022/sharedMemoryEffect
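To make the comparison concrete, here is a minimal sketch of an end-to-end timing harness that treats the allocation type as a pluggable strategy. It is not the code from the blog post or the linked repository: `BufferStrategy` and `medianEndToEndNs` are hypothetical names, and the callbacks are assumptions standing in for real Level Zero calls (in an actual test they would wrap `zeMemAllocHost` or `zeMemAllocShared`, and `zeMemFree`, around a kernel launch).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical strategy describing how a buffer is allocated and released.
// In a Level Zero application these callbacks would wrap zeMemAllocHost or
// zeMemAllocShared and zeMemFree; they are plain callbacks here so the
// harness has no GPU dependency.
struct BufferStrategy {
    std::function<void*(std::size_t)> alloc;
    std::function<void(void*)> release;
};

// Runs allocate -> workload -> release `reps` times and returns the median
// end-to-end time in nanoseconds (the upper middle sample for even counts).
inline double medianEndToEndNs(
    const BufferStrategy& strategy,
    std::size_t bytes,
    const std::function<void(void*, std::size_t)>& workload,
    int reps) {
    assert(reps > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto start = std::chrono::steady_clock::now();
        void* buffer = strategy.alloc(bytes);
        workload(buffer, bytes);   // e.g. copy-in, kernel launch, copy-out
        strategy.release(buffer);
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Calling `medianEndToEndNs` once with a host-allocation strategy and once with a shared-allocation strategy, using the same workload and buffer size, gives two numbers that can be compared directly for the memory-bound and compute-bound cases.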

jandres742 commented 2 years ago

Thanks @jjfumero. You are correct. However, it all depends on what the test does and on the underlying support. Since this is more a question of how this is implemented in the L0 driver than of how it is defined in the spec, would you mind moving this issue to the driver implementation repo for Intel GPUs (https://github.com/intel/compute-runtime)?