intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

clSharedMemAllocINTEL and clMemBlockingFreeINTEL have higher overhead on Windows than on Linux #743

Closed: xiang1guo closed this issue 1 week ago

xiang1guo commented 2 weeks ago

Hi team,

Recently, my program showed much higher host overhead on Windows than on Linux. Using cliloader to profile the OpenCL calls, I found that two functions, clSharedMemAllocINTEL and clMemBlockingFreeINTEL, take far longer on Windows than on Linux.

Windows driver version: 31.0.101.5592

My question is: is this expected? Do you have any idea what might cause it? Thanks! Below are the results of a simple test that calls these two functions 100 times. (The Windows and Linux machines are slightly different, but I don't think that alone should cause such a large performance gap.)

| Test | Windows (ms) | Linux (ms) |
| --- | ---: | ---: |
| malloc/free | 190.725 | 0.832464 |

smorek-intel commented 1 week ago

Hi @xiang1guo, the Linux i915 KMD (kernel-mode driver) has an optimization that reuses freed graphics allocations; Windows does not. The best approach here is to reduce the number of allocations by using a user-mode memory pool.

xiang1guo commented 1 week ago


Thanks, Morek, for the reply and the information. I now understand the root cause of the slowdown on Windows compared to Linux. I'll close the issue. Thanks again for your support!