oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Fix double dereference in the function `__work_group_reduce_kernel` #1751

Closed SergeyKopienko closed 1 month ago

SergeyKopienko commented 1 month ago

In this PR I propose a way to avoid a double dereference in kernel code inside the function `__work_group_reduce_kernel`. The double dereference occurs in the kernel when a plain pointer is passed to this function: the parameter is taken by reference, so the kernel first has to load the pointer through the reference and only then can access the data it points to.

The problem is also demonstrated in an example prepared by @julianmi: https://godbolt.org/z/rP7WbW94o

template <typename _Res>
void __kernel1(const _Res __res_acc){
    // mov rax, qword ptr [rbp - 8]
    // mov dword ptr [rax], 0
    *__res_acc = 0;
}

template <typename _Res>
void __kernel2(const _Res& __res_acc){
    // mov rax, qword ptr [rbp - 8]
    // mov rax, qword ptr [rax]                <<< EXTRA INSTRUCTION because the pointer is passed by reference
    // mov dword ptr [rax], 1
    *__res_acc = 1;
}

template <typename _Res>
void __kernel3(const _Res __res_acc){
    // mov rax, qword ptr [rbp - 8]
    // mov dword ptr [rax], 2
    __res_acc[0] = 2;
}

template <typename _Res>
void __kernel4(const _Res& __res_acc){
    // mov rax, qword ptr [rbp - 8]
    // mov rax, qword ptr [rax]                <<< EXTRA INSTRUCTION because the pointer is passed by reference
    // mov dword ptr [rax], 3
    __res_acc[0] = 3;
}
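
The difference between the two parameter styles can also be checked at the type level. The following is a minimal sketch, not the actual oneDPL change: `store_by_ref`, `store_by_value` and `demo` are hypothetical names used only for illustration. With `Res = int*`, the by-reference variant receives a reference to the pointer (hence the extra load in the unoptimized listing above), while the by-value variant receives the pointer itself. Both write to the same memory.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the by-reference kernels (__kernel2 / __kernel4):
// for Res = int*, the parameter type is `int* const&`, a reference to the pointer.
template <typename Res>
void store_by_ref(const Res& res_acc, int value)
{
    res_acc[0] = value; // load the pointer through the reference, then store
}

// Hypothetical stand-in for the by-value kernels (__kernel1 / __kernel3):
// for Res = int*, the parameter is a copy of the pointer.
template <typename Res>
void store_by_value(const Res res_acc, int value)
{
    res_acc[0] = value; // store directly through the pointer
}

// The instantiated signatures differ exactly by the reference.
// (Top-level const on a by-value parameter is not part of the function type.)
static_assert(std::is_same_v<decltype(&store_by_ref<int*>), void (*)(int* const&, int)>);
static_assert(std::is_same_v<decltype(&store_by_value<int*>), void (*)(int*, int)>);

// Usage: both variants have the same observable behavior.
int demo()
{
    int x = 0;
    int* p = &x;
    store_by_ref(p, 1);
    assert(x == 1);
    store_by_value(p, 2);
    assert(x == 2);
    return x;
}
```

Whether an optimizing compiler removes the extra load after inlining depends on the compiler and the calling context; passing the pointer by value avoids relying on that.
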
SergeyKopienko commented 1 month ago

Yes, exactly: we save only one asm instruction in kernel code per work-item, but I believe it makes sense.