wangzy0327 commented 1 month ago

Summary

Creating a reorder primitive fails with status "unimplemented" and the message "could not create a primitive descriptor for a reorder primitive".

The status is returned by the reorder_primitive_desc_create function in src/common/reorder.cpp:

...
for (auto r = engine->get_reorder_implementation_list(src_md, dst_md); *r;
            ++r) {
        reorder_pd_t *reorder_pd = nullptr;
        if ((*r)(&reorder_pd, engine, attr, src_engine, src_md, dst_engine,
                    dst_md)
                == success) {
            pd.reset(reorder_pd);
            return success;
        }
    }
    return unimplemented;
...

src/gpu/nvidia/cudnn_reorder_impl.cpp

#define REORDER_INSTANCE(...) \
    impl_list_item_t( \
            impl_list_item_t::reorder_type_deduction_helper_t<__VA_ARGS__>()),

constexpr impl_list_item_t cuda_reorder_impl_list[] = {
        REORDER_INSTANCE(gpu::ocl::cross_engine_reorder_t::pd_t)
        REORDER_INSTANCE(cudnn_reorder_t::pd_t)
        nullptr,
};

const impl_list_item_t *
cuda_gpu_engine_impl_list_t::get_reorder_implementation_list(
        const memory_desc_t *, const memory_desc_t *) {
    return cuda_reorder_impl_list;
}
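
Read together, the two snippets above describe a first-match dispatch: the engine hands back a null-terminated list of candidate implementations, and the loop returns success for the first candidate that accepts the problem, or unimplemented when every candidate declines. Below is a minimal self-contained sketch of that pattern. All names in it (status_t, problem_t, creator_fn, try_cross_engine, try_same_engine, create_reorder) are simplified stand-ins invented for illustration, and the acceptance conditions are assumptions, not oneDNN's actual checks.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; these are NOT oneDNN's real types.
enum class status_t { success, unimplemented };

struct problem_t {
    bool same_engine;        // assumption: src and dst live on the same engine
    bool per_channel_scales; // assumption: attributes request per-channel scaling
};

// Each candidate either accepts the problem (success) or declines it.
using creator_fn = status_t (*)(const problem_t &);

// Hypothetical candidate that only handles copies between different engines.
status_t try_cross_engine(const problem_t &p) {
    return p.same_engine ? status_t::unimplemented : status_t::success;
}

// Hypothetical candidate that declines per-channel scaling.
status_t try_same_engine(const problem_t &p) {
    return (p.same_engine && !p.per_channel_scales) ? status_t::success
                                                    : status_t::unimplemented;
}

// Null-terminated list, mirroring the shape of cuda_reorder_impl_list.
static const creator_fn impl_list[] = {try_cross_engine, try_same_engine, nullptr};

// Walk the list in order; the first candidate that accepts wins.
status_t create_reorder(const problem_t &p) {
    for (const creator_fn *r = impl_list; *r; ++r) {
        if ((*r)(p) == status_t::success) return status_t::success;
    }
    // Nobody accepted: this is the status the caller ends up seeing.
    return status_t::unimplemented;
}
```

In this sketch, create_reorder({true, true}) falls through both candidates and yields unimplemented, which is the same shape of failure as in the report: the status alone does not say which candidate declined or why, so the per-candidate reasons have to come from verbose logs.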

Version

oneDNN v3.2

Environment

GPU: NVIDIA GPU
OS version: Ubuntu 20.04

Expected behavior

How can I resolve the "unimplemented" status for the reorder primitive?

@vpirogov

dzarukin commented 1 month ago

@wangzy0327 Two points: 1) the oneDNN version is outdated; 2) there is no reproducer attached to this issue. Please update the version and the issue description accordingly.

wangzy0327 commented 1 month ago

@dzarukin Which oneDNN version is maintained long-term and supports the reorder operator?

dzarukin commented 1 month ago

> @dzarukin Which oneDNN version is maintained long-term and supports the reorder operator?

oneDNN supports only the latest released minor version, which is currently v3.5. Once you have switched to it, I'll need ONEDNN_VERBOSE=all logs to identify the root cause of the issue you are seeing; the source code may also be needed to pinpoint the error. Thank you.

wangzy0327 commented 1 month ago

I tried to compile the oneDNN v3.5 version you mentioned (6860e98e71 (HEAD -> release-v3.5, tag: v3.5)), but the build fails with the following errors:

/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:120:5: error: use of undeclared identifier 'global_ptr'
    global_ptr<uint16_t> gptr_u16(reinterpret_cast<uint16_t *>(ptr));
    ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:120:26: error: use of undeclared identifier 'gptr_u16'
    global_ptr<uint16_t> gptr_u16(reinterpret_cast<uint16_t *>(ptr));
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:131:27: error: use of undeclared identifier 'gptr_u16'
    vec_u16.store(offset, gptr_u16);
                          ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:150:9: error: expected ';' after expression
        CASE(f16);
        ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:26: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:150:9: error: use of undeclared identifier 'global_ptr'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:9: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
        ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:150:9: error: use of undeclared identifier 'gptr_dt'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:26: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:150:9: error: use of undeclared identifier 'gptr_dt'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:143:29: note: expanded from macro 'CASE'
        vec_dt.load(offset, gptr_dt); \
                            ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:151:9: error: expected ';' after expression
        CASE(f32);
        ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:26: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:151:9: error: use of undeclared identifier 'global_ptr'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:9: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
        ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:151:9: error: use of undeclared identifier 'gptr_dt'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:26: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:151:9: error: use of undeclared identifier 'gptr_dt'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:143:29: note: expanded from macro 'CASE'
        vec_dt.load(offset, gptr_dt); \
                            ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:152:9: error: expected ';' after expression
        CASE(s32);
        ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:26: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
                         ^
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:152:9: error: use of undeclared identifier 'global_ptr'
/home/wzy/sycl_workspace/oneDNN-v2/src/gpu/sycl/sycl_io_helper.hpp:141:9: note: expanded from macro 'CASE'
        global_ptr<type> gptr_dt(reinterpret_cast<type *>(ptr)); \
        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]

Is this the long-term maintained version? @dzarukin

shu1chen commented 1 month ago

@wangzy0327 Which compiler are you using? Since you're building oneDNN with the SYCL runtime and NVIDIA GPU support, did you use the oneAPI DPC++ Compiler with support for CUDA, or oneAPI for NVIDIA GPUs? Also, please note that reorder has some limitations, as described in the NVIDIA backend readme:

However, there are some limitations when using SYCL_API-DNN reorder on Nvidia GPU:

Per dimension scaling is not supported (a single alpha and beta value is accepted by the transform tensor function).

Blocking is only permitted for the channel dimension in cuDNN. This primitive currently supports block size of 4.

Blocking is only supported when channel dimension is a multiple of the block size and the datatype is int8.

Forward pass supports f32, f16, bf16 and s8 data types.

Backward pass supports f32 and bf16 data types.
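
The blocking-related items in the list above can be read as a single predicate. The sketch below only restates those readme rules in code; data_type_t, blocked_layout_t, supported_block_size, and blocking_is_supported are hypothetical names introduced for illustration and are not part of oneDNN or cuDNN. Treating an unblocked layout as unaffected by these rules is also an assumption.

```cpp
#include <cassert>

// Hypothetical types for illustration; not oneDNN or cuDNN API.
enum class data_type_t { f32, f16, bf16, s8 };

struct blocked_layout_t {
    int channels;   // size of the channel dimension
    int block_size; // channel block size; 0 means the layout is not blocked
    data_type_t dt; // element data type
};

// The readme states the primitive currently supports a block size of 4.
constexpr int supported_block_size = 4;

// True when the layout satisfies the blocking rules quoted above:
// block size 4, int8 data, and channels divisible by the block size.
// Unblocked layouts (block_size == 0) are assumed not to be restricted here.
constexpr bool blocking_is_supported(const blocked_layout_t &l) {
    return l.block_size == 0
            || (l.block_size == supported_block_size
                    && l.dt == data_type_t::s8
                    && l.channels % l.block_size == 0);
}
```

For example, an s8 tensor with 8 channels blocked by 4 passes the check, while the same layout with 6 channels or with f32 data does not.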

wangzy0327 commented 1 month ago

I am using the SYCL compiler, version 2022-06-release.

shu1chen commented 1 month ago

> I am using the SYCL compiler, version 2022-06-release.

Please upgrade the compiler version to see if it resolves the build error.

wangzy0327 commented 1 month ago

@shu1chen @dzarukin I updated the compiler to 2024-WW14 and oneDNN to v3.5. I then compiled the oneDNN reorder example (reorder.cpp) with the following command and ran it:

/home/wzy/sycl_workspace/build-cuda-2024/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --offload-arch=sm_70 reorder.cpp -o reorder.out -ldnnl -ltbb

The output is

wzy@gxnzx1277:~/sycl_workspace/oneDNN-example$ ./reorder.out gpu
onednn_verbose,info,oneDNN v3.5.0 (commit 6860e98e71c748f956150f72cdbe14efe6fc2ac2)
onednn_verbose,info,cpu,runtime:TBB,nthr:32
onednn_verbose,info,cpu,isa:Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Nvidia,name:Tesla V100-PCIE-32GB,driver_version:0.0.0,binary_kernels:disabled
onednn_verbose,info,gpu,engine,1,backend:Nvidia,name:Tesla V100-PCIE-32GB,driver_version:0.0.0,binary_kernels:disabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,primitive,create:dispatch,reorder,gpu:0,reorder,ocl:cross_engine::any,undef,src_f32::blocked:abcd::f0 dst_s8::blocked:acdb::f0,attr-scales:dst:2:f32,,3x3x227x227,bad engine kind,src/gpu/intel/ocl/cross_engine_reorder.cpp:47
oneDNN error caught: 
        Status: unimplemented
        Message: could not create a primitive descriptor for a reorder primitive
Example failed on GPU.

shu1chen commented 1 month ago

@wangzy0327 That's because this example applies per-channel output scales as a reorder post-op in reorder.cpp#L94-L98. This is listed among the reorder limitations in the NVIDIA backend readme:

Per dimension scaling is not supported (a single alpha and beta value is accepted by the transform tensor function).

If you remove the code blocks in this example that add this post-op, the reorder primitive works fine. I tested this locally and it worked for me.
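
The per-channel scaling is visible in the verbose line from the failing run: the field attr-scales:dst:2:f32 carries a scales mask of 2. In oneDNN's mask convention, bit i being set means a separate scale for each index along dimension i, so mask 2 (bit 1) selects the channel dimension of the abcd source, while mask 0 means one scale for the whole tensor. The following sketch decodes such a mask; scaled_dims and is_common_scale are hypothetical helpers written for this explanation, not oneDNN functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: return the dimensions that get their own scale values,
// i.e. the positions of the bits that are set in the mask.
std::vector<int> scaled_dims(int mask, int ndims) {
    std::vector<int> dims;
    for (int d = 0; d < ndims; ++d) {
        if (mask & (1 << d)) dims.push_back(d);
    }
    return dims;
}

// Hypothetical helper: a mask of 0 means a single scale for the whole tensor.
bool is_common_scale(int mask) {
    return mask == 0;
}
```

For the 4D tensor in the log, scaled_dims(2, 4) contains only dimension 1, which is per-channel scaling and therefore falls under the readme limitation quoted above. A mask of 0 would correspond to the single-value scaling the readme describes, although this thread only confirms that removing the scales attribute entirely makes the example work.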

wangzy0327 commented 1 month ago

OK. Thank you for the reply; it solved my problem.

rupakroyintel commented 1 month ago

@wangzy0327 Can we close this issue?

oneapi-src / oneDNN

could not create a primitive descriptor for a reorder primitive #1996