oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0
3.59k stars 990 forks source link

Working with the Reorder #1980

Closed CodeCrafter18 closed 1 month ago

CodeCrafter18 commented 3 months ago

Hello! I have a question regarding the usage of reorders. For example, let's consider ref_sum.hpp.

if (need_output_reorder()) {
    CHECK(reorder_primitive_desc_create(
        reorder_pds_[n_], engine, dst_acc_md(), dst_md()));
}

In this case, the following function will be called:

status_t reorder_primitive_desc_create(std::shared_ptr<primitive_desc_t> &pd,
        engine_t *engine, const memory_desc_t *src_md,
        const memory_desc_t *dst_md, const primitive_attr_t *attr) {
    return reorder_primitive_desc_create(
            pd, engine, src_md, engine, dst_md, engine, attr);
}

However, in this scenario, create_resource(...) is never called. If I want to write my own Reorder that overrides the create_resource(...) function, as it's done in ACL, then I'll get a runtime error when I call ctx.get_resource_mapper()...

How can I work around this problem if my implementation uses resource_mapper?

Also, I have a question about why the Sum, Reorder, and Concat operations have separate logic, for example, for obtaining the list of implementations?​​​​​​​​​​​​​​​​

Thanks.

densamoilov commented 3 months ago

Hi @CodeCrafter18,

First of all, primitive descriptor has nothing to do with create_resource(...) and resource_mapper. It is the primitive that utilizes those. The primitive_t::create_resource is always called at the primitive initialization stage. So you just need to override create_resource(...) in your primitive and then use the resource_mapper from the execution context during the execution step.

CodeCrafter18 commented 3 months ago

Hi @densamoilov,

As I pointed out above in a specific example (ref_sum). The “create” function will be called, but the overridden “create_resource” is not. As a result, the “resource_mapper” is empty at the execution stage. One way to get around this is to create an instance again in "execute" in the case of an “empty” for “resource_mapper”, but this looks like some kind of “crutch”.

densamoilov commented 3 months ago

@CodeCrafter18, in the example above you mentioned primitive descriptor and not the primitive, that's why I'm still confused. One thing to keep in mind is that create_resource is ALWAYS called from the main primitive. If the main primitive has nested primitives then it is the main primitive's responsibility to call create_resource from the nested primitives. Another thing to keep in mind is that the resource mapper in the execution context object (exec_ctx_t) that is passed to the execute function of the main primitive should be passed down to the nested primitives. In order to do that you need to create a new execution context for the nested primitives using the execution context for the main primitive: https://github.com/oneapi-src/oneDNN/blob/8d457a8ce898a032183ace8c541a70252351d8e2/src/gpu/generic/ref_sum.hpp#L171

CodeCrafter18 commented 3 months ago

Let's do it again.

We are looking at a specific example: (oneDNN/src/cpu/ref_sum.hpp)

The init(...) function line 55-58:

if (need_output_reorder()) {
    CHECK(reorder_primitive_desc_create(
        reorder_pds_[n_], engine, dst_acc_md(), dst_md()));
}

In this case, the Reorder is selected and created from the general list. My implementation uses create_resource. But in this test, I never call create_resource. Therefore, at the execute stage, I encounter an error.

For regular reorder launches, for example, via benchen, there is no such problem.

densamoilov commented 3 months ago

If you want to create a resource for your reorder implementation that is used in ref_sum.hpp then you will need to add ref_sum_t::create_resource(…) that would call create_resource from each element of the reorders_ vector.

CodeCrafter18 commented 3 months ago

Well, thank you. Why hasn't it been done already? Now the ACL has its own implementation for the Reorder operation and can be called. If it's not difficult, could you provide a modification for test_sum from ctest?

And what about the second question, why is there different logic for Reorder Sum and Concat?

densamoilov commented 3 months ago

Why hasn't it been done already?

It's a bug in the case when ACL reorder is available. The ref_sum_t class needs to have this: https://github.com/oneapi-src/oneDNN/blob/8d457a8ce898a032183ace8c541a70252351d8e2/src/cpu/ref_deconvolution.hpp#L407-L413

+@jondea

CodeCrafter18 commented 3 months ago

It looks like simple_layer_normalization has the same problem.

jondea commented 3 months ago

I've made an issue to look into this. We also have work in progress to remove the use for create_resource for the ACL implementations, which I think should fix this issue.

theComputeKid commented 1 month ago

Closed by https://github.com/oneapi-src/oneDNN/pull/2064