oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

How to transpose a tensor #1420

Closed StrongerXi closed 2 years ago

StrongerXi commented 2 years ago

Context

I'm trying to use oneDNN to implement arbitrary axis permutation for a tensor, e.g.,

t = [[0, 1], [2, 3]]
t.transpose(); // == t.transpose({1, 0})

And I want to move the underlying data, so not just a "transpose view", i.e.,

[0, 1, 2, 3]
-->
[0, 2, 1, 3]

What I've tried

Note that for this example I could've just used explicit memory format tags, i.e., ab --> ba, but I'm trying to implement a generic transpose, and apparently not all format tags are supported, e.g., dcba. So I'm using strides to construct the memory descriptor (with a helper, dims_to_row_major_strides).
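For completeness, `dims_to_row_major_strides` is just the usual row-major stride computation (innermost axis has stride 1, each outer axis the product of the dims to its right); a minimal sketch in plain C++, assuming 64-bit dims like oneDNN's `memory::dim`:

```cpp
#include <cstdint>
#include <vector>

// Row-major (ab, abc, ...) strides for the given dims: the innermost
// axis gets stride 1, each outer axis the product of the dims to its
// right. Hypothetical helper, not part of the oneDNN API.
std::vector<std::int64_t> dims_to_row_major_strides(
        const std::vector<std::int64_t> &dims) {
    std::vector<std::int64_t> strides(dims.size(), 1);
    for (int i = static_cast<int>(dims.size()) - 2; i >= 0; --i)
        strides[i] = strides[i + 1] * dims[i + 1];
    return strides;
}
```

For dims {2, 3} this yields strides {3, 1}, matching the ab format.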

std::vector<float> data {0, 1, 2, 3, 4, 5};
memory::dims dims {2, 3};
memory::dims new_axes {1, 0};

auto src_md = memory::desc(dims, dt::f32, dims_to_row_major_strides(dims));
auto dst_md = src_md.permute_axes(new_axes);

auto src_mem = memory(src_md, engine);
auto dst_mem = memory(dst_md, engine);

auto reorder_pd = reorder::primitive_desc(engine, src_md, engine, dst_md); // error, since reorder requires matching src/dst dims

To enforce the same src/dst dims for the reorder primitive, I tried the following, but I get a new error from reshape, which seems to require contiguous (plain-format) data here.

auto dst_md = src_md.permute_axes(new_axes).reshape(dims);

Question

I feel like I'm missing something very simple here. Could you guys provide some pointers?

cc: @jacobkahn

dzarukin commented 2 years ago

Hi @StrongerXi, thank you for the question. I'm not sure what exactly you are trying to achieve. Consider a 2x3 tensor with elements

[0][1][2]
[3][4][5]

in ab format (strides are {3, 1}).

Reorder of this tensor to ba will make the data look like this:

[0][3][1]
[4][2][5]

Dims are 2x3, strides are {1, 2}.

Reshape of this tensor to 3x2 will make the data look like this:

[0][1]
[2][3]
[4][5]

Dims are 3x2, strides are {2, 1}.
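The reordered layout above can be verified with a few lines of stride arithmetic; a purely illustrative sketch with no oneDNN types (`relayout_2d` is a made-up helper):

```cpp
#include <vector>

// Lay out a logical d0 x d1 tensor (given row-major) into a flat
// physical buffer according to per-axis strides s0, s1. This is the
// data movement a reorder performs for a 2D tensor.
std::vector<int> relayout_2d(const std::vector<int> &row_major,
        int d0, int d1, int s0, int s1) {
    std::vector<int> out(row_major.size());
    for (int i = 0; i < d0; ++i)
        for (int j = 0; j < d1; ++j)
            out[i * s0 + j * s1] = row_major[i * d1 + j];
    return out;
}
```

With dims 2x3 and strides {1, 2} (ba), the buffer [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], i.e., the [0][3][1] / [4][2][5] layout described above.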

So, what "arbitrary axis permutation for a tensor" means in these terms? Thank you.

densamoilov commented 2 years ago

It looks like the goal is to have a mechanism that would allow creating a memory descriptor for an arbitrarily strided tensor and reordering memory accordingly.

StrongerXi commented 2 years ago

Hi @dzarukin and @densamoilov, thanks for your prompt responses.


What @densamoilov suggested is indeed the optimal solution, as it covers both this transpose need and other ops I'll end up implementing.


To answer @dzarukin 's specific question:

Reorder of this tensor to ba will make data look like this: [0][3][1] [4][2][5]. Dims are 2x3, strides are 1x2.

For my example, I'd like to have this data layout, but with Dims being 3x2, strides being 2x1.

To clarify "arbitrary axis permutation for a tensor", I'm really referring to behavior of numpy.transpose, e.g.,:

>>> x = np.random.rand(2, 3, 4)
>>> x.shape
(2, 3, 4)
>>> x.transpose(1, 2, 0).shape
(3, 4, 2)

The permute_axes method is very close to what I need, but it's a view -- to move the underlying data, I need reorder, which has strict requirements on src/dst memory descriptors (which is shown in my original example).

StrongerXi commented 2 years ago

Also, #110 is a very similar issue, but I couldn't find what I need there (or maybe I'm failing to see it).

dzarukin commented 2 years ago

Alright, I guess I see now. So, with current API available I can imagine only the following flow:

  1. Take a source abc-like tensor and transpose data to a destination (temporary) memory with desired format.
  2. Take a source memory descriptor and reshape it to a desired transposed one (based on dims, strides will be updated automatically).
  3. Create a new memory object with transposed memory descriptor and attach a handle to it from a reorder destination memory.

Not sure about object management though. It might be required to accept a memory from user, reorder it to a temporary memory and then copy it to the one user provided. So that it is still under user's responsibility to keep it alive as long as needed.
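Ignoring oneDNN object management, the net effect of the three steps above on a contiguous buffer is a plain index permutation; an illustrative sketch (hypothetical `permute_data` helper, not oneDNN API):

```cpp
#include <vector>

// Transpose a contiguous row-major buffer according to an axis
// permutation (numpy.transpose semantics): output axis k is input
// axis axes[k]. This mimics what reorder + reshape achieve together.
std::vector<float> permute_data(const std::vector<float> &src,
        const std::vector<int> &dims, const std::vector<int> &axes) {
    const int n = static_cast<int>(dims.size());
    std::vector<int> out_dims(n), in_strides(n), out_strides(n);
    for (int i = 0; i < n; ++i) out_dims[i] = dims[axes[i]];
    in_strides[n - 1] = out_strides[n - 1] = 1;
    for (int i = n - 2; i >= 0; --i) {
        in_strides[i] = in_strides[i + 1] * dims[i + 1];
        out_strides[i] = out_strides[i + 1] * out_dims[i + 1];
    }
    std::vector<float> dst(src.size());
    for (int flat = 0; flat < static_cast<int>(src.size()); ++flat) {
        int off = 0;
        for (int k = 0; k < n; ++k) {
            // coordinate of this element along input axis axes[k]
            int coord = (flat / in_strides[axes[k]]) % dims[axes[k]];
            off += coord * out_strides[k];
        }
        dst[off] = src[flat];
    }
    return dst;
}
```

For the 2x2 example from the top of the thread, dims {2, 2} with axes {1, 0} turns [0, 1, 2, 3] into [0, 2, 1, 3].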

StrongerXi commented 2 years ago

To confirm my understanding, are these correct?

(a) The reorder primitive will be used in (1), i.e., to reorder from ab to ba. But dimensions are not "transposed" yet, because reorder requires the same src/dst dims, i.e., 2x3.

(b) (2) and (3) are meant to combine the dst memory from (a) with a new descriptor with the correct dims (i.e., 2x3 to 3x2).

dzarukin commented 2 years ago

Yes, that's the best I can imagine to make it work the desired way. Please let me know if it works for you. Thanks.

StrongerXi commented 2 years ago

I think I'll go with this, with a few complications which I'll explain below:

// I realized a primitive version of "memory::desc::permute_axes" is precisely what I want here, hence the name
memory permute_axes(engine eng, memory src, memory::dims new_axes) {

  auto src_md = src.get_desc();
  auto dst_md = src_md.reshape(get_transposed_shape(src_md, new_axes));
  auto dst_mem = memory(dst_md, eng);

  auto reorder_dst_md = get_reorder_dst_md(src_md, new_axes); // this is the tricky part
  auto reorder_pd = reorder::primitive_desc(eng, src_md, eng, reorder_dst_md);
  auto reorder_prim = reorder(reorder_pd);

  // ... execution, not interesting here

  return dst_mem;
}
  1. memory::desc::reshape seems to require a plain, row-major tensor, could you confirm?

  2. As commented in the code, get_reorder_dst_md is the tricky part here, because (a) we can't get a format_tag back from a memory (it is translated into strides at construction), and (b) we don't have tags for all formats, e.g., dcba; thus we can't do an easy ab -> ba translation here.

So I think I need to construct the memory descriptor with explicit strides here, and I'm trying to figure out an algorithm to compute the strides for reorder_dst_md. For instance, for my original example, it should be {1, 2}.
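One candidate algorithm (hypothetical helper, not part of oneDNN): compute row-major strides for the permuted dims, then scatter them back to the original axis order:

```cpp
#include <vector>

// Strides for the reorder destination: the dst keeps the source dims,
// but its data layout is row-major with respect to the *permuted*
// axis order. So compute row-major strides over dims[axes[k]] and
// scatter stride k back to axis axes[k].
std::vector<int> reorder_dst_strides(const std::vector<int> &dims,
        const std::vector<int> &axes) {
    const int n = static_cast<int>(dims.size());
    std::vector<int> perm_strides(n, 1), strides(n);
    for (int k = n - 2; k >= 0; --k)
        perm_strides[k] = perm_strides[k + 1] * dims[axes[k + 1]];
    for (int k = 0; k < n; ++k) strides[axes[k]] = perm_strides[k];
    return strides;
}
```

For dims {2, 3} and axes {1, 0} this returns {1, 2}, matching the expected strides above.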

Please let me know if this looks right to you, thanks!

dzarukin commented 2 years ago

Hi @StrongerXi, sorry for the delay. It seems to me you pointed to the right function - permute_axes. I think this is the general flow you might want to consider (unless you've already come up with something that works...):

Could you check, if it works for you, please? Thank you.

StrongerXi commented 2 years ago

Hi @dzarukin. Thanks for the response. Could you elaborate a bit on orig_md_abx_format and transposed_md_abx_format, i.e., how exactly do we get them from orig_md and transposed_md?

dzarukin commented 2 years ago

I believe it should be like this:

auto orig_md_abx_format = memory::desc(orig_md.dims(), orig_md.data_type(), memory::format_tag::abcd);

Though the number of dimensions in the memory tag should coincide with ndims, so you might want a helper function that takes ndims and returns the matching memory::format_tag. Thank you.
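A sketch of such a helper; shown here building the tag name as a string for illustration only, since real code would switch over ndims and return memory::format_tag::a / ab / abc / abcd and so on:

```cpp
#include <string>

// Pick the plain row-major tag for a given rank. The tag naming
// convention is one letter per axis, 'a' outermost: a, ab, abc, abcd.
// Illustrative stand-in returning the tag *name*, not the enum value.
std::string row_major_tag_name(int ndims) {
    std::string tag;
    for (int i = 0; i < ndims; ++i) tag += static_cast<char>('a' + i);
    return tag;
}
```

E.g., a 4D memory descriptor would use the tag named "abcd".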

StrongerXi commented 2 years ago

Hmm, this doesn't seem to be working, or maybe I'm doing something wrong; let me look more and get back to you :).

vpirogov commented 2 years ago

Closing as stale, feel free to reopen with new data.

WilliamTambellini commented 1 year ago

cf https://github.com/apache/mxnet/blob/master/src/operator/nn/dnnl/dnnl_transpose.cc

feixuedudiao commented 6 months ago

@dzarukin can you give an example for transpose? I don't understand it either. Thanks.

dzarukin commented 6 months ago

Hi @feixuedudiao,

@dzarukin can you give an example for transpose? I don't understand it either. Thanks.

This is the reorder example. Hope it guides you in performing a transposition.

feixuedudiao commented 6 months ago

thanks

feixuedudiao commented 4 months ago

I implemented it two ways; the code is below. The first returns a view over the same data handle; the second performs an actual reorder:

memory transpose(engine eng, memory in_mem, std::vector<int> axis) {
  auto out_md = in_mem.get_desc();
  auto out_new_md = out_md.permute_axes(axis);
  auto out_new_hd = in_mem.get_data_handle();
  auto out_new_mem = memory(out_new_md, eng, out_new_hd);
  return out_new_mem;
}

void transpose_with_chn(engine eng, memory in_mem, memory::format_tag tag,
                        std::vector<primitive> &net,
                        std::vector<std::unordered_map<int, memory>> &net_args) {
  auto in_md = in_mem.get_desc();
  auto dims = in_md.get_dims();
  auto out_md = memory::desc(dims, in_md.get_data_type(), tag);
  auto out_mem = memory(out_md, eng);

  // Create primitive descriptor.
  auto reorder_pd = reorder::primitive_desc(eng, in_md, eng, out_md);
  // Create the primitive.
  auto reorder_pm = reorder(reorder_pd);

  net.push_back(reorder_pm);
  net_args.push_back({{DNNL_ARG_SRC, in_mem},
                      {DNNL_ARG_DST, out_mem}});
}