spcl / dace

DaCe - Data Centric Parallel Programming
http://dace.is/fast
BSD 3-Clause "New" or "Revised" License

Codegen: indexing aliases of multidimensional memlets generates std::tuples #1192

Open petiaccja opened 1 year ago

petiaccja commented 1 year ago

Consider the following tasklet:

# Memlets:
#   Input: values_full (entire slice of a 2D array)
#   Output: result_item (single element of a 2D array)

values_alias = values_full
result_item = values_alias[1, 2]

The generated C++ code is the following:

auto values_alias = values_full;
result_item = values_alias[std::make_tuple(1, 2)];

However, what's expected is that DaCe generates the linear index:

auto values_alias = values_full;
result_item = values_alias[1 * stride0 + 2 * stride1];

When the memlet values_full is accessed directly, the correct linear index expression is generated.
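To make the expected expression concrete, here is a minimal sketch of how a multidimensional index maps to a linear offset. The `linearize` helper is hypothetical and not part of DaCe; the strides `(10, 1)` assume a row-major 10x10 array with strides counted in elements.

```python
def linearize(index, strides):
    """Map a multidimensional index to a linear offset.

    Hypothetical helper for illustration only; it is not DaCe API.
    Strides are assumed to be counted in elements, not bytes.
    """
    if len(index) != len(strides):
        raise ValueError("index and strides must have the same rank")
    # offset = i0 * stride0 + i1 * stride1 + ...
    return sum(i * s for i, s in zip(index, strides))


# Assumed layout: row-major 10x10 array, so stride0 = 10 and stride1 = 1.
offset = linearize((1, 2), (10, 1))
print(offset)  # 12, i.e. 1 * 10 + 2 * 1
```

Under these assumptions, `values_alias[1, 2]` would correspond to `values_alias[12]` in the generated code.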

Possible solutions:

Context: Generating the tasklet's code from an AST/IR (such as gt4py's internal IRs) is sometimes much simpler with small tricks like creating aliases, since it can spare the trouble of running passes over the whole IR. It's up for debate whether DaCe should take on this kind of workload instead of the users, but even if not, DaCe should at least reject the code before codegen.
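As a rough illustration of what "rejecting the code before codegen" could look like, the sketch below uses Python's standard `ast` module to flag tuple-indexed accesses made through a simple alias of a memlet connector. The function name `find_aliased_tuple_subscripts` and the overall approach are assumptions made for illustration, not DaCe's actual validation logic. It only handles top-level `alias = connector` assignments and assumes Python 3.9 or newer, where `Subscript.slice` holds the index expression directly.

```python
import ast


def find_aliased_tuple_subscripts(code, memlet_names):
    """Return (alias, line) pairs for tuple-indexed accesses through an alias.

    Hypothetical check for illustration; not part of DaCe.
    """
    tree = ast.parse(code)
    aliases = {}   # alias name -> original connector name
    findings = []
    for stmt in tree.body:
        # Record plain `alias = connector` (or `alias = other_alias`) assignments.
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Name)):
            source = stmt.value.id
            if source in memlet_names or source in aliases:
                aliases[stmt.targets[0].id] = aliases.get(source, source)
                continue
        # Look for `alias[i, j, ...]` anywhere inside the statement.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Subscript)
                    and isinstance(node.value, ast.Name)
                    and node.value.id in aliases
                    and isinstance(node.slice, ast.Tuple)):
                findings.append((node.value.id, node.lineno))
    return findings


tasklet_code = "values_alias = values_full\nresult_item = values_alias[1, 2]\n"
for alias, line in find_aliased_tuple_subscripts(tasklet_code, {"values_full"}):
    print(f"line {line}: multidimensional index through alias '{alias}'")
```

A real check would also need to handle aliases created inside nested blocks and reassignment of alias names, which this sketch ignores.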

alexnick83 commented 1 year ago

Slicing an array inside a Tasklet is not allowed. Instead, the slicing should be done in the Memlet. For example, using the SDFG API:

import dace

# Declare the input and output arrays on the SDFG.
sdfg = dace.SDFG('test')
sdfg.add_array('values_full', (10, 10), dace.int32)
sdfg.add_array('result_item', (10, 10), dace.int32)

state = sdfg.add_state('test', is_start_state=True)
# The tasklet only sees scalar connectors; it does no indexing itself.
t = dace.nodes.Tasklet('mytasklet', {'__inp'}, {'__out'}, "__out = __inp")
# The input memlet selects element [1, 2] of values_full.
inp_acc = dace.nodes.AccessNode('values_full')
state.add_edge(inp_acc, None, t, '__inp', dace.Memlet(data='values_full', subset='1, 2'))
# The output memlet writes to element [0, 1] of result_item.
out_acc = dace.nodes.AccessNode('result_item')
state.add_edge(t, '__out', out_acc, None, dace.Memlet(data='result_item', subset='0, 1'))

# Open the SDFG in the viewer.
sdfg.view()