spcl / dace

DaCe - Data Centric Parallel Programming
http://dace.is/fast
BSD 3-Clause "New" or "Revised" License
497 stars 129 forks source link

dace codegen issues #1391

Closed rohanrayan closed 1 year ago

rohanrayan commented 1 year ago

Describe the bug There are a few issues I have about the C++ generated code for a simple matrix addition example from dace.

  1. .dacecache/madd/src/cpu/madd.cpp: in the dace_init_madd() function: we have a pointer allocation with new but the lines of code following the allocation check for the value in the result integer. Why is this the case? As I understand it, the new operator with nothrow returns a nullptr, in which case the check should be to see if the pointer (__state) is null or it should be a try_catch block to check for bad_alloc. What am I missing here?

  2. .dacecache/madd/src/cpu/madd.cpp: in the __program_madd_internal() function: In this case I see that the loop induction termination is set to M and N, but the internal matrix access is specialized to 10 (A[10*i + j] for example). Why is this the case? Why cant the matrix access also be based on the inputs (M, N)?

  3. .dacecache/madd/sample/madd_main.cpp: in the main() function: I see that the int M, N are set to 42, but the calloc() is specialized correctly to 10x10. Where does the 42 come from? Why is this not also 10?

To Reproduce Steps to reproduce the behavior:

  1. Run the attached python script to generate the SDFG along with the generated C++ code.
  2. Check the .dacecache/madd/src/cpu/madd.cpp and .dacecache/madd/sample/madd_main.cpp for the described issues

Version I am using dace v0.14.4

Additional context Script to reproduce behavior along with the generated code on my machine. code_gen.zip

alexnick83 commented 1 year ago
  1. There are two points here to comment on: a. The "default" DaCe code generator doesn't output code for catching allocation failures. b. As implied by the above, __result doesn't check if the allocation of __state succeeded. However, DaCe code generators can implement custom initialization routines (which may have allocation checks) that return a boolean to indicate success or failure. You can see how the code is exactly generated here. Note that int __result = 0 and if (__result) ... are always printed, regardless if there any targets (code generators) with their own initialization routines.
  2. This is a case where the DaCe program's input/output data are type annotated (therefore, you can do ahead-of-time (AoT) compilation *), but the actual arguments are also passed to the to_sdfg method (needed for just-in-time (JIT) compilation). The matrix sizes are specialized because the JIT mode has higher priority than AoT and DaCe will use the sizes of the actual arguments instead of the type annotations. The loop bounds are not specialized because this is unrelated to the JIT mode, which only creates type annotations for the program's inputs/outputs. Since you passed values for M and N in the to_sdfg method, you probably expected all instances of M and N in the program to be replaced accordingly. @philip-paul-mueller this looks pretty related to #1380.
  3. The subfolder sample is sample C++ code that one can use to call the generated DLL/shared library. Its purpose is to provide the user with the correct signature. The symbolic values in the main.cpp file have no meaning whatsoever other than being syntactically correct. The arrays are allocated with the "correct" size because it has been hardcoded to the SDFG due to using the JIT mode. The number 42 has no special meaning, except being the answer to life, the universe, and everything.

* The syntax for the type annotations is slightly wrong. You should write A: dace.float64[M, N] instead of A=dace.float64[M, N].

rohanrayan commented 1 year ago

Hi @alexnick83

Thanks for the comments.

I just have one follow up on point 2, I tried both the JIT version and AOT versions and it seems like the deciding factor in the generated code is whether I set the arguments as A: dace.float64[M,N] or A= dace.float64[M,N]. In my case regardless of JIT or AOT, if I set the code to the latter, I always get 10 hardcoded inside the loop, and I never see this when I set the arguments to the former, it always correctly generates code without specialization. I guess the takeaway for me is I should always use A: dace.float64[M, N] form of argument declaration in dace.

Regarding the other points, it makes sense after you explain it, but it is confusing for beginners who may not be aware of such design decisions.

alexnick83 commented 1 year ago

Please note that standard Python syntax applies. To use AoT, you need to provide type annotations. A = dace.float64[M, N] is not a type annotation but a default value (which cannot work because dace.float64[M, N] is a shortcut to dace.data.Array(dtype=dace.float64, shape=(M, N)); not a value).

rohanrayan commented 1 year ago

Since all the questions have been answered, I am closing this issue.