inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License

Loopy's code generator fails with non-deterministic error #421

Open kaushikcfd opened 3 years ago

kaushikcfd commented 3 years ago

For the quite simple kernel

import loopy as lp
knl = lp.make_kernel(
    ["{ [i_outer, i_inner] : 4i_outer <= n - 5 and 0 <= i_inner < 4 and 0 <= i_inner + 4i_outer < n }",
     "{ [slab_i_outer] : n - 4 <= 4slab_i_outer < n }",
     "[slab_i_outer, n] -> { [slab_i_inner] : 0 <= slab_i_inner < 4 and 0 <= slab_i_inner + 4slab_i_outer <  n }"],
    """
    a[i_inner + 4*i_outer] = i_inner + 4*i_outer
    a[slab_i_inner + 4*slab_i_outer] = slab_i_inner + 4*slab_i_outer
    """, seq_dependencies=True)

knl = lp.tag_inames(knl, "i_outer:g.0, slab_i_outer:g.0")

lp.generate_code_v2(knl)
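
To make the intent of the three domains easier to see, here is a minimal sketch in plain Python (no loopy or isl involved) that enumerates the array indices each instruction writes for a given `n`. The helper names `main_points` and `slab_points` are made up for illustration, and the brute-force search bounds are an assumption that is generous enough for small `n`.

```python
def main_points(n):
    # { [i_outer, i_inner] : 4i_outer <= n - 5 and 0 <= i_inner < 4
    #                        and 0 <= i_inner + 4i_outer < n }
    return sorted(
        4 * i_outer + i_inner
        for i_outer in range(-1, n + 1)  # generous brute-force bounds
        for i_inner in range(4)
        if 4 * i_outer <= n - 5 and 0 <= i_inner + 4 * i_outer < n
    )


def slab_points(n):
    # { [slab_i_outer] : n - 4 <= 4slab_i_outer < n } combined with
    # { [slab_i_inner] : 0 <= slab_i_inner < 4
    #                    and 0 <= slab_i_inner + 4slab_i_outer < n }
    return sorted(
        4 * slab_i_outer + slab_i_inner
        for slab_i_outer in range(-1, n + 1)  # generous brute-force bounds
        for slab_i_inner in range(4)
        if n - 4 <= 4 * slab_i_outer < n
        and 0 <= slab_i_inner + 4 * slab_i_outer < n
    )


print(main_points(6), slab_points(6))  # [0, 1, 2, 3] [4, 5]
```

For `n = 6` the first instruction covers the full block of four and the slab instruction covers the two remaining entries, so together they write `a[0]` through `a[5]` exactly once.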

Depending on PYTHONHASHSEED, the error cycles between:
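
Independently of the specific outputs, one way to survey the seed dependence is to rerun the reproducer under several values of `PYTHONHASHSEED` and group the seeds by outcome. The following is a rough sketch using only the standard library. The function names `run_with_seed` and `sweep` are hypothetical, and the exit code plus the final stderr line is only a coarse fingerprint of a failure.

```python
import os
import subprocess
import sys


def run_with_seed(script, seed):
    """Run `script` in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    proc = subprocess.run(
        [sys.executable, script], env=env, capture_output=True, text=True
    )
    lines = proc.stderr.strip().splitlines()
    # Coarse fingerprint: exit code plus the last line written to stderr.
    return proc.returncode, (lines[-1] if lines else "")


def sweep(script, seeds):
    """Map each distinct (exit code, last stderr line) to the seeds producing it."""
    outcomes = {}
    for seed in seeds:
        outcomes.setdefault(run_with_seed(script, seed), []).append(seed)
    return outcomes
```

Assuming the reproducer is saved as `lpbug.py`, calling `sweep("lpbug.py", range(32))` would return one dictionary entry per distinct outcome. The hash seed has to be set in the environment of a new process, since it is read at interpreter startup.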

kaushikcfd commented 3 years ago

I just ran this through the code-generator in #372 and that treats it correctly. At least from my end, debugging this on the current trunk seems unnecessary.

inducer commented 3 years ago

With bd1ea33 (#422) and lpbug.py being the updated reproducer above, running

PYTHONHASHSEED=22 pycl lpbug.py

I get

-------------------------------------------------------------------------------
CODE:
-------------------------------------------------------------------------------
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global int *__restrict__ a, int const n)
{
  for (int i_inner = 0; i_inner <= 3; ++i_inner)
    a[4 * gid(0) + i_inner] = i_inner + 4 * gid(0);
  if (-1 + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + n >= 0)
    for (int slab_i_inner = 0; slab_i_inner <= -1 + n + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)); ++slab_i_inner)
      a[4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + slab_i_inner] = slab_i_inner + 4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4));
}

-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/tmp/lpbug.py", line 13, in <module>
    lp.generate_code_v2(knl)
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 777, in generate_code_v2
    cgr = generate_code_for_a_single_kernel(program[func_id],
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 557, in generate_code_for_a_single_kernel
    assert check_implemented_domains(kernel, codegen_result.implemented_domains,
  File "/home/andreas/src/loopy/loopy/check.py", line 1542, in check_implemented_domains
    raise LoopyError("sanity check failed--implemented and desired "
loopy.diagnostic.LoopyError: sanity check failed--implemented and desired domain for instruction 'insn_0' do not match

implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 and 4slab_i_outer >= -4 + n and 0 <= slab_i_inner < n - 4slab_i_outer and 2*floor((-n)/4) <= -3 - slab_i_outer }

desired:[n] -> { [slab_i_outer, slab_i_inner] : 4slab_i_outer >= -4 + n and slab_i_inner >= 0 and -4slab_i_outer <= slab_i_inner < n - 4slab_i_outer }

sample point in desired but not implemented: slab_i_inner=0, slab_i_outer=0, n=4
gist of constraints in desired but not implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 }
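
The reported sample point can be checked by hand. The sketch below transcribes the two constraint sets from the message above into plain Python predicates (the function names `desired` and `implemented` are made up for illustration) and evaluates them at `n=4, slab_i_outer=0, slab_i_inner=0`. Python's `//` rounds toward negative infinity, which is assumed here to match the `floor` in the isl output.

```python
def desired(n, slab_i_outer, slab_i_inner):
    # 4slab_i_outer >= -4 + n and slab_i_inner >= 0
    # and -4slab_i_outer <= slab_i_inner < n - 4slab_i_outer
    return (
        4 * slab_i_outer >= -4 + n
        and slab_i_inner >= 0
        and -4 * slab_i_outer <= slab_i_inner < n - 4 * slab_i_outer
    )


def implemented(n, slab_i_outer, slab_i_inner):
    # n >= 5 and 4slab_i_outer >= -4 + n
    # and 0 <= slab_i_inner < n - 4slab_i_outer
    # and 2*floor((-n)/4) <= -3 - slab_i_outer
    return (
        n >= 5
        and 4 * slab_i_outer >= -4 + n
        and 0 <= slab_i_inner < n - 4 * slab_i_outer
        and 2 * ((-n) // 4) <= -3 - slab_i_outer
    )


point = dict(n=4, slab_i_outer=0, slab_i_inner=0)
print(desired(**point), implemented(**point))  # True False
```

The point satisfies the desired domain but fails the `n >= 5` condition of the implemented one, in line with the reported gist.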

inducer commented 3 years ago

I guess I agree with you. Given that this code is getting replaced, let's not worry about it.