Open kaushikcfd opened 3 years ago
I just ran this through the code-generator in #372 and that treats it correctly. At least from my end, debugging this on the current trunk seems unnecessary.
With bd1ea33 (#422) and lpbug.py, the updated reproducer above:
PYTHONHASHSEED=22 pycl lpbug.py
I get
-------------------------------------------------------------------------------
CODE:
-------------------------------------------------------------------------------
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global int *__restrict__ a, int const n)
{
  for (int i_inner = 0; i_inner <= 3; ++i_inner)
    a[4 * gid(0) + i_inner] = i_inner + 4 * gid(0);
  if (-1 + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + n >= 0)
    for (int slab_i_inner = 0; slab_i_inner <= -1 + n + -4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)); ++slab_i_inner)
      a[4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4)) + slab_i_inner] = slab_i_inner + 4 * (gid(0) + -1 + n + -1 * loopy_floor_div_pos_b_int32(3 * n, 4));
}
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/tmp/lpbug.py", line 13, in <module>
    lp.generate_code_v2(knl)
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 777, in generate_code_v2
    cgr = generate_code_for_a_single_kernel(program[func_id],
  File "/home/andreas/src/loopy/loopy/codegen/__init__.py", line 557, in generate_code_for_a_single_kernel
    assert check_implemented_domains(kernel, codegen_result.implemented_domains,
  File "/home/andreas/src/loopy/loopy/check.py", line 1542, in check_implemented_domains
    raise LoopyError("sanity check failed--implemented and desired "
loopy.diagnostic.LoopyError: sanity check failed--implemented and desired domain for instruction 'insn_0' do not match
implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 and 4slab_i_outer >= -4 + n and 0 <= slab_i_inner < n - 4slab_i_outer and 2*floor((-n)/4) <= -3 - slab_i_outer }
desired:[n] -> { [slab_i_outer, slab_i_inner] : 4slab_i_outer >= -4 + n and slab_i_inner >= 0 and -4slab_i_outer <= slab_i_inner < n - 4slab_i_outer }
sample point in desired but not implemented: slab_i_inner=0, slab_i_outer=0, n=4
gist of constraints in desired but not implemented: [n] -> { [slab_i_outer, slab_i_inner] : n >= 5 }
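The sample point reported above can be checked by hand. Below is a minimal sketch in plain Python that transcribes the two constraint sets from the error message. The function names `in_desired` and `in_implemented` are made up for illustration and are not loopy or islpy API, and `floor(x/4)` is written with Python's `//` operator. This only re-evaluates the printed constraints; it does not reproduce what the code generator does.

```python
def in_desired(n, slab_i_outer, slab_i_inner):
    # Hand transcription of the "desired" set from the error message.
    return (4 * slab_i_outer >= -4 + n
            and slab_i_inner >= 0
            and -4 * slab_i_outer <= slab_i_inner < n - 4 * slab_i_outer)


def in_implemented(n, slab_i_outer, slab_i_inner):
    # Hand transcription of the "implemented" set; note the extra
    # n >= 5 constraint and the floor-division term.
    return (n >= 5
            and 4 * slab_i_outer >= -4 + n
            and 0 <= slab_i_inner < n - 4 * slab_i_outer
            and 2 * ((-n) // 4) <= -3 - slab_i_outer)


# Sample point taken from the error message.
point = dict(n=4, slab_i_outer=0, slab_i_inner=0)
print(in_desired(**point), in_implemented(**point))  # prints: True False
```

For this point the `n >= 5` constraint alone is enough to exclude it from the implemented set, which matches the reported gist.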
I guess I agree with you. Given that this code is getting replaced, let's not worry about it.
For the quite simple kernel
Depending on PYTHONHASHSEED, the error cycles between: