inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License
580 stars 70 forks source link

Type inference failure #481

Open xywei opened 3 years ago

xywei commented 3 years ago

Type inference on the following kernel fails with message UnboundLocalError: local variable 'symbols_with_unavailable_types' referenced before assignment:

import pyopencl as cl
import numpy as np
import loopy as lp

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

prog = lp.make_kernel(
        "{[i]: 0<=i<3}",
        """
        <> x[i] = if(i==0, 1, if(i==1, 1.1, if(i==2, 1.11, -1))) {id=insn1}
        x[i] = if(x[i]>1.1, 2, x[i]) {id=insn2,dep=insn1,dup=i}
        out[i] = x[i] {dep=insn2,dup=i}
        """)
evt, (out, ) = prog(queue)

The kernel compiles if the conditional in insn2 is not an expression of x[i], e.g.,

x[i] = if(1<0, 2.5, x[i]) {id=insn2,dep=insn1,dup=i}

This issue was revealed in volumential's table building https://gitlab.tiker.net/xywei/volumential/-/jobs/305193.

inducer commented 3 years ago

Thanks for reporting this! #483 fixes the unassigned variable (which is definitely a bug). ed5d1458abb07f7d30de4854b1e4f427480e52df was an attempt to make type inference succeed, but given that insn1 is self-referential, I don't know that it should succeed... so I gave up on that approach.

@kaushikcfd, any thoughts on this?

kaushikcfd commented 3 years ago

We could hack so that the type-inference doesn't traverse instructions with circular dependencies like insn2. Here's the patch for it, which infers the types correctly. (I guess it would be better if I open a PR for it)

diff --git a/loopy/type_inference.py b/loopy/type_inference.py
index 867f6893..2c496913 100644
--- a/loopy/type_inference.py
+++ b/loopy/type_inference.py
@@ -689,6 +689,7 @@ def _infer_var_type(kernel, var_name, type_inf_mapper, subst_expander):
     import loopy as lp

     type_inf_mapper = type_inf_mapper.copy()
+    had_raised_dep_failure = False

     for writer_insn_id in kernel.writer_map().get(var_name, []):
         writer_insn = kernel.id_to_insn[writer_insn_id]
@@ -698,32 +699,36 @@ def _infer_var_type(kernel, var_name, type_inf_mapper, subst_expander):
         expr = subst_expander(writer_insn.expression)

         debug("             via expr %s", expr)
-        if isinstance(writer_insn, lp.Assignment):
-            result = type_inf_mapper(expr, return_dtype_set=True)
-        elif isinstance(writer_insn, lp.CallInstruction):
-            return_dtype_sets = type_inf_mapper(expr, return_tuple=True,
-                    return_dtype_set=True)
-
-            result = []
-            for return_dtype_set in return_dtype_sets:
-                result_i = None
-                found = False
-                for assignee, comp_dtype_set in zip(
-                        writer_insn.assignee_var_names(), return_dtype_set):
-                    if assignee == var_name:
-                        found = True
-                        result_i = comp_dtype_set
-                        break
-
-                assert found
-                if result_i is not None:
-                    result.append(result_i)
-
-        debug("             result: %s", result)
+        try:
+            if isinstance(writer_insn, lp.Assignment):
+                result = type_inf_mapper(expr, return_dtype_set=True)
+            elif isinstance(writer_insn, lp.CallInstruction):
+                return_dtype_sets = type_inf_mapper(expr, return_tuple=True,
+                        return_dtype_set=True)

-        dtype_sets.append(result)
+                result = []
+                for return_dtype_set in return_dtype_sets:
+                    result_i = None
+                    found = False
+                    for assignee, comp_dtype_set in zip(
+                            writer_insn.assignee_var_names(), return_dtype_set):
+                        if assignee == var_name:
+                            found = True
+                            result_i = comp_dtype_set
+                            break
+
+                    assert found
+                    if result_i is not None:
+                        result.append(result_i)
+            debug("             result: %s", result)
+
+            dtype_sets.append(result)
+        except DependencyTypeInferenceFailure:
+            had_raised_dep_failure = True

     if not dtype_sets:
+        if had_raised_dep_failure:
+            raise DependencyTypeInferenceFailure
         return (
                 None, type_inf_mapper.symbols_with_unknown_types, None,
                 type_inf_mapper.clbl_inf_ctx)
inducer commented 3 years ago

That seems similar to ed5d1458abb07f7d30de4854b1e4f427480e52df in its approach, however I'm not sure either approach necessarily yields a consistent treatment of self-referential assignments. Just skipping them basically ignores type constraints that arise from those. I'm not sure that's what we want. Upon further thought, ed5d1458abb07f7d30de4854b1e4f427480e52df might actually be OK: It considers non-self-referential assignments and tests them for consistency using all other assignments (in a second round), including the self-referential ones. Maybe that's what we should do. I'm not sure I had fully grasped the importance of the "second round" performed by the code when I dismissed the approach.

kaushikcfd commented 3 years ago

Hadn't looked at https://github.com/inducer/loopy/commit/ed5d1458abb07f7d30de4854b1e4f427480e52df, that LGTM. On trying it with small kernels with reduction-like instruction pairs, it yielded the correct results (due to the second round).