Open · kaushikcfd opened this issue 2 years ago
@inducer: I think this is unavoidable for the kernels that @sv2518 is working with. Since changing the logic for emitting predicates in loopy.codegen.instruction would be too intrusive, I propose that loop distribution could give us an easy fallback, as follows:
import loopy as lp
from loopy.kernel.data import VectorizeTag

tunit = lp.make_kernel(
"{[i, j]: 0<=i<100 and 0<=j<4}",
"""
for i
for j
<> tmp1[j] = i+j
<> tmp2[j] = 0
if j>2
tmp2[j] = 2 * tmp1[j] {id=w_tmp2}
end
out[i, j] = 2*tmp2[j]
end
end
""", seq_dependencies=True)
tunit = lp.tag_array_axes(tunit, "tmp1,tmp2", "vec")
tunit = lp.tag_inames(tunit, "j:vec")
# {{{ fallback -->
knl = lp.distribute_loops(tunit.default_entrypoint,
"id:w_tmp2",
outer_inames=frozenset("i"))
renamed_j, = knl.id_to_insn["w_tmp2"].within_inames - {"i"}
knl = lp.untag_inames(knl, renamed_j, VectorizeTag)
knl = lp.tag_inames(knl, {renamed_j: "unr"})
# }}}
tunit = tunit.with_kernel(knl)
print(lp.generate_code_v2(tunit).device_code())
Given this potential use case for loop distribution, any opinions on me moving forward with it?
@sv2518: If you want to access this functionality, please check out the branch cvec_x_distribute_loops.
Thanks!
cvec_x_distribute_loops is not updated to the new version of the c_vecextensions_target, is it?
cvec_x_distribute_loops is not updated to the new version of the c_vecextensions_target, is it?
Oops, yep. Messed up while cherry-picking the commits. Pushed a fix.
Thanks!
Is the fallback code meant to go in our codebase or can this be done in loopy?
I added this snippet to PyOP2:
all_insn_cinsn = [insn
                  for insn in wrapper.default_entrypoint.instructions
                  if isinstance(insn, lp.CInstruction)]
# {{{ fallback -->
for insn in all_insn_cinsn:
wrapper = lp.distribute_loops(wrapper.default_entrypoint,
insn.id,
outer_inames=shifted_iname)
renamed_j, = wrapper.id_to_insn[insn.id].within_inames - shifted_iname
wrapper = lp.untag_inames(wrapper, renamed_j, VectorizeTag)
wrapper = lp.tag_inames(wrapper, {renamed_j: "unr"})
# }}}
but I run into the following error. Did I drive this wrong?
../PyOP2/pyop2/global_kernel.py:501: in vectorise
wrapper = lp.distribute_loops(wrapper.default_entrypoint,
../loopy/loopy/transform/iname.py:1179: in wrapper
transformed_kernel = transformation_func(kernel, *args, **kwargs)
../loopy/loopy/transform/loop_distribution.py:337: in distribute_loops
within = match.parse_match(insn_match)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
expr = 'mtf__cond'
def parse_match(expr):
"""Syntax examples::
* ``id:yoink and writes:a_temp``
* ``id:yoink and (not writes:a_temp or tag:input)``
"""
if not expr:
return All()
def parse_terminal(pstate):
next_tag = pstate.next_tag()
if next_tag is _id:
result = Id(pstate.next_match_obj().group(1))
pstate.advance()
return result
elif next_tag is _tag:
result = Tagged(pstate.next_match_obj().group(1))
pstate.advance()
return result
elif next_tag is _writes:
result = Writes(pstate.next_match_obj().group(1))
pstate.advance()
return result
elif next_tag is _reads:
result = Reads(pstate.next_match_obj().group(1))
pstate.advance()
return result
elif next_tag is _in_kernel:
result = InKernel(pstate.next_match_obj().group(1))
pstate.advance()
return result
elif next_tag is _iname:
result = Iname(pstate.next_match_obj().group(1))
pstate.advance()
return result
else:
pstate.expected("terminal")
def inner_parse(pstate, min_precedence=0):
pstate.expect_not_end()
if pstate.is_next(_not):
pstate.advance()
left_query = Not(inner_parse(pstate, _PREC_NOT))
elif pstate.is_next(_openpar):
pstate.advance()
left_query = inner_parse(pstate)
pstate.expect(_closepar)
pstate.advance()
else:
left_query = parse_terminal(pstate)
did_something = True
while did_something:
did_something = False
if pstate.is_at_end():
return left_query
next_tag = pstate.next_tag()
if next_tag is _and and _PREC_AND > min_precedence:
pstate.advance()
left_query = And(
(left_query, inner_parse(pstate, _PREC_AND)))
did_something = True
elif next_tag is _or and _PREC_OR > min_precedence:
pstate.advance()
left_query = Or(
(left_query, inner_parse(pstate, _PREC_OR)))
did_something = True
return left_query
if isinstance(expr, MatchExpressionBase):
return expr
from pytools.lex import LexIterator, lex, InvalidTokenError
try:
pstate = LexIterator(
[(tag, s, idx, matchobj)
for (tag, s, idx, matchobj) in lex(_LEX_TABLE, expr,
match_objects=True)
if tag is not _whitespace], expr)
except InvalidTokenError as e:
from loopy.diagnostic import LoopyError
> raise LoopyError(
"invalid match expression: '{match_expr}' ({err_type}: {err_str})"
.format(
match_expr=expr,
err_type=type(e).__name__,
err_str=str(e)))
E loopy.diagnostic.LoopyError: invalid match expression: 'mtf__cond' (InvalidTokenError: at index 0: ...mtf__cond...)
../loopy/loopy/match.py:403: LoopyError
I think this is unavoidable
I'm not sure I agree. I think we can expect the predicate to be an expression, and thus we can determine whether there's a dependency on that iname in the predicates. Based on that, we just need to raise Unvectorizable somewhere in the target.
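For concreteness, a minimal sketch of such a check (illustrative only, not loopy's actual codegen; check_predicates_vectorizable and the single vec_iname parameter are assumptions made for this example):
from loopy.diagnostic import Unvectorizable
from loopy.symbolic import get_dependencies


def check_predicates_vectorizable(insn, vec_iname):
    # if any predicate of this instruction references the vectorized iname,
    # the instruction cannot be emitted as a single vector statement
    for pred in insn.predicates:
        if vec_iname in get_dependencies(pred):
            raise Unvectorizable(
                f"predicate '{pred}' of '{insn.id}' depends on "
                f"vectorized iname '{vec_iname}'")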
we just need to raise Unvectorizable somewhere in the target.
We did that in https://github.com/inducer/loopy/pull/617. The issue has more to do with the way predicates are emitted in the codegen pipeline: it needs to be taught about Unvectorizable errors.
Did I drive this wrong?
I think there is a minor error there; it should be something like:
from loopy.match import Id, Or

cinsn_ids = [cinsn.id
             for cinsn in kernel.instructions
             if isinstance(cinsn, lp.CInstruction) and cinsn.predicates]
cinsn_match = Or(tuple(Id(cinsn_id) for cinsn_id in cinsn_ids))
outer_inames = frozenset([shifted_iname + "_outer"])
kernel = lp.distribute_loops(kernel,
                             cinsn_match,
                             outer_inames=outer_inames)
inames_to_untag = frozenset().union(
    *(kernel.id_to_insn[cinsn_id].within_inames - outer_inames
      for cinsn_id in cinsn_ids))
for iname_to_untag in inames_to_untag:
    kernel = lp.untag_inames(kernel, iname_to_untag, VectorizeTag)
kernel = lp.tag_inames(kernel, {iname_to_untag: "unr"
                                for iname_to_untag in inames_to_untag})
Ah yes, I got further now, but I now run into the following error:
../PyOP2/pyop2/global_kernel.py:502: in vectorise
kernel = lp.distribute_loops(kernel,
../loopy/loopy/transform/iname.py:1179: in wrapper
transformed_kernel = transformation_func(kernel, *args, **kwargs)
../loopy/loopy/transform/loop_distribution.py:400: in distribute_loops
if not _is_loop_distribution_sound(kernel,
../loopy/loopy/transform/loop_distribution.py:133: in _is_loop_distribution_sound
amaps_pred = [
../loopy/loopy/transform/loop_distribution.py:134: in <listcomp>
process_amap(get_insn_access_map(kernel, insn_id, var))
../loopy/loopy/kernel/tools.py:2166: in get_insn_access_map
indices = list(_IndexCollector(var)((insn.expression,
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:244: in __call__
result = super().rec(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:151: in __call__
return self.map_foreign(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:206: in map_foreign
return self.map_tuple(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:359: in map_list
return self.combine(self.rec(child, *args, **kwargs) for child in expr)
../loopy/loopy/kernel/tools.py:2127: in combine
return reduce(operator.or_, values, frozenset())
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:359: in <genexpr>
return self.combine(self.rec(child, *args, **kwargs) for child in expr)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:244: in __call__
result = super().rec(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:153: in __call__
return method(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:314: in map_quotient
self.rec(expr.numerator, *args, **kwargs),
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:244: in __call__
result = super().rec(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:153: in __call__
return method(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:281: in map_call
(self.rec(expr.function, *args, **kwargs),)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:244: in __call__
result = super().rec(expr, *args, **kwargs)
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:148: in __call__
return self.handle_unsupported_expression(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <loopy.kernel.tools._IndexCollector object at 0x15075e700>
expr = ResolvedFunction(Variable('sqrt')), args = (), kwargs = {}
def handle_unsupported_expression(self, expr, *args, **kwargs):
"""Mapper method that is invoked for
:class:`pymbolic.primitives.Expression` subclasses for which a mapper
method does not exist in this mapper.
"""
> raise UnsupportedExpressionError(
"{} cannot handle expressions of type {}".format(
type(self), type(expr)))
E pymbolic.mapper.UnsupportedExpressionError: <class 'loopy.kernel.tools._IndexCollector'> cannot handle expressions of type <class 'loopy.symbolic.ResolvedFunction'>
../../lib/python3.9/site-packages/pymbolic/mapper/__init__.py:122: UnsupportedExpressionError
The CInstruction does not contain an sqrt function, but the instruction it depends on does.
The CInstruction does not contain an sqrt function, but the instruction it depends on does.
It needed a map_resolved_function. Pushed a fix to the branch.
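Roughly, the underlying problem: ResolvedFunction is a loopy-specific expression node, so a plain pymbolic-based collector has no mapper method for it and ends up in handle_unsupported_expression. Here is a sketch of the idea, assuming a frozenset-combining index collector (illustrative only, not loopy's actual _IndexCollector; SubscriptIndexCollector is a made-up name):
import operator
from functools import reduce
from pymbolic.mapper import CombineMapper


class SubscriptIndexCollector(CombineMapper):
    # each sub-result is a frozenset of index tuples; merge them with union
    def combine(self, values):
        return reduce(operator.or_, values, frozenset())

    def map_subscript(self, expr):
        # record this subscript's indices and keep collecting inside them
        return frozenset([expr.index_tuple]) | self.rec(expr.index_tuple)

    def map_constant(self, expr):
        return frozenset()

    def map_algebraic_leaf(self, expr):
        # variables and other leaves carry no indices
        return frozenset()

    # the missing piece: a resolved function symbol carries no array
    # indices of its own, so it contributes nothing to the result
    def map_resolved_function(self, expr):
        return frozenset()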
Okay, cool, thanks! One step further, I now run into:
../PyOP2/pyop2/global_kernel.py:502: in vectorise
kernel = lp.distribute_loops(kernel,
../loopy/loopy/transform/iname.py:1179: in wrapper
transformed_kernel = transformation_func(kernel, *args, **kwargs)
../loopy/loopy/transform/loop_distribution.py:400: in distribute_loops
if not _is_loop_distribution_sound(kernel,
../loopy/loopy/transform/loop_distribution.py:138: in _is_loop_distribution_sound
amaps_succ = [
../loopy/loopy/transform/loop_distribution.py:139: in <listcomp>
process_amap(get_insn_access_map(kernel, insn_id, var))
../loopy/loopy/kernel/tools.py:2167: in get_insn_access_map
indices = list(_IndexCollector(var)((insn.expression,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = CInstruction(groups=frozenset(), code='break;', read_variables=frozenset(), tags=frozenset(), within_inames_is_final=T...tf__atol_crit_id'}), no_sync_with=frozenset(), within_inames=frozenset({'mtf__i_c', 'n_shift_outer', 'n_shift_batch'}))
name = 'expression'
def __getattr__(self, name):
# This method is implemented to avoid pylint 'no-member' errors for
# attribute access.
> raise AttributeError(
"'{}' object has no attribute '{}'".format(
self.__class__.__name__, name))
E AttributeError: 'CInstruction' object has no attribute 'expression'
../../lib/python3.9/site-packages/pytools/__init__.py:449: AttributeError
Sorry, I hadn't accounted for CInstructions. Pushed a fix; the following snippet now runs as expected:
import loopy as lp
from loopy.symbolic import parse
tunit = lp.make_kernel(
"{[i]: 0<=i<4}",
["<> tmp[i] = 0 {id=w_tmp}",
lp.CInstruction(iname_exprs=("i", "i"),
code="break;",
predicates={parse("tmp[i] > n")},
read_variables={"i", "n"},
depends_on=frozenset({"w_tmp"}),
id="break",),
lp.Assignment("out_callee",
"i",
depends_on=frozenset(["break"]))
],
[lp.ValueArg("n", dtype="int32"), ...],
name="circuit_breaker")
knl = tunit.default_entrypoint
knl = lp.tag_inames(knl, "i:vec")
knl = lp.distribute_loops(knl, "id:break", frozenset())
knl = lp.untag_inames(knl, "i_1", lp.VectorizeTag)
knl = lp.tag_inames(knl, "i_1:unr")
tunit = tunit.with_kernel(knl)
print(lp.generate_code_v2(tunit).device_code())
Fixed as part of https://github.com/inducer/loopy/pull/557
The following kernel --
generates:
Notice the stray (j) in the conditional. A short-term solution, which is to not vectorize such instructions, will be included as part of #557. /cc @sv2518