inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License

Code generation with fully unrolled loops gets extremely slow #207

Open Jonas164 opened 3 years ago

Jonas164 commented 3 years ago

We have this simple kernel:

import numpy as np
import loopy as lp

# MAT_DIM (the tile size) and name (the kernel name) are defined elsewhere.
domains = ["{ [i,j,l] : 0 <= i < m and 0 <= j < k and 0 <= l < n }"]
instructions = """
    C[i,j] = C[i,j] + A[i,l] * B[l,j]
"""
assumptions = (
    "m>0 and n>0 and k>0 "
    "and m mod {0} = 0 and n mod {0} = 0 and k mod {0} = 0".format(MAT_DIM))
outer_knl = lp.make_kernel(
    domains, instructions,
    target=lp.CFamilyTarget(), assumptions=assumptions, name=name)
outer_knl = lp.add_and_infer_dtypes(
    outer_knl, {"A,B,C": np.float64, "m,n,k": outer_knl.index_dtype})

Simply generating the code and header like this

code = lp.generate_code_v2(outer_knl)
header = str(lp.generate_header(outer_knl,code)[0])

with no transformations is unproblematic and fast.

Fully unrolling all of these loops gets extremely slow, even with small loop sizes. Adding these transformations:

outer_knl = lp.split_iname(outer_knl, "i", MAT_DIM)
outer_knl = lp.split_iname(outer_knl, "j", MAT_DIM)
outer_knl = lp.split_iname(outer_knl, "l", MAT_DIM)
outer_knl = lp.tag_inames(outer_knl, dict(i_inner="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(j_inner="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(l_inner="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(i_outer="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(j_outer="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(l_outer="unr"))
outer_knl = lp.add_prefetch(outer_knl, "A[:,l]", default_tag="l.auto")
outer_knl = lp.add_prefetch(outer_knl, "B[l,:]", default_tag="l.auto")
outer_knl = lp.tag_inames(outer_knl, dict(A_dim_0="unr"))
outer_knl = lp.tag_inames(outer_knl, dict(B_dim_1="unr"))
outer_knl = lp.fix_parameters(outer_knl, m=MAT_DIM, n=MAT_DIM, k=MAT_DIM)

Running this with MAT_DIM = 100 already takes over an hour on a reasonably fast CPU (Ryzen 5 3600). Is a fully unrolled program not intended or is there some way to speed this up?

inducer commented 3 years ago

Wouldn't that end up with 100^3, so something like a million C statements?

Jonas164 commented 3 years ago

100^4 C Statements, correct.

I already use a custom generator for a fixed algorithm that has no problem generating this many LoC. But it's obviously less flexible than loopy.

So it is simply not intended?
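
For illustration only: a hand-written generator of this kind can be little more than string formatting inside nested Python loops. The sketch below is not the custom generator mentioned above. The function name `unrolled_matmul_statements` and the flat, row-major `dim x dim` array layout are assumptions made for the example.

```python
def unrolled_matmul_statements(dim):
    """Yield one C statement per (i, j, l) triple of a dim x dim matmul.

    Assumes flat, row-major dim x dim arrays A, B and C (an assumption
    made for this sketch only).
    """
    for i in range(dim):
        for j in range(dim):
            for l in range(dim):
                c = i * dim + j  # flat index of C[i, j]
                a = i * dim + l  # flat index of A[i, l]
                b = l * dim + j  # flat index of B[l, j]
                yield f"C[{c}] = C[{c}] + A[{a}] * B[{b}];"


# Small example: a 4 x 4 product, fully unrolled.
statements = list(unrolled_matmul_statements(4))
print(len(statements))
print(statements[0])
```

Such a generator does no symbolic index simplification or domain checking, which is part of why it is fast and also why it is less flexible.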

inducer commented 3 years ago

I'm not massively surprised that it takes a little while. If you post a profile (using e.g. vmprof), maybe it's obvious where it's spending time?

Jonas164 commented 3 years ago

I used cProfile, as I had trouble installing vmprof.

This is what I got:

Fri Jan 29 19:16:40 2021    profile_out

         753662205 function calls (676937418 primitive calls) in 2935.483 seconds

   Ordered by: internal time
   List reduced from 3181 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
105813222/63884676  911.405    0.000  967.626    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:849(wrapper)
  5323855  397.692    0.000  715.870    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:678(_number_to_expr_like)
 12222461  174.678    0.000  185.012    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:599(pwaff_get_pieces)
  3079618  133.903    0.000  346.076    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1088(_align_dim_type)
  1930652  128.621    0.000  205.690    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1346(aff_to_expr)
  7251609   97.026    0.000   97.026    0.000 {built-in method islpy._isl.zero_on_domain}
  5323154   73.137    0.000  609.752    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:710(expr_like_add)
  1744203   69.710    0.000  153.765    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1411(__init__)
  7609709   49.076    0.000  100.710    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1084(_set_dim_id)
  6884102   44.784    0.000   45.627    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:193(obj_bogus_init)
  5301561   44.586    0.000   44.586    0.000 {built-in method islpy._isl.from_aff}
  5945706   44.073    0.000   44.073    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:793(val_to_python)
  2738800   34.830    0.000  221.444    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:734(expr_like_mul)
18661434/1257021   30.966    0.000 1807.949    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/__init__.py:114(__call__)
142637898   30.899    0.000   30.901    0.000 {built-in method builtins.isinstance}
  1742500   27.137    0.000 1673.860    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1582(simplify_using_aff)
  6312410   24.572    0.000  130.039    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:303(space_get_var_dict)
  1744400   22.218    0.000  202.665    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:967(obj_project_out_except)
  2964053   21.125    0.000   47.208    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:838(wrapper)
  1743903   18.600    0.000 1155.180    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1492(with_aff_conversion_guard)
  1744103   18.391    0.000   18.391    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1320(__enter__)
  1486809   17.352    0.000   17.352    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:536(set_get_basic_sets)
  1744203   16.635    0.000 1092.875    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1473(aff_from_expr)
  1087356   16.479    0.000   27.121    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1205(<listcomp>)
  1087356   15.315    0.000   28.166    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1200(<listcomp>)
5728907/5719806   14.811    0.000  393.179    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/__init__.py:174(map_foreign)
  1744103   13.782    0.000   13.782    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1324(__exit__)
  1861054   13.150    0.000   13.150    0.000 {built-in method islpy._isl.from_basic_set}
1714433/1543869   12.211    0.000  516.205    0.000 {built-in method _functools.reduce}
   697000   12.048    0.000 1763.509    0.003 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/codegen/expression.py:175(map_subscript)
  1744203   11.733    0.000   11.733    0.000 {built-in method islpy._isl.from_space}
  1744203   11.234    0.000 1049.356    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1488(pwaff_from_expr)
   360702   10.489    0.000   15.888    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:215(generic_str)
1369500/1369400   10.354    0.000  820.614    0.001 {built-in method builtins.sum}
   547258    9.994    0.000    9.995    0.000 {method 'remove' of 'list' objects}
   697000    9.222    0.000   25.121    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/kernel/array.py:1231(get_access_info)
  1087356    8.256    0.000  422.420    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1169(align_spaces)
 12222461    8.070    0.000   10.335    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:606(append_tuple)
10042792/10042624    7.535    0.000   11.358    0.000 <frozen importlib._bootstrap>:1033(_handle_fromlist)
  2399600    7.525    0.000  373.166    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/symbolic.py:1429(map_constant)
   182250    6.739    0.000  118.434    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/isl_helpers.py:120(iname_rel_aff)
   181950    6.556    0.000  156.680    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/__init__.py:297(fix)
 21838731    6.366    0.000    6.366    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:312(set_dim_name)
 25605855    6.272    0.000    6.842    0.000 {built-in method builtins.getattr}
  6884860    6.156    0.000    6.156    0.000 {built-in method __new__ of type object at 0x55c30cfdf6c0}
  1742500    6.057    0.000 1174.311    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/codegen/expression.py:196(<genexpr>)
       50    6.003    0.120  465.466    9.309 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/check.py:1182(check_implemented_domains)
  1823620    5.746    0.000   10.872    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pytools/__init__.py:330(__init__)
365256/365252    5.679    0.000    8.790    0.000 {built-in method builtins.__build_class__}
8146895/7950322    5.496    0.000  264.048    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pytools/__init__.py:671(wrapper)
  6884102    5.125    0.000   13.175    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:178(obj_new)
      300    5.020    0.017   11.644    0.039 {built-in method _pickle.dump}
  7609709    4.348    0.000   22.590    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:1059(_back_to_basic)
5080873/4365227    3.994    0.000   23.319    0.000 {built-in method builtins.hash}
  1747362    3.864    0.000   18.877    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pytools/__init__.py:629(_deco)
   176800    3.862    0.000  155.913    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/instruction.py:34(to_codegen_result)
182100/50    3.835    0.000 2461.955   49.239 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/control.py:218(build_loop_nest)
   182100    3.778    0.000  156.559    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/bounds.py:30(get_approximate_convex_bounds_checks)
   182200    3.774    0.000   20.982    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/bounds.py:58(get_usable_inames_for_conditional)
1369400/1369300    3.569    0.000  521.552    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/evaluator.py:94(map_product)
4172100/353800    3.552    0.000 1795.863    0.005 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/codegen/expression.py:110(rec)
   538404    3.510    0.000    3.510    0.000 {built-in method islpy._isl.to_str}
 5300/150    3.488    0.001 2459.210   16.395 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/loop.py:121(generate_unroll_loop)
364200/50    3.183    0.000 2459.852   49.197 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/control.py:329(build_insn_group)
  1823620    3.180    0.000   14.052    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pytools/__init__.py:413(__init__)
  1389100    3.179    0.000    3.599    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/codegen/expression.py:383(map_constant)
   521550    3.144    0.000    6.156    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/type_inference.py:285(map_variable)
1369400/1369300    3.059    0.000  516.333    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pytools/__init__.py:1052(product)
 17544853    3.048    0.000    3.048    0.000 {method 'append' of 'list' objects}
5409029/5054447    3.043    0.000    4.209    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:537(__hash__)
   176800    2.861    0.000 1805.092    0.010 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/__init__.py:825(emit_assignment)
   360702    2.785    0.000   23.204    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/kernel/tools.py:315(op)
4108200/4107900    2.752    0.000  280.224    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/evaluator.py:96(<genexpr>)
4108500/4108200    2.705    0.000  578.317    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/evaluator.py:92(<genexpr>)
 10065100    2.680    0.000    2.680    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:983(<genexpr>)
11209337/10173125    2.652    0.000    3.622    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:1627(is_nonzero)
8400424/7365614    2.633    0.000    5.494    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:1634(is_zero)
   527805    2.593    0.000    7.040    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:1539(flattened_product)
 12782578    2.452    0.000    2.452    0.000 {built-in method builtins.hasattr}
   187600    2.432    0.000   13.413    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/result.py:203(merge_codegen_results)
  1651900    2.396    0.000    4.984    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/kernel/__init__.py:750(iname_tags_of_type)
   176800    2.339    0.000 1810.589    0.010 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/instruction.py:102(generate_assignment_instruction_code)
   364050    2.235    0.000    2.801    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/__init__.py:229(copy)
  2103359    2.230    0.000    4.788    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/kernel/__init__.py:674(get_inames_domain)
  7494635    2.168    0.000    2.168    0.000 {built-in method builtins.setattr}
   177300    2.108    0.000    4.095    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:197(generic_reduce)
  1270600    2.099    0.000    3.087    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/schedule/__init__.py:133(generate_sub_sched_items)
   182150    2.066    0.000  254.731    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/control.py:315(__call__)
527705/525555    2.028    0.000  572.316    0.001 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/__init__.py:403(map_product)
   697000    2.025    0.000    2.841    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/target/c/codegen/expression.py:85(find_array)
   176800    1.993    0.000 1969.379    0.011 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/codegen/instruction.py:74(generate_instruction_code)
  2097554    1.954    0.000    4.895    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:282(__rmul__)
1731952/875510    1.933    0.000    4.773    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/primitives.py:565(is_equal)
   183753    1.921    0.000    1.921    0.000 {built-in method islpy._isl.equality_from_aff}
527717/351813    1.841    0.000 1167.295    0.003 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/pymbolic/mapper/__init__.py:398(map_sum)
   182200    1.834    0.000    2.978    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/schedule/__init__.py:166(find_active_inames_at)
13871427/13870975    1.830    0.000    1.830    0.000 {built-in method builtins.len}
   182100    1.813    0.000    1.813    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/islpy/__init__.py:516(basic_obj_get_constraints)
344150/171900    1.791    0.000   16.467    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/type_inference.py:170(map_sum)
   344150    1.772    0.000    5.469    0.000 /home/jonas/miniforge3/envs/dev/lib/python3.9/site-packages/loopy/type_inference.py:99(combine)

To be honest, this is hard for me to interpret. There are lots of islpy calls. Does that help you, @inducer?
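
One possible way to read a listing like this is to compare the cumtime column against the total run time, since cumtime includes everything a function calls. The sketch below does that arithmetic with numbers copied from the listing above. The selection of functions is only illustrative, and because cumulative times are nested, the shares overlap and do not add up to 100%.

```python
# Cumulative times (seconds) copied from the cProfile listing above.
TOTAL_SECONDS = 2935.483

cumtime = {
    "build_loop_nest": 2461.955,
    "generate_instruction_code": 1969.379,
    "simplify_using_aff": 1673.860,
    "with_aff_conversion_guard": 1155.180,
    "check_implemented_domains": 465.466,
}

# Fraction of the total run time spent inside each function and its callees.
shares = {name: seconds / TOTAL_SECONDS for name, seconds in cumtime.items()}

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {share:6.1%}")

# Average cost of one simplify_using_aff call, using its ncalls value.
SIMPLIFY_CALLS = 1742500
per_call_ms = 1000 * cumtime["simplify_using_aff"] / SIMPLIFY_CALLS
print(f"simplify_using_aff: {per_call_ms:.2f} ms per call")
```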

dokempf commented 3 years ago

There are graphical tools to dynamically explore and interpret these profiles. I personally use SnakeViz for cProfiles, but I am sure there are many alternatives.
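
As a text-only alternative to a graphical viewer, the standard-library `pstats` module can re-sort a dumped profile by cumulative time, which tends to surface the top-level callers instead of the leaf functions. A minimal sketch, assuming a profile file written by cProfile: `busy_work` is a hypothetical stand-in for the code generation call, and the helper names are made up for this example.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    # Hypothetical stand-in workload; replace with the code generation call.
    return sum(i * i for i in range(n))


def profile_to_file(func, path, *args):
    # Run func under cProfile and dump the raw statistics to path.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    profiler.dump_stats(path)
    return result


def top_by_cumulative(path, limit=10):
    # Load a dumped profile and format its top entries by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(path, stream=stream)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "profile_out")
    profile_to_file(busy_work, out_path, 100_000)
    report = top_by_cumulative(out_path)

print(report)
```

For an existing dump such as `profile_out`, only the `top_by_cumulative` step would be needed.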