inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License
576 stars 70 forks source link

How should domain trees be deduced/specified? #127

Open inducer opened 4 years ago

inducer commented 4 years ago

@isuruf showed this kernel, which can be reproduced approximately by running

 SUMPY_NO_CACHE=1 pycl -m pudb test_fmm.py 'test_sumpy_fmm(cl._csc, LaplaceKernel(2), VolumeTaylorLocalExpansion,VolumeTaylorMultipoleExpansion)' 

in https://github.com/inducer/sumpy.

The if is clearly emitted prematurely: It should only cover statements in isrc, but here it seems to cover the accumulator initialization and the result store.

inducer commented 4 years ago

I think I've made some progress here. I've got a workaround and some design questions to make this better going forward.

The domains in this kernel are given as:

            "{[itgt_box]: 0 <= itgt_box < ntgt_boxes}",
            "{[isrc_box]: isrc_box_start <= isrc_box < isrc_box_end}",
            "{[itgt, isrc, idim]: \
                itgt_start <= itgt < itgt_end and \
                isrc_start <= isrc < isrc_end and \
                0 <= idim < dim}",

The above domains are also responsible for the placement of the conditional: As soon as the itgt loop is entered, that last domain gets to enforce its constraints. Since the accumulator is inside itgt, and since that domain (projected onto itgt) is empty if isrc_start == isrc_end, Loopy is in principle correct in emitting the conditional where it does.

So in this case, the kernel is incorrect and should be fixed.

The natural fix (just make separate domains for itgt and isrc):

[ntgt_boxes] -> { [itgt_box] : 0 <= itgt_box < ntgt_boxes }                                                             
  [isrc_box_end, isrc_box_start] -> { [isrc_box] : isrc_box_start <= isrc_box < isrc_box_end }                          
  [itgt_end, itgt_start] -> { [itgt] : itgt_start <= itgt < itgt_end }                                                  
  [isrc_end, isrc_start] -> { [isrc] : isrc_start <= isrc < isrc_end }                                                  
{ [idim, idim_0] : 0 <= idim <= 1 and 0 <= idim_0 <= 1 }                

fails because Loopy gets the domain tree wrong and then complains about needing to branch the domain tree in order to satisfy the inames needed by various statements.

Helping it along by (kind of unnaturally) forcing the right nesting of domains produces the desired kernel without the conditional:

            "{[itgt_box]: 0 <= itgt_box < ntgt_boxes}",
            "{[isrc_box]: isrc_box_start <= isrc_box < isrc_box_end}",
            "[itgt_start,itgt_end,isrc_box] -> {[itgt]:  itgt_start <= itgt < itgt_end}",
            "[isrc_start, isrc_end, itgt] -> {[isrc]: isrc_start <= isrc < isrc_end}",
            "{[idim]: 0 <= idim < dim}"

This works, but it's stupid, which begs the question of what we should do instead. Some thoughts on that follow.

cc'ing the semantics squad @mattwala @jdsteve2 and @kaushikcfd for general Loopy interest.

mattwala commented 4 years ago

Possibly related: https://gitlab.tiker.net/inducer/loopy/-/issues/64