Glenside should find a blocking strategy for Mobilenet convolutions #42

gussmith23 commented 4 years ago

This is mostly an issue of egg search performance and the fact that Glenside's blowing up the egraph with rewrites. See comment thread for debugging chain.

[X] Add (failing) test for just one convolutional layer of Mobilenet
[X] Debug why it's not tensorizing
[ ] Add needed rewrites --> unsure whether this is actually the issue

gussmith23 commented 4 years ago

I've been working on this, but I'm stuck even on the first convolution. I'm feeling stumped at the moment. Here's what I know:

Padding/slicing and slicing/concatenating are happening somewhat normally.
The weights are getting padded up to 64, and the image data is also getting padded to a multiple of 64.
A systolic array is getting mapped in, but not a 64x64 one; it's a 64x(some large number) one.

The issue seems to be stemming from the fact that the image isn't getting split up into 64x64 chunks?

Searching for this pattern:

(compute dot-product
  (access-cartesian-product
    (access-pad
      (access-pad
        (access-flatten
          (access (access-tensor conv_block_1_conv_weight) 1))
        zero-padding 1 0 37)
      zero-padding 0 0 32)
    ?0))

Produces a few eclasses for ?0: a slice, and two pads, one of which is shown here:

EClass { id: 55671, nodes: [AccessPad([481, 6, 22, 22, 22])], data: AccessPattern(AccessPatternData { shape: IxDynImpl(Inline(1, [12544, 0, 0, 0])), item_shape: IxDynImpl(Inline(1, [64, 0, 0, 0])), zero_regions: ...

So the shape, 12544x64, is amenable to slicing into 64x64 chunks. And I have, in the rewrites:

        slice_concatenate_accesses(0, SliceConcatenateStrategy::DivideInto { segment_size: 64 }),
        slice_concatenate_accesses(1, SliceConcatenateStrategy::DivideInto { segment_size: 64 }),

This rewrite should fire on this access, should it not? 12544 is divisible by 64.

Need to figure out why it's not. It's definitely the largest access I've tried to split up with this rewrite. Maybe that's coming into play.
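
As a sanity check on the divisibility claim, here is a minimal sketch in plain Rust of the slice bounds a divide-into-64 split would have to produce along the 12544-long dimension. `segment_bounds` is a hypothetical helper written for illustration; it is not part of Glenside or egg.

```rust
/// Hypothetical helper (not part of Glenside): returns the half-open
/// [start, end) bounds of each segment when a dimension of length `dim_len`
/// is divided into segments of `segment_size`. Returns `None` when the
/// segment size is zero or does not evenly divide the dimension.
fn segment_bounds(dim_len: usize, segment_size: usize) -> Option<Vec<(usize, usize)>> {
    if segment_size == 0 || dim_len % segment_size != 0 {
        return None;
    }
    Some(
        (0..dim_len)
            .step_by(segment_size)
            .map(|start| (start, start + segment_size))
            .collect(),
    )
}
```

Calling `segment_bounds(12544, 64)` yields 196 segments, so the rewrite is applicable in principle, but a single application has to create 196 slices at once.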

gussmith23 commented 4 years ago

I should say that, for ?0, I think I should see an access-concatenate, but I don't.

gussmith23 commented 4 years ago

So it seemed that the slice-concatenate rewrite wasn't firing. Profiling showed a lot of time being spent in that rewrite.

The rewrite is expensive, specifically when using the DivideInto strategy, because that strategy slices up an access as many times as it can, all at once. I added a new strategy, SliceOnce, which slices an access just once. Tentatively, it's at least partly working.
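
To make the tradeoff concrete, here is a toy cost model for the two strategies. It is a sketch under assumptions, not Glenside's implementation: `ToyStrategy` and `split_cost` are made-up names, and it assumes SliceOnce peels one segment off the front of the access per application, leaving the remainder to be sliced by later applications.

```rust
/// Toy stand-in for the two strategies discussed above. This is NOT
/// Glenside's `SliceConcatenateStrategy`; names and fields are illustrative.
#[derive(Clone, Copy)]
enum ToyStrategy {
    /// Split the whole dimension into `segment_size` pieces in one application.
    DivideInto { segment_size: usize },
    /// Assumed behavior: peel a single `segment_size` piece off the front
    /// per application, leaving the remainder for later applications.
    SliceOnce { segment_size: usize },
}

/// Returns (applications needed, slices created per application) to fully
/// split a dimension of length `dim_len`. Assumes `dim_len` is an exact
/// multiple of the segment size and longer than one segment.
fn split_cost(strategy: ToyStrategy, dim_len: usize) -> (usize, usize) {
    match strategy {
        ToyStrategy::DivideInto { segment_size } => (1, dim_len / segment_size),
        ToyStrategy::SliceOnce { segment_size } => (dim_len / segment_size - 1, 2),
    }
}
```

Under this model, splitting the 12544-long dimension takes one application creating 196 slices with DivideInto, versus 195 applications creating 2 slices each with SliceOnce. The total work is similar, but SliceOnce spreads it across iterations, so each individual application stays cheap.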

gussmith23 commented 4 years ago

Things are working a bit better, but even after an hour-long run on the RTML server, it still doesn't tensorize. Given more time, though, it always finds more systolic arrays.

gussmith23 commented 4 years ago

Another potential easy route is to profile the rewrites running in egg, and start banning rewrites.

gussmith23 commented 4 years ago

Experimenting with that. Egg actually uses a backoff scheduler by default, which temporarily bans rewrites that match too often. I've made sure the important rewrites aren't getting banned.
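
For reference, here is a toy model of how a backoff scheduler decides when to ban a rewrite. This is a self-contained sketch, not egg's code: `RuleStats` and its fields are illustrative, and the doubling of both the match threshold and the ban length after each ban is my understanding of egg's `BackoffScheduler`, so treat it as an assumption. The `never_ban` flag models exempting an important rewrite, which I believe egg exposes through `BackoffScheduler::do_not_ban`.

```rust
/// Toy per-rule state, loosely modelled on a backoff scheduler.
/// Illustrative only; this is not egg's actual data structure.
struct RuleStats {
    match_limit: usize,  // initial number of matches tolerated per iteration
    ban_length: usize,   // initial ban duration, in iterations
    times_banned: u32,   // how many times this rule has been banned so far
    banned_until: usize, // first iteration at which the rule may run again
    never_ban: bool,     // exempt this rule from banning entirely
}

impl RuleStats {
    fn new(match_limit: usize, ban_length: usize, never_ban: bool) -> Self {
        RuleStats { match_limit, ban_length, times_banned: 0, banned_until: 0, never_ban }
    }

    /// Returns true if the rule may apply its matches in this iteration.
    fn allow(&mut self, iteration: usize, num_matches: usize) -> bool {
        if self.never_ban {
            return true;
        }
        if iteration < self.banned_until {
            return false; // still serving an earlier ban
        }
        // Both the threshold and the ban length double after each ban.
        let threshold = self.match_limit << self.times_banned;
        if num_matches > threshold {
            let ban = self.ban_length << self.times_banned;
            self.times_banned += 1;
            self.banned_until = iteration + ban;
            return false;
        }
        true
    }
}
```

With a match limit of 1000 and a ban length of 5, a rule that produces 5000 matches in iteration 1 is skipped until iteration 6. A rule created with `never_ban` set to true is always allowed, however many matches it produces.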

gussmith23 commented 4 years ago

Just to be clear: so far, my thought has been that all the right rewrites are in place, and it's simply taking a very long time to tensorize because the blocking is insane. I should actually confirm hunches about a few things here, rather than going off assumptions:

[ ] Understand how this will actually be blocked up. If I don't actually know what the blocking looks like, I can't know what to search for in the program.
[ ] Search for the beginnings of tensorization in the program.

gussmith23 commented 4 years ago

For the time being, I'm shelving this in favor of #43. I want Glenside to be able to statically block up computations, but honestly, it's not worth the effort right now.
