Implement conv2d-im2col-fc

gussmith23 commented 3 years ago

Scott can support an im2col'ed conv2d fully-connected systolic array invocation taking NHWC input. This means we can fold a bunch more access patterns into a hardware op, making us generate less code on our side (and hopefully making things faster by relying on his stuff more).

To be clear, what I mean is this: There are roughly three ways that we've discussed implementing convolutions.

"normal" convolutions. See #45 for a description of these. I have more info in my notes from talking with Scott.
Im2col convolutions, where the data is already in im2col format -- we say that this is a "fully connected" conv2d, because we're computing a conv2d but it looks like a single fully-connected (i.e. matmul/dense) invocation, because of the im2col trick. These are already implemented in Glenside.
Im2col convolutions, where the data is in memory as NHWC and Scott's hardware reads it in as im2col. This is what we want to implement here.

gussmith23 commented 3 years ago

This is interesting for Glenside because layout comes up in a big way.

It's taken me til now to realize that all of our current systolic array usages have seemed "simple" in my mind because (maybe among other things) they're not layout-dependent. That is, when we're using a systolic array just as matrix multiplier, there's a very simple concept of layout, because there's only two dimensions in each input. All of our systolic array usages just use the systolic array as a matrix multiplier -- "FC" (fully connected) mode as Scott likes to say. I've been mentally blocked on how we'll implement systolic arrays in "conv2d" mode (as opposed to FC mode), knowing that it would be more complicated, but unsure as to why. Now, I realize that it's (at least partly) because the systolic array makes assumptions about layout (Scott's prefer NHWC/HWIO), and so, when we create the rewrites to map to these pieces of hardware, we need to be able to detect layout.

As I think about it actively, though, I'm thinking it shouldn't be a problem, or not as big a problem as I thought. It will be kinda hacky, but not too bad.

The core thing to think about is that the Glenside implementation of conv2d already contains implicit layout information. We are doing very specific things with each dimension, and so we can tell the layout based on the computation. In the future, dimensions and layouts will be more explicit (see #6) and we shouldn't need to "infer" layout from computation pattern, as I'm describing here. But for now, we need to match on the computation that we know represents a conv2d in a specific layout. Specifically, we implement conv2ds on NCHW tensors -- NHWC tensors are first transposed to NCHW. So we can match on the NCHW convolution and turn it into a systolic-array-conv2d-nchw, which we don't actually have an implementation for. Then, from there, we can make a whole bunch of rewrites that invoke other systolic arrays:

If we see that the inputs/output are being transposed with a specific pattern (indicating they're coming from/going back to NHWC/HWIO) then we can map in a systolic-array-conv2d-nhwc, which we will presumably have an implementation for. This would satisfy #45.
If we see the inputs are being flattened and the output is being reshaped, we can rewrite to systolic-array-conv2d-im2col-fc-nchw, which, again, we don't have an implementation for.
- If we rewrite to systolic-array-conv2d-im2col-fc-nchw and we see that the inputs/output are being transposed from/to NHWC/HWIO, we can rewrite to systolic-array-conv2d-im2col-fc-nhwc, which is what this issue is asking for.

This isn't exactly ideal -- namely because Glenside seems to be implying a layout preference, which it is. Glenside has a layout preference for NCHW when expressing convolutions, which arises purely due to how the language/how access patterns are currently structured. In the future, I hope that #6 will eliminate this preference.

gussmith23 commented 3 years ago

I think this is mostly done now, with #82. It seems like it's working in the eval; we can find the new systolic array types in Resnet. This is great; this means we're probably able to start doing some interesting extraction with different sizes/types of arrays.

gussmith23 commented 3 years ago

Closing. We'll see if we end up using this, but it's in the language, at least!

gussmith23 / glenside

Implement conv2d-im2col-fc #79