gussmith23 closed this issue 3 years ago
This is interesting for Glenside because layout comes up in a big way.
It's taken me until now to realize that all of our current systolic array usages have seemed "simple" in my mind because (maybe among other things) they're not layout-dependent. That is, when we're using a systolic array just as a matrix multiplier, there's a very simple concept of layout, because there are only two dimensions in each input. All of our systolic array usages just use the systolic array as a matrix multiplier -- "FC" (fully connected) mode, as Scott likes to say. I've been mentally blocked on how we'll implement systolic arrays in "conv2d" mode (as opposed to FC mode), knowing that it would be more complicated, but unsure as to why. Now I realize that it's (at least partly) because the systolic array makes assumptions about layout (Scott's prefers NHWC/HWIO), and so, when we create the rewrites to map to these pieces of hardware, we need to be able to detect layout.
As I think about it actively, though, I'm thinking it shouldn't be a problem, or not as big a problem as I thought. It will be kinda hacky, but not too bad.
The core thing to think about is that the Glenside implementation of conv2d already contains implicit layout information. We are doing very specific things with each dimension, and so we can tell the layout based on the computation. In the future, dimensions and layouts will be more explicit (see #6) and we shouldn't need to "infer" layout from computation pattern, as I'm describing here. But for now, we need to match on the computation that we know represents a conv2d in a specific layout. Specifically, we implement conv2ds on NCHW tensors -- NHWC tensors are first transposed to NCHW. So we can match on the NCHW convolution and turn it into a `systolic-array-conv2d-nchw`, which we don't actually have an implementation for. Then, from there, we can make a whole bunch of rewrites that invoke other systolic arrays:

- `systolic-array-conv2d-nhwc`, which we will presumably have an implementation for. This would satisfy #45.
- `systolic-array-conv2d-im2col-fc-nchw`, which, again, we don't have an implementation for.
- If we have `systolic-array-conv2d-im2col-fc-nchw` and we see that the inputs/output are being transposed from/to NHWC/HWIO, we can rewrite to `systolic-array-conv2d-im2col-fc-nhwc`, which is what this issue is asking for.

This isn't exactly ideal -- namely because Glenside seems to be implying a layout preference, which it is. Glenside has a layout preference for NCHW when expressing convolutions, which arises purely due to how the language/how access patterns are currently structured. In the future, I hope that #6 will eliminate this preference.
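To make the last rewrite in the list above concrete, here is a toy sketch in plain Python. It is not Glenside's actual rewrite (Glenside's rewrites live in its own term language); the tuple encoding of terms, the `transpose` op, the helper names, and the assumption that the NCHW op takes OIHW weights are all hypothetical. Only the two `systolic-array-conv2d-im2col-fc-*` names come from the discussion.

```python
# Toy terms are nested tuples: (op, arg0, arg1, ...). Leaves are strings.
# The "transpose" op and this encoding are hypothetical, for illustration only.

# A permutation lists, for each output axis, which input axis it reads from.
NHWC_TO_NCHW = (0, 3, 1, 2)
NCHW_TO_NHWC = (0, 2, 3, 1)
HWIO_TO_OIHW = (3, 2, 0, 1)  # assumes the NCHW op expects OIHW weights

NCHW_OP = "systolic-array-conv2d-im2col-fc-nchw"
NHWC_OP = "systolic-array-conv2d-im2col-fc-nhwc"


def is_transpose(term, perm):
    """True if `term` is ("transpose", perm, <anything>)."""
    return (
        isinstance(term, tuple)
        and len(term) == 3
        and term[0] == "transpose"
        and term[1] == perm
    )


def rewrite_to_nhwc(term):
    """Fold the surrounding layout transposes into the NHWC op.

    Matches: transpose-to-NHWC(NCHW_OP(transpose-to-NCHW(d), transpose-to-OIHW(w)))
    and returns NHWC_OP(d, w). Any other term is returned unchanged.
    """
    if not is_transpose(term, NCHW_TO_NHWC):
        return term
    inner = term[2]
    if not (isinstance(inner, tuple) and len(inner) == 3 and inner[0] == NCHW_OP):
        return term
    _, data, weights = inner
    if is_transpose(data, NHWC_TO_NCHW) and is_transpose(weights, HWIO_TO_OIHW):
        return (NHWC_OP, data[2], weights[2])
    return term
```

The point of the sketch is only that the layout is recovered from the transposes around the op, rather than from any explicit layout annotation.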
I think this is mostly done now, with #82. It seems like it's working in the eval; we can find the new systolic array types in Resnet. This is great; this means we're probably able to start doing some interesting extraction with different sizes/types of arrays.
Closing. We'll see if we end up using this, but it's in the language, at least!
Scott can support an im2col'ed conv2d fully-connected systolic array invocation taking NHWC input. This means we can fold a bunch more access patterns into a hardware op, making us generate less code on our side (and hopefully making things faster by relying on his stuff more).
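For readers unfamiliar with the term, the following is a minimal pure-Python sketch of what "im2col'ed conv2d as a fully-connected invocation on NHWC input" computes. It assumes stride 1 and no padding, uses nested lists instead of a tensor library, and every function name is made up for illustration; it says nothing about how Scott's hardware or Glenside implement it.

```python
def im2col_nhwc(x, kh, kw):
    """Flatten each kh x kw x C window of x ([N][H][W][C]) into one row.

    Rows are ordered by (n, out_h, out_w); within a row, by (kh, kw, c).
    """
    n, h, wd, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    rows = []
    for b in range(n):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                rows.append([
                    x[b][i + di][j + dj][ch]
                    for di in range(kh) for dj in range(kw) for ch in range(c)
                ])
    return rows


def flatten_hwio(w):
    """View HWIO weights ([KH][KW][I][O]) as a (KH*KW*I) x O matrix."""
    return [w[di][dj][ch]
            for di in range(len(w))
            for dj in range(len(w[0]))
            for ch in range(len(w[0][0]))]


def matmul(a, b):
    """Plain matrix multiply: the "FC" operation the array already performs."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))]
            for row in a]


def conv2d_im2col_fc_nhwc(x, w):
    """conv2d on NHWC input with HWIO weights, as im2col plus one matmul."""
    n, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w), len(w[0])
    out_h, out_w = h - kh + 1, wd - kw + 1
    flat = matmul(im2col_nhwc(x, kh, kw), flatten_hwio(w))
    # Each row of `flat` holds the O output channels of one output pixel,
    # so regrouping the rows gives an NHWC result directly.
    return [[[flat[(b * out_h + i) * out_w + j] for j in range(out_w)]
             for i in range(out_h)]
            for b in range(n)]


def conv2d_direct_nhwc(x, w):
    """Reference direct convolution (stride 1, no padding), for comparison."""
    n, h, wd, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    kh, kw, o = len(w), len(w[0]), len(w[0][0][0])
    return [[[[sum(x[b][i + di][j + dj][ch] * w[di][dj][ch][oc]
                   for di in range(kh) for dj in range(kw) for ch in range(c))
               for oc in range(o)]
              for j in range(wd - kw + 1)]
             for i in range(h - kh + 1)]
            for b in range(n)]
```

In this sketch, the NHWC/HWIO layouts mean that both the window rows and the weight matrix come out in (kh, kw, c) order with no extra transposes, which is the kind of access-pattern work the issue proposes folding into the hardware op.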
To be clear, what I mean is this: There are roughly three ways that we've discussed implementing convolutions.