gussmith23 / glenside

A pure, low-level tensor program representation enabling tensor program optimization via program rewriting. See the web demo at https://gussmith23.github.io/glenside-web-demo/

Implementing batch norms in hardware #46

Open gussmith23 opened 3 years ago

gussmith23 commented 3 years ago

cc @stdavids

I know I'm a bit behind on this, but we're finally ready to start looking at implementing batch norms from the Glenside side of things. We can talk about it in the hackathon. From the Glenside perspective, this involves:

The interesting thing about batch norms in our workloads is that they aren't a single "batch norm" operator; instead, they appear as a chain of simpler primitive operators implementing the linear transformation that batch norm represents at inference time. That's because I ran Relay's SimplifyInference pass over the workloads before importing them, which is the habitual thing to do when working with workloads in Relay. However, it might be easier for Glenside to take in opaque, un-simplified batch norm operators; it could then implement its own equivalent of SimplifyInference and create a "batch norm inference" node. Currently that isn't possible, since the batch norm nodes have already been blown up into smaller operators on the Relay side. I'm not sure this will be necessary, though -- I'm hoping we can just enable Glenside to fold the computation back together into some efficient vectorized format.
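To make the "chain of simpler primitive operators" concrete, here's a NumPy sketch of what batch norm reduces to at inference time. The variable names and shapes are mine, not Relay's or Glenside's; the point is just that the running statistics are constants, so the whole thing folds into one per-channel multiply and one add:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))     # activations, shape (N, C) -- hypothetical
gamma = rng.standard_normal(8)      # learned per-channel scale
beta = rng.standard_normal(8)       # learned per-channel shift
mean = rng.standard_normal(8)       # running mean (a constant at inference)
var = rng.random(8) + 0.5           # running variance (a constant at inference)
eps = 1e-5

# Batch norm as usually written, evaluated at inference time:
y_full = gamma * (x - mean) / np.sqrt(var + eps) + beta

# What a SimplifyInference-style rewrite exposes: the per-channel constants
# can be folded ahead of time, leaving a multiply-add chain over x.
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale
y_folded = x * scale + shift

assert np.allclose(y_full, y_folded)
```

Folding the computation "back together" would mean recognizing that multiply-add chain and mapping it onto one vectorized scale-and-shift kernel.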

gussmith23 commented 3 years ago

Joseph has a kernel for batch norms; take a look at that. That's a good first step. We may want to collapse batch norm back into its non-inference version. I can also write a Glenside rewrite that detects the inference-time batch norm pattern.
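A rewrite that detects the inference-time pattern might look roughly like this toy sketch (Python for brevity; Glenside's actual rewrites operate over its Rust e-graph representation, and every node name below is hypothetical, not a real Glenside constructor):

```python
from dataclasses import dataclass

# A toy expression tree standing in for the program representation.
@dataclass
class Var:
    name: str

@dataclass
class Const:
    name: str

@dataclass
class Mul:
    lhs: object
    rhs: object

@dataclass
class Add:
    lhs: object
    rhs: object

@dataclass
class BatchNormInference:
    data: object
    scale: object
    shift: object

def fold_batch_norm(expr):
    """Rewrite (add (mul x scale) shift) into a single fused node."""
    if isinstance(expr, Add) and isinstance(expr.lhs, Mul):
        return BatchNormInference(expr.lhs.lhs, expr.lhs.rhs, expr.rhs)
    return expr  # no match: leave the expression alone

# The multiply-add chain that SimplifyInference leaves behind...
chain = Add(Mul(Var("x"), Const("scale")), Const("shift"))
# ...collapses into one fused "batch norm inference" node.
fused = fold_batch_norm(chain)
assert isinstance(fused, BatchNormInference)
```

In the real system this would be an e-graph rewrite rule rather than a one-shot tree transformation, so the fused and unfused forms can coexist and extraction can pick whichever maps best onto the hardware kernel.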

gussmith23 commented 3 years ago

We should just replace the batch norm operator chain with a single "batch norm" call.

gussmith23 commented 3 years ago

Here's my plan: