gussmith23 / glenside

A pure, low-level tensor program representation enabling tensor program optimization via program rewriting. See the web demo at https://gussmith23.github.io/glenside-web-demo/

Bring some of 3la-pldi-push-main to main #160

Closed · gussmith23 closed this 2 years ago

gussmith23 commented 2 years ago

This PR brings changes from our PLDI 2022 push into main. It does so by starting from our 3la-pldi-push-main branch and reverting/undoing/modifying a bunch of the changes. This allows me to preserve the commit history (giving proper contribution attribution to Vishal and Mike!) while not merging everything from the PLDI push.

For each of the following files, we need to inspect the diff between 3la-pldi-push-main and main, and revert any changes that we don't want. Additionally, we need to add in any necessary tests.

I might enlist some help on this cleanup in winter 2022. If so, I'll provide some more detailed instructions.

Basic workflow:

  1. Check out `bring-some-of-3la-pldi-push-main-to-main`.
  2. Choose a file from the list below and run `git diff bring-some-of-3la-pldi-push-main-to-main..main`.
  3. Inspect the changes and determine which are appropriate to bring to main.
  4. Run `git checkout --patch main -- <filename>`. This lets you revert changes hunk-by-hunk: for changes which shouldn't be merged, hit `y` to stage a revert of that hunk; for changes which should be merged, hit `n`, and no revert will be created.
  5. Once you finish the checkout, commit all of the reverts for that file. They should already be staged.

Each major change being brought to main should be tested.

This will likely also result in changes to TVM, which will hopefully get merged into mainline. We have a similar 3la-pldi-push-main branch on our TVM repo, for which we'll need to do this same process.

List of files:

Specific todos:

gussmith23 commented 2 years ago

I think the addition of `name_to_dtype` should be kept. We can perhaps reimplement it if needed, but it's probably right to have it, especially for compiling back to Relay.
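
For reference, roughly what I'd expect it to look like; this is a minimal sketch with a hypothetical struct, field types, and fallback, not the actual code from the branch:

```rust
use std::collections::HashMap;

// Hypothetical sketch: `name_to_dtype` as a side table from tensor names
// to their Relay dtype strings, filled when ingesting Relay and consulted
// again when compiling back out to Relay.
struct RelayContext {
    name_to_dtype: HashMap<String, String>,
}

impl RelayContext {
    // Look up a tensor's dtype when emitting Relay; the "float32" default
    // is an assumption for the sketch, not Glenside's actual behavior.
    fn dtype_of(&self, name: &str) -> &str {
        self.name_to_dtype
            .get(name)
            .map(String::as_str)
            .unwrap_or("float32")
    }
}

fn main() {
    let mut ctx = RelayContext { name_to_dtype: HashMap::new() };
    ctx.name_to_dtype.insert("weight".to_owned(), "int8".to_owned());
    assert_eq!(ctx.dtype_of("weight"), "int8");
    assert_eq!(ctx.dtype_of("unknown"), "float32");
}
```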

gussmith23 commented 2 years ago

I think Mike's changes to ILP are all good, but we should make sure all the tests pass.

gussmith23 commented 2 years ago

We no longer manually union things using Mike's approach from the PLDI push. The relay->glenside rewrites should handle everything correctly now.

gussmith23 commented 2 years ago

I think we should remove `relay_shape`. The reason it's there, as far as I can tell, is to let access pattern shapes take precedence in `merge()`. That is, if an eclass only has Relay operators, then its access patterns will look fairly arbitrary. (Sometimes I put all the dimensions in `shape`, sometimes I put them all in `item_shape`...)

Two cleaner solutions:

gussmith23 commented 2 years ago

I'm rewriting the Analysis for Glenside to accommodate Relay. Previously, Mike did this by including a "relay shape" field, with some additional logic in `merge()` to handle the cases where a node might have a relay shape and no access pattern shape, or vice versa.

I think we can simplify this to a single flag which tracks whether an eclass contains only Relay nodes. If it does, we interpret its access pattern shape as a plain tensor shape (i.e. if its access pattern shape is ((a, b), (c, d)), we interpret it instead as just (a, b, c, d)). We expect the underlying tensor shapes to match, but we may have to take the access pattern shape from the eclass with "real" Glenside nodes, as that one should have the "correct" access pattern data.

All of this is a consequence of not having a disciplined way of defining Relay shapes in terms of access pattern shapes: we could always put them in the `shape` field, or always put them in the `item_shape` field, but I do a mix of both.
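
Here's a minimal sketch of the flag-based merge I have in mind. The names are hypothetical and the real analysis data carries more than shapes; this just shows the flattening and the precedence rule:

```rust
// Sketch of the simplified analysis data (hypothetical names).
#[derive(Clone, PartialEq, Debug)]
struct AccessPatternData {
    shape: Vec<usize>,      // dimensions before the access axis
    item_shape: Vec<usize>, // dimensions after the access axis
    // True iff the eclass contains only Relay nodes, in which case the
    // shape/item_shape split is arbitrary and only the flattened tensor
    // shape is meaningful.
    relay_only: bool,
}

impl AccessPatternData {
    // The underlying tensor shape: ((a, b), (c, d)) flattens to (a, b, c, d).
    fn tensor_shape(&self) -> Vec<usize> {
        self.shape.iter().chain(&self.item_shape).cloned().collect()
    }
}

// Merge analysis data for an eclass. Underlying tensor shapes must match;
// the access pattern split is taken from whichever side has "real"
// Glenside nodes.
fn merge(a: &mut AccessPatternData, b: AccessPatternData) {
    assert_eq!(
        a.tensor_shape(),
        b.tensor_shape(),
        "underlying tensor shapes must match"
    );
    if a.relay_only && !b.relay_only {
        // `b` came from Glenside nodes; its split is the "correct" one.
        *a = b;
    } else {
        // Keep `a`'s split; the class stays Relay-only iff both sides were.
        a.relay_only = a.relay_only && b.relay_only;
    }
}
```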

gussmith23 commented 2 years ago

One of the main problems with this currently is that something like the following can happen:

  1. Eclass A starts with a Relay node. It points to eclass B, which also starts with a Relay node.
  2. Eclass A gets rewritten and now includes a Glenside node. Its access pattern data is recalculated and is now considered "correct".
  3. Eclass B gets rewritten and now includes a Glenside node. Its access pattern data is also recalculated. Assume the access pattern shape changes; this new version is now considered "correct".
  4. This rewriting triggers the recalculation of analysis data for eclass A (see `process_unions` in egg). Because eclass B's shape changed, eclass A's shape changes too.
  5. We try to merge this new analysis data with the old analysis data for eclass A, and the merge fails: both are marked "correct", but they differ.

The thing we're missing here is that, just because a node is a Glenside node, that doesn't mean its access pattern data is correct/unchanging/"settled". We also have to know that all of its children have settled their shapes. How do we know if a node's shape is settled? I think it's just a matter of whether:

  1. the node is a Glenside node and not a Relay node (which is what we currently track)
  2. all of the node's children are settled (which we currently don't track).

So I think it's just a matter of adding that second piece of information: when we build the analysis data for a new node, we mark it as settled only if the node itself is a Glenside node and all of its children are already settled.
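
A minimal sketch of that computation, assuming the analysis can see each child's settled flag when building data for a new node (as in egg's `Analysis::make`); the names here are hypothetical:

```rust
// Sketch: a node's shape is settled only when the node itself is a
// Glenside node AND every child eclass has already settled its shape.
fn is_settled(is_glenside_node: bool, child_settled: &[bool]) -> bool {
    is_glenside_node && child_settled.iter().all(|&s| s)
}

fn main() {
    // A Glenside node over an unsettled (still Relay-only) child is not
    // yet settled...
    assert!(!is_settled(true, &[true, false]));
    // ...but becomes settled once all of its children are.
    assert!(is_settled(true, &[true, true]));
    // A Relay node is never settled, regardless of its children.
    assert!(!is_settled(false, &[true, true]));
}
```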

gussmith23 commented 2 years ago

Okay, I've resolved the above, but I still have some residual errors with access patterns not being calculated correctly.

gussmith23 commented 2 years ago

Current issue: using `access-shape` to construct a shape literal doesn't work when that literal might need to change because the underlying shape isn't yet finalized. The solution is to use `get-access-shape` to point at the shape of an existing eclass... except when that's not actually possible! I.e., when the shape we need isn't the shape of any other eclass.

All of this is making me think I should just rewrite the damn eclass analysis once and for all! We would put all the shape elements in a single vec and keep an optional usize for the access axis.
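
Roughly like this (a sketch of that representation; the names and the helper are hypothetical):

```rust
// Sketch of the proposed representation: every dimension lives in one Vec,
// and the access axis, if present, is just an index into it.
#[derive(Debug, PartialEq)]
struct AccessShape {
    // All dimensions of the underlying tensor, in order.
    shape: Vec<usize>,
    // Index of the access axis; None for values (e.g. plain Relay tensors)
    // that don't yet have an access pattern.
    access_axis: Option<usize>,
}

impl AccessShape {
    // Recover today's (shape, item_shape) split when an access axis exists.
    fn split(&self) -> Option<(&[usize], &[usize])> {
        self.access_axis.map(|axis| self.shape.split_at(axis))
    }
}

fn main() {
    // ((1, 2), (3, 4)) becomes shape (1, 2, 3, 4) with access axis 2:
    let access = AccessShape { shape: vec![1, 2, 3, 4], access_axis: Some(2) };
    assert_eq!(
        access.split(),
        Some(([1usize, 2].as_slice(), [3usize, 4].as_slice()))
    );

    // A plain Relay tensor has no access axis, only the flat shape.
    let plain = AccessShape { shape: vec![1, 2, 3, 4], access_axis: None };
    assert!(plain.split().is_none());
}
```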