Project Proposal: Vectorization

JonathanDLTran commented 2 years ago

What will you do?

I want to add a vector instruction set to the Bril language, and implement a vectorization pass that finds opportunities to vectorize Bril IR code into a parallel form. The vector width would be fixed in advance. In particular, I plan to write a simple vectorizer pass that detects whether operations inside a program can be vectorized together.

Depending on how much the scope of this project is, if the scope is too small, I would try implementing the polyhedral model for loop optimization, to try to achieve further vectorization, within loops. If this ends up being the case, I will read over lecture nodes/slides relating to the polyhedral model. I need to catch up on aspects of linear programming and integer programming. I would aim to reproduce a basic optimization like Martin Griebl documented as the space-time transformation in https://www.infosun.fim.uni-passau.de/cl/loopo/doc/loopo_doc/node3.html. In particular, I would not try to reproduce the full power of using the polyhedral model for optimization, but would limit my scope.

These notes seem promising: https://polyhedral.info/ - from an older version of CS 6120 https://homes.luddy.indiana.edu/achauhan/Teaching/B629/2010-Fall/StudentPresns/PolyhedralModelOverview.pdf https://events.csa.iisc.ac.in/summerschool2013/slides/automatic-parallelization-introduction-polyhedral-models.pdf

How will you do it?

I plan to extend the Bril interpreter to have a vector instruction, with vector operations like vector addition and vector multiplication, concatenation of vectors, as well as loads and stores into and from vectors. This would involve changing the Bril Json Parser and also involving changing the typescript interpreter.

I will be working with the Bril memory extension, to allow for interaction with arrays and pointers in the Bril language. In order to check for vectorization opportunities, I will implement alias analysis for Bril, to see if vectorization can be done. I would start by following the alias analysis defined in lecture, and try expanding to other forms like Steensgard's alias analysis algorithm.

I will try to detect whether a loop can be unrolled, and unroll the loop as much as possible to enable further vectorization inside the loop. If not possible to unroll the whole loop, I will unroll the loop by some number of fixed iterations, say 4 iterations at a time.

If I move on to working on the polytope method, I would also have to add utilities to identify for loops and their format, and also represent the polyhedron and allow for transformations on the polyhedron.

How will you empirically measure success?

I will either create my own benchmarks or choose some common benchmarks, and translate these benchmarks into bril. I will look for benchmarks that focus on loops, with arrays inside the loop.

Since I am working at the Bril level, I am planning to count the number of vector instructions generated in each pass. The more vector instructions emitted, the more successful the pass.

I might also come up with a cost model that evaluates the cost of each vector instruction and regular instruction. This way, the optimized and unoptimized programs can be compared based on the result of the cost model.

Team members: Myself, although I am open to having other team members as well.

sampsyo commented 2 years ago

This all sounds great! Instruction counts will be a good way to measure this in a Bril context. A cost model sounds interesting too, but NBD if this part doesn't pan out.

I recommend reading a couple of blog posts about prior attempts at vector extensions to Bril (not that you need to take any ideas from them): https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/interpreter-vector-support/ https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/vril-vector-bril/

I also think it would be a very good idea to nail down the vector extension you'll using first (and somewhat quickly), before you get too deep into the auto-vectorization work. Do you think you could wrap that up (presumably including an extension to the reference interpreter together with documentation and tests) and open a PR partway through the project period? Then I will be able to give you feedback on that as a kind of intermediate milestone, and you'll also mitigate some risk w/r/t doing the actual optimization part.

JonathanDLTran commented 2 years ago

I will definitely utilize the 2 blog posts that you have given. I'm a bit shaky on the vector extension, so I feel that this will give me a bit of understanding on the elements of the extension that I should incorporate. I'll also open a PR once the extension to the interpreter, along with documentation and tests, are completed

sampsyo commented 2 years ago

Sounds great!

JonathanDLTran commented 2 years ago

I ultimately drew from both blog posts and also from the LLVM vector instructions. I limited the vectors to only be for integers, and limited the length of the vectors to hold exactly 4 integers.

I then implemented a set of instructions including

v = veczero: which creates a vector with all entries zeroed-out and saves to v
vecload v i e, which loads element e into vector v at index i
r = vecstore v i, which stores the ith element of vector v into register r
v = (vecadd | vecsub | vecmul | vecdiv) v1 v2 which respectively do vector operations on v1 and v2, then store into v
v = vecmac v1 v2 v3 which multiples v1 and v2 element-wise, then adds v3 to the product element-wise.
v = vecneg v1, which negates all entries in vector v1. The vector instructions are interpreted at Brili-Vc

I was also thinking about adding vector concatenations, to combine vectors together, but I did not implement this.

I also added some test cases as well just to make sure the vector instructions work as I want, and these are in tests.

I was wondering if this would be a good set of vector instructions, or if I should consider revising the instruction set?

Finally, I started thinking about alias analysis. I've implemented the basic version of alias analysis for the Bril language at Alias-Analysis, though there's some debugging left to do.

I'm now transitioning to thinking about how to unroll loops in the natural loop representation. Currently, my idea was to identify super simple loops that iterate based on an integer index that counts up or down, and is predicated by some condition that depends on that index. Later on, I'll have to find a way to generalize this idea to apply to more loops that do not follow the conditions I detailed.

sampsyo commented 2 years ago

Nice!! This degree of specialization seems like a good way to avoid things getting out-of-control complicated:

I limited the vectors to only be for integers, and limited the length of the vectors to hold exactly 4 integers.

This is not something to address right now, but one slight generalization could be to make the vector length a "compile-time" option instead of always 4. So, like, the interpreter could be put into different modes to support different lengths.

I was wondering if this would be a good set of vector instructions, or if I should consider revising the instruction set?

Seems pretty good to me! You could extend it opportunistically if your optimization ends up needing something you don't have yet.

Starting simple with alias analysis seems good as well. I would see how far you can get with really trivial alias analysis, even if it means making some benchmarks a little contrived to make them work. Doing fancy alias analysis is unlikely to pay off, IMO.

sampsyo / cs6120

Project Proposal: Vectorization #312