hameerabbasi / xsparse

XSparse is an experimental XTensor-like C++ sparse-array library.

BSD 3-Clause "New" or "Revised" License

9 stars 6 forks source link

[GSoC 2023] Implementation of the merge lattice data structure and algorithms #17

Open adam2392 opened 1 year ago

adam2392 commented 1 year ago

I wanted to use this GH issue to i) track the implementation of the merge lattice as well as ii) provide a possibility of implementing a preliminary PR before the GSoC due date. This will be iterated upon in conjunction with the GSoC proposal Google Doc.

Summary

I'm starting an issue to start preliminary work on the merge lattice data structure implementation. Following Section 4-6 (mainly Section 5) in https://dl.acm.org/doi/pdf/10.1145/3133901, there seem to be a few things that need to be implemented:

[ ] Lattice points, which are objects associated with an arbitrary number of tensor dimensions, $Tp = {t{p1}, ..., t{p_k}}$ and an expression, $expr_p$.
[ ] Lattice is a set of lattice points and a "meet operator"
[ ] Conjunctive merge algorithm of lattice points that takes in two lattice points and produces a new lattice point that is the conjunction of the two input lattice points
[ ] Conjunctive merge algorithm of two lattices that takes in the two lattices and produces a new lattice that applies the conjunctive merge for all lattice points
[ ] Disjunctive merge that takes in two lattices and produces a new lattice that contains the disjunctive merge of the two lattices.

Some open questions that I am still sorting out

A few open questions I have will be documented here and migrated into the GH issue description when I understand them better, or an answer is given.

Lattice points are represented by a set of p tensor dimensions. Are the dimensions integer values? I don't see why not, but maybe that's incorrect.
Lattice points are represented by an expression. How should we represent that expression?

Downstream tasks

Optimizations: Once the merge lattice is implemented, tested and merged, then it can also be optimized using the tips in Section 5.2. As the work is done to implement the merge lattice, we can keep these in mind.

Code-generation: The code-generation algorithm in Figure 11a looks pretty straightforward once the merge lattice is complete, but I presume complexities may arise as my understanding improves.

Misc.

My understanding and reading of the codebase after our two calls so far have been that the level format representation, properties and functionalities have all been more or less implemented, so the next step is finally the code-generation algorithm.

Please let me know if there's any issues you see with this plan? cc: @hameerabbasi and @bharath2438

Relevant papers:

tensor algebra compilation (taco): http://tensor-compiler.org/files/kjolstad-oopsla17-tensor-compiler.pdf
format abstraction for sparse tensors: https://arxiv.org/pdf/1804.10112.pdf
tensor compilation with workspaces: https://dl-acm-org.ezproxy.cul.columbia.edu/doi/pdf/10.5555/3314872.3314894

hameerabbasi commented 1 year ago

I think @bharath2438 can answer this better but we use a consteval function that takes and produces booleans as an expression and the actual dimensions (Dense, Compressed, Hashed, ...) as tensor dimensions.

I see no issues with your plan so far.

adam2392 commented 1 year ago

I think @bharath2438 can answer this better but we use a consteval function that takes and produces booleans as an expression and the actual dimensions (Dense, Compressed, Hashed, ...) as tensor dimensions.

To clarify, this is regarding the Lattice points are represented by a set of p tensor dimensions. Are the dimensions integer values? I don't see why not, but maybe that's incorrect. question right? Specifically, how should we represent the actual dimensions.

I don't see a consteval anywhere in the codebase.

Is the proposal that in the template of the Latticepoint, there will be these consteval functions? E.g. something like:

template <bool Dense, bool Compressed, ...>
class Dimension
{
public:
    consteval bool is_Dense = Dense;
    ...
};

// then in LatticePoint
template <class... Dimension, ...>
class Latticepoint;

bharath2438 commented 1 year ago

Hi, let me try answering the questions, sorry for the late reply.

Lattice points are represented by a set of p tensor dimensions. Are the dimensions integer values? I don't see why not, but maybe that's incorrect.

Lattice points are obviously going to be some kind of numerical datatype. We wouldn't like to restrict them to an integer though, we'll be using a type alias.

Lattice points are represented by an expression. How should we represent that expression?

The expression is represented by a constexpr/consteval function (currently all tests use constexpr functions that represent these expressions, I remember me and @hameerabbasi decided it wasn't a good idea to use consteval, but I do not remember why). You can see here that we are taking this function so that we have a starting point for calculating the merges. https://github.com/hameerabbasi/xsparse/blob/7d856f72e022640a1771f80ae8fdfdf7adf24eb7/include/xsparse/level_capabilities/co_iteration.hpp#L19 For now, this assumes that some higher level code generation algorithm will provide us the function itself, along with the levels.

bharath2438 commented 1 year ago

I'm starting an issue to start preliminary work on the merge lattice data structure implementation. Following Section 4-6 (mainly Section 5) in https://dl.acm.org/doi/pdf/10.1145/3133901, there seem to be a few things that need to be implemented:

I think these are already taken care of as part of coiteration. Once the coiteration code gets the constexpr function specifying the type of merge, it should be able to merge the levels.

adam2392 commented 1 year ago

Notes from 03/19/23:

coiteration rn assumes levels are order, unique and merging based on that, but what we actually need is the following: merge levels that are not ordered, and not unique. E.g. not-ordered: hash, and not unique: where we want to apply de-duplication. So we want to modify the co-iteration algorithm that accounts for unordered and non-unique levels. TODO: read a bit more on what it means to be "non-unique". I am not 100% clear yet on this point.
lattice data structures need to be discussed with Hameer
An extra higher-level API is needed to merge "vectors of level formats". E.g. CSR = dense on top of compressed format. Two CSR need to merge the two dense and the two compressed. Need a coiteration higher-level code to merge in vector of level formats (rn the co-iteration algorithm only implements merging of a pair of level formats).
Note: disjunctive/conjunctive merge would be handled already by passing in the constexpr compareHelper function inside co_iteration.hpp

Would a good prelim PR (before GSoC) be to implement?:

the higher-level function to merge vectors of level formats. This seems straightforward if the existing co-iteration algorithm already exists and/or
begin work on co-iteration algorithm that allows merging of non-unique and non-ordered pairs of level formats.

cc: @bharath2438

adam2392 commented 1 year ago

An extra higher-level API is needed to merge "vectors of level formats". E.g. CSR = dense on top of compressed format. Two CSR need to merge the two dense and the two compressed. Need a coiteration higher-level code to merge in vector of level formats (rn the co-iteration algorithm only implements merging of a pair of level formats).

Note: disjunctive/conjunctive merge would be handled already by passing in the constexpr compareHelper function inside co_iteration.hpp

the higher-level function to merge vectors of level formats. This seems straightforward if the existing co-iteration algorithm already exists and/or

@bharath2438 I'm actually a bit confused now.

Where is the merge taking place? The constexpr compareHelper is just returning whether or not the passed in levels are "the same or not". Based on our discussion, was the goal to add actually a compareVector function that compares vectors of levels to determine if the entire vector is the same or not? The merging is just some future functionality that is done given the results of the compareVector and compareHelper?

E.g. would the draft PR in #19 be the right direction?

bharath2438 commented 1 year ago

Where is the merge taking place? The constexpr compareHelper is just returning whether or not the passed in levels are "the same or not". Based on our discussion, was the goal to add actually a compareVector function that compares vectors of levels to determine if the entire vector is the same or not?

The constexpr compare function takes in boolean inputs and gives out a boolean using the expression. Now, these booleans are got by comparing "iterators" over the levels to their "endpoints". An iterator over a level is an iterator over the (ik, pk)s of the level where ik is the coordinate and pk is the position of that coordinate. Basically, the booleans store the value of - "Has this particular iterator reached it's end?" and then, they are passed onto the comparison function which merges them. These are all helper functions that actually help output and iterator over the merged dimensions of all these levels.

Example: Let's say we have 3 levels to be merged - a, b and c. At any point during the iteration, let's say a hasn't reached it's end while b and c have reached their respective ends. So, in this case, the input to the constexpr function would be (0, 1, 1). Let's say the constexpr combines the input like return arg1 || arg2 || arg3. So the output would be true. The real meaning of the function is something like - "Check if either of the three levels have reached their end". So, here b and c have reached their end, so the co-iterator also has reached it's end.

Bottomline is - coiteration.hpp outputs an iterator that is able to iterate over the merged dimensions of the input levels. What we need to do is implement a higher level algorithm that is able to give coiteration the constexpr function as well as handle things at the "tensor storage format" level (COO, CSR etc. that have stacked levels).

hameerabbasi commented 1 year ago

Let's say the constexpr combines the input like return arg1 || arg2 || arg3. So the output would be true. The real meaning of the function is something like - "Check if either of the three levels have reached their end". So, here b and c have reached their end, so the co-iterator also has reached it's end.

I think the actual meaning is to flip OR to AND. Consider a disjunctive merge (operator+ or similar, equivalent to OR). Then, we get the constexpr function f(arg1, arg2) := arg1 || arg2. When advancing the iterator, we advance till we find anything in either input iterator, i.e. until f is true, and we end only when both iterators have ended, replacing OR with AND. One way is to repurpose the same function: to use !f(!arg1, !arg2). This flips AND to OR, this is known as DeMorgan's Law.

The same works for a conjunctive merge (operator*), but in this case we have to advance the all minimum iterators till f is true when advancing. The end of the iterator works the same way.

adam2392 commented 1 year ago

Notes from 3/26/23:

Summary direction for Adam's GSoC proposal

The goal of the GSoC 2023 for me at least will be the completion or progress towards merge lattice and its related data structures and algorithms. This is one of the final steps needed in order to implement the code-generation algorithm. An intuition of merge lattices in terms of numpy API is the notion of "broadcasting" e.g. adding two arrays, some of which have axes inserted.

TODO:

Review Einstein summation notation and then re-review the paper
Understand how sorting by multiple keys works, which is useful for disjunctive merges: np.sort, np.argsort and np.lexsort works. (I am still unsure what this train of thought was used for, but I will review)
Review code again based on new understanding to sketch out what I think merge lattice implementation might look like.
Begin a preliminary PR to add the API for level formats to get their parent levels
Implement conjunctive merge logic when input to co-iteration is a hash map. See below for details

Notes augmenting the previous conversion

begin work on co-iteration algorithm that allows merging of non-unique and non-ordered pairs of level formats.

Some more notes on this. The situation is we are trying to take two or more levels in general and try to do a conjunctive, or disjunctive a mixture of merge operations with them. The current codebase assumes the levels have indices already sorted and are unique. Here is a description of the high level conjunctive/disjunctive merge algorithm that leverages the coiteration algorithm.

Examples of unsorted levels: hash map Examples of non-unique levels: singleton

A conjunctive (and) merge algorithm proceeds by: advancing the iterator at the same time for each level to find common elements of each vector (i.e. common indices can be multiplied together; if not common, then they would've been multiplied by "0", so it would not show up in the output anyways). I.e. the output format would only get appended to if there are any common indices. At each vector, we would increment the index of the level that is the minimum. If both levels' indices are minimum (i.e. same value), increment both.

A disjunctive (or) merge algorithm would proceed similarly, but append to the output at every index from each iterator that we are merging.

Result: This results in the indices of all level formats that we want to use to construct the output.

A note about determining when the iterator is at its end:

conjunctive (and) = end when one iterator is finished and reached the end
disjunctive (or) = end when all iterators are finished and reached the end

Augmenting coiteration to deal with unordered levels (i.e. hash maps)

The API that calls coiteration should and will eventually perform the following logic: If it is sortable, sort the level. Otw if it is not sortable, then it must be has_locate == True property. If the unordered level is part of a disjunctive merge, then the current thought is that we will convert it to a compressed level that can be sortable.

If the unordered level is part of a conjunctive merge, we can iterate through the level that is ordered and use locate to check if the index exists in the hash level. If it returns NULL, then we don't append to the output. If it does not return NULL, then we append the output with this index.

Currently, we want logic that implements this depending on if one of the level formats passed to Coiterate return has_locate == True and is not ordered.

adam2392 commented 1 year ago

Summary of GSOC 2023

22 , #24 , #25 , #26 , #29 , #30 and #31 addressed the main problems described here.

Merge Lattice work

The relevant work left to do now is to work on iterating work on a MergeLattice class that can co-iterate over collections of Tensors. The added complexity comes from the fact that:

there is no more one-to-one matching between levels that are co-iterated. I.e. a Tensor can be 3D and then another Tensor can be 2D and one would need to figure out within the MergeLattice which dimensions of each tensor to co-iterate over depending on the merge function and the indices passed in
Indices are passed in the Einstein notation format to define how the tensors are to be co-iterated
One needs to keep track of min_ik and pkm1 from other levels when co-iterating over dimensions of each Tensor.

The last point seems like it will make co-iterating quite complex because to advance the iterator of MergeLattice, one would need to possibly recurse up the levels of a Tensor and then recurse back down the levels of the Tensor when one hits the end of an inner level. For example, say Tensor A is comprised of levels (a, b, c) stacked. If we iterate over Tensor A, then we start with the beginning of a, with the corresponding min_ik and pkm1 passed to level b and then so on to level c. We then iterate over level c, until it's reached its end. Once c.iterator == c.end(), then we need to increment b and re-initialize the coiter_helper for c and then advance c again. However, this process can go arbitrarily up the stack of levels if say b.iterator == b.end(). This recursion needs to go "all the way, and then all the way down" to handle the advancing of the Tensor's iterator in MergeLattice. It gets even more complex when you have Tensors of different dimensions.

It is because of this fact that @bharath2438 and I discussed perhaps implementing a very simple MergeLattice first that assumes that the input Tensors are exactly the same dimensions and we are only dealing with a conjunctive merge. E.g. two CSR arrays being multiplied together.

This should be a simple generalization of the unit-test added in #31. Then we can generalize the MergeLattice implementation to handle additional layers of complexity.

Misc.

cc: @hameerabbasi and @bharath2438 if you think I missed anything?

It would be cool to try implementing the MergeLattice since the latest PR #31 shows us how to run the API. Then with a basic implementation of workspaces, couldn't this be relatively usable for basic sparse tensor operations?