tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
http://tensor-compiler.org

Block Sparse Tensors #179

Open DJDDDM opened 5 years ago

DJDDDM commented 5 years ago

Are block sparse tensors already supported, or are there plans to support this? Coming from quantum chemistry, we often deal with sparsity limited to specific ranges rather than a randomly, evenly distributed sparsity. Algorithms that work on these blocks need to store less information about the indices and are therefore faster than the usual sparse algorithms.

fredrikbk commented 5 years ago

Hi @DJDDDM. Block sparse tensors should be supported by expressing them as higher-order tensors. So a blocked 3-tensor would become a 6-tensor. If the tensor is sparse with dense blocks it could be created with the format compressed, compressed, compressed, dense, dense, dense or some variant thereof to avoid storing unnecessary indices in the inner dimensions.
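A minimal sketch of this encoding with the current C++ API (the sizes are only illustrative): a 40x40x40 tensor with 4x4x4 dense blocks is declared as a {10,10,10,4,4,4} 6-tensor whose three outer modes index the blocks and whose three inner modes index positions within a block.

#include "taco.h"
using namespace taco;

// Sparse block grid, dense blocks: indices are only stored for the outer modes.
Format blocked({Sparse, Sparse, Sparse, Dense, Dense, Dense});
Tensor<double> A({10, 10, 10, 4, 4, 4}, blocked);

// Logical element (2*4+1, 3*4+0, 5*4+2) lives in block (2,3,5) at offset (1,0,2).
A.insert({2, 3, 5, 1, 0, 2}, 42.0);
A.pack();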

DJDDDM commented 5 years ago

That's possible. However, having 6 instead of 3 dimensions would change the access: A(i,j,k) would no longer refer to the same element. The layer of abstraction in which the Einstein notation is written would therefore change because of changes in the underlying layers, which is not good design. A(i,j,k) would become something like A(i,a,j,b,k,c), and the multiplication would become unreadable.
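For example (a sketch with the current API; tensor names and sizes are only illustrative), even a blocked matrix-vector multiply over the 4-tensor encoding already needs doubled index variables:

// Blocked matrix: a 100x100 grid of 4x4 dense blocks, times a blocked vector.
Format bcsr({Dense, Sparse, Dense, Dense});
Format bvec({Dense, Dense});
Tensor<double> A({100, 100, 4, 4}, bcsr);
Tensor<double> x({100, 4}, bvec);
Tensor<double> y({100, 4}, bvec);

// Each logical index splits into a (block, in-block) pair.
IndexVar i, j, ib, jb;
y(i, ib) = A(i, j, ib, jb) * x(j, jb);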

fredrikbk commented 5 years ago

I agree it is not good design and results in hard to read expressions. We plan to build a layer on top of this where you can express blocked tensors that can be indexed with the appropriate number of index variables. These expressions would then be rewritten to expressions with more index variables under the hood.

I'm patching in @oicirtap who is working on a redesign of the user-facing APIs. Can you look into blocking? We can discuss it when we meet this week.

DJDDDM commented 5 years ago

By the way, do you need more programmers, or are you fine and just need a bit of time for the API? I am planning to adapt taco for my needs, but these changes might be useful for others too.

oicirtap commented 5 years ago

We plan to release a draft of the finalized API before next week so we can receive feedback and comments from taco users. The new API is quite ambitious in what we want it to achieve, so a full implementation will not be available quite yet, but we could patch some quality-of-life changes into the current API, such as the proposed layer of abstraction for blocked tensors, and other methods. I'd be interested to know what adaptations you have planned.

Regarding blocked tensors, I am envisioning implementing this as a method on the tensor class that specifies which dimensions are linked to each other. A new IndexVar subclass might be necessary to properly link dimensions together. Something like this:

// Blocked formats
Format dbv({Sparse,Dense});
Format dbm({Sparse,Sparse,Dense,Dense});
// Create Blocked tensors
Tensor<double> a({10,4},    dbv);
Tensor<double> B({10,9,4,3}, dbm);
Tensor<double> c({9,3},    dbv);
// Indicate which dimensions to link
a.link(0,1);
B.link(0,2);
B.link(1,3);
c.link(0,1);
IndexVar i, j, k;
a(i) = B(i,j) * c(j);

Another option would be to define a block method on the tensor class which performs the creation of new dimensions under the hood. Something like:

Tensor<double> B({50,50}, format);
B.block(0 /*dimension index*/, 5 /*block size*/);

DJDDDM commented 5 years ago

I disagree with the linking; I would go with the block method, but with two possibilities.

First, one like above that sets the shape of all blocks:

Tensor A({50,50}, blockdense);
A.block.all(5,5, dense);

This would create a matrix with 5-by-5 blocks arranged in a 10-by-10 supermatrix, resulting in a 50-by-50 tensor (everything two-dimensional).

Second, one that shapes the tensor into individual blocks:

Tensor A({50,50}, blocksparse);
A.block.custom({0,0}, {5,5}, dense);
A.block.custom({10,10}, {5,5}, dense);

This would result in a sparse 50-by-50 matrix with two dense 5-by-5 blocks starting at (0,0) and (10,10) of the supermatrix. The second function should also be able to store an already existing tensor as a block inside the matrix, like A.block.custom({5,5}, B), which would put the tensor B as a block inside the tensor A starting at position (5,5).

The first one should give an error if totalsize % blocksize != 0, while the second adapts its blocks automatically.
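For reference, the first variant maps onto what the current API can already express under the hood as a higher-order tensor (a minimal sketch; A.block.all itself does not exist yet):

// A 50x50 matrix with 5x5 dense blocks, i.e. a 10x10 supermatrix of blocks.
Format blocksparse({Sparse, Sparse, Dense, Dense});
Tensor<double> A({10, 10, 5, 5}, blocksparse);  // what A.block.all(5,5,dense) would lower to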

oicirtap commented 5 years ago

As for the first approach, I agree that it is much more intuitive to perform the block transformation with a block method. Blocking all dimensions at once with your A.block.all notation would also be pretty useful.

As for the second approach, I don't think taco is currently capable of compiling code for this kind of custom dimension, in which some ranges of a single dimension are part of a block while others are regular sparse entries. Achieving this behavior could potentially involve major changes to taco's code generation machinery. Do you have any opinions on this @fredrikbk?

fredrikbk commented 5 years ago

I haven't spent sufficient time thinking about blocking yet but, off the cuff, I'm thinking of some way to specify blocking in the Format constructor. I think it's probably more convenient to express blocking in terms of blocks (in English, "a sparse matrix with 3x3 dense blocks") instead of in terms of dimensions ("this dimension is blocked and so is that"). @DJDDDM do you agree?

These two approaches are equivalent; the former, I think, better matches the way the user thinks, while the latter is easier to reason about in the compiler. I remember now that I thought extensively about this when we designed the Simit language. @oicirtap could you study the examples in the Simit paper, with a focus on the matrix block types?

fredrikbk commented 5 years ago

@DJDDDM, yes, our main issue is one of programming manpower. We're redesigning the compiler core, which requires a lot of research and is taking a lot of our time, and we are redesigning the API at the same time. What are you planning to do to adapt taco to your needs?

DJDDDM commented 5 years ago

Your version should probably be fine @fredrikbk. But then we need to keep in mind that we still want to be able to say that a block starts at (3,3,3) of the tensor. Under the hood a sparse array of blocks should work, but it would be inconvenient if the API represented it as an array of blocks. I doubt that this is too difficult, though, given that basically all matrices are stored as arrays together with information about the dimensions.

DJDDDM commented 5 years ago

If I could do as I wanted, I would adapt taco in the following ways.

  1. Blocking as above.
  2. Writing a comparison operator that only checks the data of the tensor and not the name and so on (I know there is the equals function, but it doesn't allow for Tensors as input).
  3. Writing a print function that shows two-dimensional tensors as matrices instead of as an array, similar to how Eigen's print function works. For more than two dimensions one could consider showing multiple matrices, but this quickly gets confusing, so two dimensions should be enough.
  4. Never exiting without printing an error. (If I do A(i,j) = B(i,j) + C(j,i), my code just ends without saying anything, but the online code generator gives an error. It should be sufficient to get all the errors from the online version into the C++ version.)
  5. Allowing transpositions (A(i,j) = B(i,j) + C(j,i) should work like A(i,j) = B(i,j) + C.transpose({1,0})(i,j)).
  6. Allowing expressions to be stored. (If I have A(i,j) = B(i,j) + C(i,j) and D(i,j) = B(i,j) A(i,j), it should produce the same result as D(i,j) = B(i,j) (B(i,j) + C(i,j)), so that you can first assemble all equations inside your code without having to evaluate them between each line. This would allow functional programming; see the sketch after this list.)
  7. Accessing elements by index (double a = A(3,3) should assign the element of A at position (3,3) to a).
  8. Getting the tensor as a standard (multidimensional) C++ array (array a = A.getArray).
  9. Allowing multiple compiles, assembles, and so on. (If I have A.evaluate in one function, a later A.evaluate should not produce an error. It could simply check whether A is already evaluated and then return the last result. Maybe something like A.evaluate.save() would be better.)
  10. A function to scan through all entries and make entries sparse if they are below a certain threshold. (A.clean(10^-6) should make everything between plus and minus 10^-6 sparse. Even though this is an N^dimensions operation, it is usually faster to do this every iteration, since a bunch of multiplications are eliminated. Maybe this can be integrated into the multiplication, like A.cleanalways(10^-6), so that one does not need to do it manually, maybe allowing for even better optimizations.)
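A rough sketch of what I mean in 6 (if I understand the current C++ API correctly, subexpressions can already be stored and reused as IndexExpr objects, but a tensor's defining expression is not substituted into later assignments automatically; sizes and formats are only illustrative):

IndexVar i, j;
Tensor<double> B({8, 8}, Format({Dense, Sparse}));
Tensor<double> C({8, 8}, Format({Dense, Sparse}));
Tensor<double> A({8, 8}, Format({Dense, Dense}));
Tensor<double> D({8, 8}, Format({Dense, Dense}));

// Name the subexpression once...
IndexExpr sum = B(i, j) + C(i, j);
// ...and reuse it in several assignments without evaluating in between.
A(i, j) = sum;
D(i, j) = B(i, j) * sum;  // same as D(i,j) = B(i,j) * (B(i,j) + C(i,j))

D.compile();
D.assemble();
D.compute();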

I am not sure which of these I can implement, since my programming skills are probably lower than yours, but these 10 points were the first that came to my mind. If I get some of them done, I will contact you if you like.

I am not sure which version I am using, so if one of these points already exists, please tell me.

DJDDDM commented 5 years ago

I forgot: 11. A diagonalizer written in taco.

DJDDDM commented 5 years ago

  12. Slicing. (Defining IndexVar i(0,5) should make A(i) refer to the part of the tensor A that goes only from 0 to 5 instead of over the whole range of A.)

DJDDDM commented 5 years ago

  13. A.iscompiled() and A.isassembled() should return true or false.

DJDDDM commented 5 years ago

  14. assemble/compute should check whether the tensor still has the format it had at compile time.

oicirtap commented 5 years ago

@DJDDDM Thanks for the suggestions. This is good feedback that helps us better understand what users are looking for in our tool. Your suggestions all have different levels of difficulty in terms of the time it would take to incorporate them into the library, but I'll make sure to discuss and consider all of the above.

We are already working on a number of your suggestions, but I cannot promise any short-term timeline for the changes. We are a small team and changes can take a while.