oicirtap opened 5 years ago
This looks really good.
However I have a few minor things:
RangedIndex: in cases like `A(i,j) = B(i,k(0,10)) C(j,k(0,10))`, I would prefer to be able to write `RangedIndex k(0,10); A(i,j) = B(i,k) C(j,k)` to do the same. In longer equations it would be a problem to have to write the same range multiple times.
Inserting sparsity or removing values: with better inserts, can we also have a method to undo an insertion? Something like `A(3,3) = sparse` or `A(3,3).remove()`.
Given the currently planned API changes, it should be fairly easy to do.
Will the more advanced features like functional programming and blocked tensors come in the next theoretically enhanced version? (I have seen a paper from your group about different tensor formats)
Will better error reports already come with the new API? (Or I guess most of the errors that exist now will not exist any longer.)
What is the status of transpositions? From the outside these seem to be just an API problem.
I guess, once the new API exists, functionality like eigenvalue decomposition and sparsing out values will be written by users. At that stage I will probably be able to help. Then we would have a decoupling of API, Theory, and Functionality, and people could easily contribute to Functionality.
We will work on better error reports for the new version for sure. We’ve been working hard to clean up the theory and think we’re getting there.
The new version will have a hand-written transpose method, as you suggested in an earlier issue. We are, however, working on a theory for generating transpose code, even together with other expressions, but that will come later (and of course only if it works out).
We’re hoping to get the blocking into the next version (early summer) and I have good hope we can do it. I don’t think it requires changes to the machinery.
Finally, we’ll have improved support for formats probably in the next version with a third level format called Singleton so we can generate coordinates at least. More formats will follow.
Regarding delayed computation: modifying a dependency of another tensor might require sophisticated versioning semantics. Is in-place modification really necessary? If you want to avoid deep copies of tensors on localized updates, copy-on-write versioning could avoid the storage overhead while also avoiding semantic mutability.
To further clarify what I mean here: COW versioning on a sparse storage is just layering sparse storages on top of each other. An entry can exist in any of the storages of the storage stack, but for any given index, only the topmost value is considered current, conceptually like an order-aware version of UnionFS.
And there is always the possibility of squashing many elements within a stack into a single storage, if no one is keeping a reference to the corresponding versions (every version would need a refcount).
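A minimal sketch of that idea, using a map as a stand-in for the sparse storage (none of these names are taco API, just an illustration of the layering):

```c++
#include <map>
#include <memory>
#include <optional>
#include <vector>

using Coord = std::vector<int>;
using Layer = std::map<Coord, double>;  // stand-in for one sparse storage

struct VersionedStorage {
  // Bottom-to-top stack of storage layers; the shared_ptr use count
  // plays the role of the per-version refcount mentioned above.
  std::vector<std::shared_ptr<Layer>> layers{std::make_shared<Layer>()};

  // Start a new version: writes go to a fresh top layer, leaving the
  // older versions untouched (copy-on-write).
  void beginVersion() { layers.push_back(std::make_shared<Layer>()); }

  void insert(const Coord& c, double v) { (*layers.back())[c] = v; }

  // For any index, the topmost layer holding it wins (UnionFS-style).
  std::optional<double> get(const Coord& c) const {
    for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
      auto hit = (*it)->find(c);
      if (hit != (*it)->end()) return hit->second;
    }
    return std::nullopt;
  }

  // Squash the top layer into the one below it when no one else holds
  // a reference to that version.
  void squashTop() {
    if (layers.size() < 2 || layers.back().use_count() != 1) return;
    auto top = std::move(layers.back());
    layers.pop_back();
    for (const auto& [c, v] : *top) (*layers.back())[c] = v;
  }
};
```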
This issue presents a draft of the new API for Tensor objects in taco. The new API aims to address a number of issues with the current API, as well as to integrate new features for increased functionality, simplicity, and code portability. Think of this thread as a means of providing user feedback on the proposed design. See the bottom of this post for links to other related github issues.
For the working API signature, refer to https://github.com/oicirtap/taco/blob/auto_tensor/include/taco/tensor_api.h
The Tensor API Overview
The new Tensor API targets to address the following major goals: increased functionality, simplicity, and code portability.

One major source of friction in the current API is the need to explicitly call compiler methods such as `compile` and `assemble`. Another difference comes from missing features in taco such as random access and insertion of elements into Tensors. Having a more traditional API would allow users to easily adopt the library, as well as increase code portability from other applications.

Here we can see a comparison between the current and new APIs.

Current API:
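A sketch of the current workflow (shapes, formats, and values chosen for illustration):

```c++
#include "taco.h"
using namespace taco;

int main() {
  Format csr({Dense, Sparse});
  Tensor<double> A({2, 3}, csr), B({2, 3}, csr), C({2, 3}, csr);

  // Values must be inserted and then packed before use.
  A.insert({0, 1}, 2.0);
  A.pack();
  B.insert({0, 1}, 3.0);
  B.pack();

  // Every expression requires explicit compiler-method calls.
  IndexVar i, j;
  C(i, j) = A(i, j) + B(i, j);
  C.compile();
  C.assemble();
  C.compute();
}
```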
New API:
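And a sketch of how the same program might look under the proposed API, with implicit compiler methods and `operator()` access (the exact signatures are assumptions; refer to the linked tensor_api.h draft):

```c++
// Proposed API: no pack/compile/assemble/compute calls needed.
Tensor<double, 2> A({2, 3}, Format({Dense, Sparse}));
Tensor<double, 2> B({2, 3}, Format({Dense, Sparse}));
Tensor<double, 2> C({2, 3}, Format({Dense, Sparse}));

A(0, 1) = 2.0;               // shorthand insertion via operator()
B(0, 1) = 3.0;

IndexVar i, j;
C(i, j) = A(i, j) + B(i, j); // recorded, not yet computed

double x = C.get({0, 1});    // reading triggers the computation
```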
Support Classes and Structs
Coordinate
A small structure to represent a multidimensional coordinate tuple. Its purpose is to provide users with a default way in which to group coordinate components.
Component
A small structure to represent non-zero elements of a tensor. Users can populate tensors out of iterators of components (such as component lists). It is also used to represent non-zeros upon iteration over tensor objects.
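As a rough sketch of how these two structs might fit together (the member names here are assumptions, not the final API):

```c++
#include <array>

// Hypothetical layout; only the roles are taken from this proposal.
template <int Order>
struct Coordinate {
  std::array<int, Order> indices;  // one entry per tensor dimension
};

template <typename CType, int Order>
struct Component {
  Coordinate<Order> coord;  // where the non-zero lives
  CType value;              // the stored non-zero value
};
```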
The Tensor Class
A major change in the API is the removal of `TensorBase` in favor of a unified use of the `Tensor` templated class as the standard for user representation of tensors. `Tensor` will be templated based on the tensor component type (the type of the values stored within the tensor) and order (the number of dimensions of the tensor).
Tensor Random Access
The new API will now support tensor value random access. In addition to the `insert()` methods, users will be able to call `get()` to read specific values from the tensor. In addition, the `operator()()` method will be overloaded to allow shorthand notation for insertion of and access to tensor values. This can be seen in the example below.
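A sketch of what this could look like (the `get()`/`operator()` forms follow the description above; exact signatures are assumptions):

```c++
Tensor<double, 2> A({4, 4}, Format({Dense, Sparse}));

A.insert({1, 2}, 1.5);     // explicit insert method
A(2, 3) = 2.5;             // shorthand insertion via operator()

double x = A.get({1, 2});  // explicit random-access read
double y = A(2, 3);        // shorthand read via operator()
```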
Tensor Value Population
In the current API, tensors can be populated by inserting a set of values into a `Tensor` object with the `insert()` method, and then calling the `pack()` method, which consolidates all the inserted values into a single storage structure specified by the tensor format. In order to insert new values into the tensor after packing, the whole set of values needs to be inserted again (`pack()` does not take into account existing tensor values). The reason for this is that random insertion into an existing sparse data structure can be really expensive, and it is often faster to build the whole structure from the ground up.

This behavior, however, is not intuitive. The new API will abstract away the tensor reconstruction nuances by allowing users to insert and remove elements and operate on the tensor at any point, while preserving the preexisting tensor values.
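For instance, under the proposed behavior something like the following should just work, with the repacking handled internally (a sketch, not final semantics):

```c++
Tensor<double, 2> A({3, 3}, Format({Dense, Sparse}));
A(0, 0) = 1.0;
double x = A.get({0, 0});  // forces an internal pack of the values so far
A(1, 1) = 2.0;             // later insertions preserve the packed values
```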
Additionally, a new `setFromComponents()` method will allow users to first build a collection of Components, and then insert all of them into an empty tensor in a single operation.
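A sketch of bulk population via `setFromComponents()`, reusing the hypothetical `Component` layout sketched earlier (the iterator-pair signature is an assumption):

```c++
#include <vector>

std::vector<Component<double, 2>> nonzeros = {
    {{{0, 1}}, 2.0},   // value 2.0 at coordinate (0, 1)
    {{{1, 2}}, 3.0},   // value 3.0 at coordinate (1, 2)
};

Tensor<double, 2> A({2, 3}, Format({Dense, Sparse}));
A.setFromComponents(nonzeros.begin(), nonzeros.end());
```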
Tensor Slices
A new feature that will be supported by the new API is the capability of operating on tensor slices. This includes inserting value slices into a tensor, extracting specific slices, and defining tensor operations that read only sections of a tensor.
Range IndexVars
In order to support tensor slices, a new type of `IndexVar` will be added. This new `RangeIndexVar` will keep track of the index range that will be iterated over. `RangeIndexVar` objects will be derived from regular `IndexVar` objects via `operator()`, and will reference the same index (example below).
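A sketch of the proposed derivation (the `(lo, hi)` call syntax follows the slice examples in this thread):

```c++
IndexVar i, j;

// i(0, 10) yields a RangeIndexVar over rows 0..9 that still refers
// to the same underlying index as i:
A(i(0, 10), j) = B(i(0, 10), j) + C(i(0, 10), j);
```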
Range IndexExpr
Tensor slice operations are created via range index expressions. In order to create valid expressions, the range of any given index variable must have the same size throughout the expression. Here are some examples of how this can be used:
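Sketches of valid range index expressions under the rule above (every range over a given index variable has the same extent):

```c++
IndexVar i, j, k;

// Copy a 10x10 block between tensors:
A(i(0, 10), j(0, 10)) = B(i(0, 10), j(0, 10));

// Contract over a sub-range of k; each variable's ranges match in size:
A(i(0, 10), j(0, 10)) = B(i(0, 10), k(20, 30)) * C(k(20, 30), j(0, 10));
```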
Hiding Compiler Methods
In order to make taco syntax more intuitive as described above, the explicit use of the following set of methods will become optional: `pack()`, `compile()`, `assemble()`, and `compute()`. We call these Compiler Methods.
In the current API, the use of these methods is necessary to populate tensors and perform tensor computations. However, they can lead to very unintuitive and confusing code syntax and semantics, even though the way in which they must be used is largely predetermined. Therefore, the new API will abstract away the necessary calls to these methods.
These methods will still be available so interested users have the option to control the flow and timing of tensor computations if desired.
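So a user who wants explicit control could still write something like (a sketch):

```c++
C(i, j) = A(i, j) + B(i, j);
C.compile();    // optional: control when code generation happens
C.assemble();   // optional: control when output memory is allocated
C.compute();    // optional: control when the kernel actually runs
```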
Delayed Computation Execution
Since compiler methods are still necessary for the correct functionality of the tool, they must be called at some point. In this section we discuss how hiding compiler methods is achieved.
When a tensor operation (such as `a(i) = B(i,j)*c(j)`) is declared, the computation of the expression doesn't occur right away. Instead, users have the option of adding scheduling commands to the operation to optimize it (such as adding workspaces to the operation). Because of this, the API cannot compile and compute the expression right away. Instead, the new API will implement a delayed computation pattern in which tensor operations happen only when needed.

In this pattern there are two events that could trigger a computation: reading values from the result tensor, and modifying a tensor that the result depends on. Given the expression `a(i) = B(i,j)*c(j)`, it is clear that `a` depends on the values stored in `B`. But what if the values in `B` are modified before `a` is computed? In this scenario, when `a` is computed, the result of the computation is likely not what the user intended. Because of this, when any tensor is modified, any expressions that depend on said tensor are computed before the modification occurs.

Additionally, tensors will keep track of their compilation state, so repeated calls to compiler methods do not produce errors (i.e., calling `compute` multiple times would only compute once and return the previously computed result in subsequent calls).

With these two principles, users can write tensor programs without having to worry, or even think, about the calls to compiler methods happening behind the scenes.
Related issues
#24 #50 #85 #90 #95 #145 #167 #168 #179 #180 #181