saulshanabrook / moa


Variable Length Arrays #1

Open saulshanabrook opened 6 years ago

saulshanabrook commented 6 years ago

How can we represent variable-length arrays? Another, somewhat related question is how to represent arrays of mixed data types, a la structs/tuples (not unions of types)?

These could come in a couple of forms:

Basically, we want to be able to define shapes and indices for these different forms.

LLVM's getelementptr (doc) allows something sort of like this. It's much simpler: it only indexes one item at a time, and there is no notion of slicing/taking a range of indices. But I think it serves as a nice example of how separating the indexing/pointer arithmetic from the loading of memory can make an implementation much cleaner. Once I started to grok it, I found it much more consistent than C's array/struct indexing. It might not be quite as fluent, but there are fewer rules to understand about what indexing actually does, less magic.
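To illustrate that separation concretely, here is a minimal Python sketch (not LLVM IR, and not anything from this repo): the index arithmetic produces a flat offset, and a separate step performs the load.

# Minimal sketch: compute a flat offset from a multi-index and strides,
# separately from the load that uses it (analogous in spirit to getelementptr).
def flat_offset(index, strides):
    return sum(i * s for i, s in zip(index, strides))

def load(buffer, offset):
    return buffer[offset]

buffer = list(range(12))    # flat storage for a 3x4 row-major array
strides = (4, 1)
print(load(buffer, flat_offset((2, 3), strides)))   # element at row 2, col 3 -> 11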

Anyway, Lenore Mullin sent me some of her thoughts on heterogeneous arrays. Her work doesn't deal with the data types, but does provide a way to extend shapes so that they can represent elements of different shapes (not just scalars).

I will try to re-explain it here, to make sure I understand it properly.

Let's say we have an array like this, where we understand the indexing to work like it does in Python:

A = [
 [2, 2],
 [2, 2], 
]

We can understand the "shape" of this array by the constraints posed as we index it: we could do A[1][0], but not A[3][2]. So we have a shape of <2 2> (MoA syntax for a vector), aka [2, 2] (Python representation of a vector).

But what if the inner elements are arrays too?

[
 [[3, 4, 5], 2],
 [[10, 11], [[10, 2], [3, 4]]], 
]

Now we can say this has shape <<2 2> <3> <> <2> <2 2>> using Mullin's extension. The first item, <2 2>, describes the shape of the outer array. Then each subsequent entry describes the shape of one sub-array, in row-major order (with <> for the scalar 2). Then, Mullin says:

[screenshot of Mullin's notes, 2018-06-10]

Which I don't follow. But anyway, I think what we have so far works pretty well. For example, let's say we have a vector of 3 variable-length lists (say of lengths 2, 4, and 6), each containing 10x10 matrices. What is the shape of this?

<<3> <2 10 10> <4 10 10> <6 10 10>>

This looks OK, but I wonder about how we have to repeat 10 10 a bunch of times in the shape.
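To make the notation concrete, here is a rough Python sketch of my reading of the extension; the helpers regular_shape and extended_shape are hypothetical names, not anything from Mullin's work:

# Sketch: the extended shape is the regular (rectangular) outer shape, followed
# by the shape of each element reachable through that outer shape, in row-major
# order (a scalar contributes the empty shape []).
def regular_shape(a):
    # descend only while every sibling at a level has the same shape
    if not isinstance(a, list):
        return []
    inner = [regular_shape(x) for x in a]
    if inner and all(s == inner[0] for s in inner) and inner[0]:
        return [len(a)] + inner[0]
    return [len(a)]

def extended_shape(a):
    outer = regular_shape(a)
    elems = [a]
    for _ in outer:                      # flatten down to the outer shape's leaves
        elems = [x for e in elems for x in e]
    return [outer] + [regular_shape(e) for e in elems]

A = [[[3, 4, 5], 2], [[10, 11], [[10, 2], [3, 4]]]]
print(extended_shape(A))   # [[2, 2], [3], [], [2], [2, 2]], i.e. <<2 2> <3> <> <2> <2 2>>

The same helper gives [[3], [2, 10, 10], [4, 10, 10], [6, 10, 10]] for the vector of variable-length lists of 10x10 matrices, with the 10 10 repetition showing up exactly as in the shape above.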

pearu commented 6 years ago

Re \rho\psi expression: this is correct. I'll try to explain this... The example is an array of arrays of arrays. So, there are three levels. At the first level, forget the inner levels, and you will have an array of 3x2, 2x1'', 1x1 arrays. Here I have introduced a new notation: the number of primes denotes the number of fine structure levels. The elements of these three subarrays can be arranged as a 3x3 matrix. Hence, one has the column of shapes:

3 3
3 2
2 1''
1 1

Next, note that the 3x2 and 1x1 arrays don't have fine structure, so nothing is put to the right of the corresponding shape rows. The 2x1'' array has the following fine structure: its items are 2x1 and 2x1' arrays. So, the shape column of 2x1'' is:

2 2
2 1
2 1'

And last, the 2x1' array has the following fine structure: its items are 2x2 matrices, and the corresponding shape column is

2 2

When you arrange the shape columns into a matrix such that the shapes of the fine structure are put to the right of the overall shape, you'll get the \rho\psi representation.

pearu commented 6 years ago

I see the following problem: instead of 3x2, 2x1'', 1x1 arrays, consider 3x2, 2x1'', 1x2, that is, only the shape of the last array is changed. What would be the corresponding \rho\psi representation? For instance, what would be the outer shape? The 3x2, 2x1'', 1x1 example works because it has exactly 3x3=9 elements and one can arrange these as a 3x3 square. The 3x2, 2x1'', 1x2 example has 10 elements, which can be arranged as a 2x5 or 5x2 array, and why not 1x10 or 10x1? So, there exist multiple \rho\psi representations of the same array of arrays. The question now is which one is minimal (in the sense of computational complexity, for instance).
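Just to spell the counting argument out (a trivial arithmetic check, nothing MoA-specific):

# 3x2, 2x1'', 1x1 has 3*2 + 2*1 + 1*1 = 9 elements, which can be arranged as a
# 3x3 square; swapping the last array for 1x2 gives 10 elements, which only
# admits 1x10, 2x5, 5x2, or 10x1 arrangements.
def rectangular_arrangements(n):
    return [(r, n // r) for r in range(1, n + 1) if n % r == 0]

print(rectangular_arrangements(3*2 + 2*1 + 1*1))   # [(1, 9), (3, 3), (9, 1)]
print(rectangular_arrangements(3*2 + 2*1 + 1*2))   # [(1, 10), (2, 5), (5, 2), (10, 1)]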

saulshanabrook commented 6 years ago

Pearu, thanks for explaining this. Lemme see if I understand. Based on my understanding of how this works, the structure you are describing has this shape:

<<3> <3 2> <<2 1> <2 1> <2 1>> <1 1>>

I describe it with just 1d arrays, where the elements could be scalars or nested 1d arrays. You/Lenore put it into one large matrix with all the values. They are pretty similar, but why does your first example have a 3 3 in the first row? I understand the first 3 corresponds to how many sub-arrays there are, but what is the second 3? Isn't the outer shape a 1D array, not a 3x3 matrix? For example, I can see this shape being (leaving out the innermost values for succinctness):

[
  (3 x 2 matrix),
  [
    [(2 x 1 matrix)],
    [(2 x 1 matrix)],
  ],
  (1 x 1 matrix)
]
pearu commented 6 years ago

Saul, my guess for the second 3 is that in certain applications the data is a 2D map (a 3x3-like matrix) that has a different resolution for different elements. Perhaps Lenore had such applications in mind when cooking up this example. We have a more general situation. I haven't followed Lenore's formalism in enough detail to answer the question yet. I had similar questions when considering other examples...

womenflyplanes commented 6 years ago

Please know this is a very initial set of my thoughts on keeping shapes available. I want to be sure the psi function is extended to keep its theoretical prowess, noting we must always keep the identity (iota(rho xi)) psi xi equiv xi. That is: get the shape, get all indices, then index an array of any dimension, and you get the array back. This should be true for dim = 0, i.e. scalars, and for infinite arrays, i.e. when n is countably infinite.
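For the homogeneous case that identity is easy to check concretely; here is a small NumPy sketch (where psi is just a plain multi-index lookup standing in for the full MoA operator):

import numpy as np

def psi(index, xi):
    # plain multi-index lookup, standing in for the psi function
    return xi[tuple(index)]

xi = np.arange(24).reshape(2, 3, 4)          # rho xi = <2 3 4>
indices = list(np.ndindex(*xi.shape))        # iota(rho xi): every full index
rebuilt = np.array([psi(i, xi) for i in indices]).reshape(xi.shape)
assert (rebuilt == xi).all()                 # (iota(rho xi)) psi xi equiv xi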

womenflyplanes commented 6 years ago

You all bring up good questions that we need to discuss and represent theoretically. What you see are my first thoughts and extensions for the psi function. I suggest we keep discussing until we can extend the psi function so that we maintain its theoretical mathematical prowess. We also need to discuss how all our extensions apply to DNF and ONF. The psi function is first. Then we need psi reduction to DNF and ONF, and then to dimension-lifted ONF...

womenflyplanes commented 6 years ago

Just like MoA will be an enhanced subset of everything in Python, I suspect, the generalization to heterogeneous arrays will be such a subset also. It is very important to me that everything can be analyzed from shapes and the psi function can proceed as it does with homogeneous arrays.

Ravenwater commented 6 years ago

I haven't been able to penetrate MoA as a mechanism to solve computational tensor decompositions. However, I have been working with Halide and the polyhedral compilation framework. FAIR has combined the two for tensor decompositions:

https://research.fb.com/announcing-tensor-comprehensions/

Halide is an execution-scheduling approach for tensor decompositions, along the lines of ATLAS for blocking and OSKI for DMM linear algebra. Their approach of overloading C++ to directly generate the IR data structures is, I think, incredibly clever and productive: you don't need to create a DSL, and they completely sidestep the problem of sw integration, since the whole solution is already in C++ and you can deploy it as just a sw component.

So my question to this new brain trust is, how does the Halide approach compare to the MoA approach? Halide is already in production at Google and Facebook, so bootstrapping on this sw would save years of lead time.

womenflyplanes commented 6 years ago

We are taking the approach of a unified formalism from start to finish. We are building upon a research compiler and plan to automate the entire process. Stay tuned; I doubt we will be production-ready for a while.


saulshanabrook commented 6 years ago

Their approach to overload C++ to directly generate the IR data structures I think is incredibly clever and productive as you don't need to create a DSL and they completely side step the problem of sw integration: the whole solution is already in C++ so you can deploy it as just a sw component.

One thing that's a bit different about where we are coming from is that we are looking at Python array APIs (like numpy, xtensor, dask, sparse arrays, or even tvm, torch, etc.) and trying to see if we can have an abstraction layer above them, so that library authors (of scipy code, say) can write against a higher-level Python API that can then target different backends, like LLVM generation, similar to the numba machinery. So a C++ API doesn't get us that far / would have to be exposed at the Python level somehow anyway.
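A minimal sketch of the kind of layering meant here (purely illustrative; none of these names come from the moa project): library code writes against one small frontend function, and the backend is selected underneath it.

# Hypothetical sketch of an abstraction layer over multiple array backends.
import numpy as np

BACKENDS = {"numpy": np.add}        # a numba- or llvm-backed kernel could be registered here too

def add(x, y, backend="numpy"):
    # library code calls this, never a backend directly
    return BACKENDS[backend](x, y)

print(add(np.ones(3), np.ones(3)))  # [2. 2. 2.]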

saulshanabrook commented 6 years ago

I would love to hear how the people who wrote tensor comprehensions / are working on polyhedral compilation think about MoA. I haven't seen their work compared to it much, so it's hard for me to have a very strong understanding of the differences. I see MoA as a formalization of doing math on indexing functions/arrays, which can support "lazy evaluation", or really doing algebraic operations on the expressions you are building up, to reduce them or convert them into other forms.
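As a toy illustration of that last point (illustrative only, not the moa codebase): build a "transpose" lazily as an expression node, and algebraically rewrite an index into it as an index into the original array, so the transpose is never materialized.

# Toy sketch of algebraic reduction on a lazy indexing expression:
# (i psi transpose(A)) rewrites to (reverse(i) psi A).
import numpy as np

class Transpose:
    def __init__(self, array):
        self.array = array

def index(expr, idx):
    if isinstance(expr, Transpose):
        return index(expr.array, tuple(reversed(idx)))   # the rewrite rule
    return expr[idx]

A = np.arange(12).reshape(3, 4)
assert index(Transpose(A), (1, 2)) == A.T[1, 2] == A[2, 1]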

womenflyplanes commented 6 years ago

I would love to hear about that also.


saulshanabrook commented 6 years ago

@womenflyplanes and @costrouc and I will be having a call on Thursday at 10am EST, if anyone wants to join: https://meet.google.com/zgr-ovyn-gyr

Ravenwater commented 6 years ago

To add to the confusion, we are working on a hardware interface for a tensor processor where the front-end would generate the proper schedule to execute a tensor decomposition. So in what you describe we are fighting over who 'executes' the MoA machinery.

My limited comprehension of MoA latched onto the 'lazy-eval' semantics that you could push to hardware to generate an optimal schedule given the layout and queuing that the hardware and its memory controllers generate. This is akin to the 'warp' engine on GPUs, which bundles and constrains execution threads so that they don't become too divergent in their memory paging. The hardware is the final arbiter of all these matters, as it has the actual information that matters to the execution.

What Halide offers is a methodology to automatically visit the state space of possible schedules so that you can empirically measure what the best execution schedule is for a particular piece of hardware. That too fights with the idea of parallel hardware doing the scheduling, but with many tensor products there are few write/read dependencies, so the number of possible schedules is vast and their paging behavior is difficult to predict without actual execution as strides, page sizes, and concurrent pages will all vary among hw systems.

womenflyplanes commented 6 years ago

Simply answered, we are developing a new method and order of optimizations that do not exist today.


saulshanabrook commented 6 years ago

To add to the confusion, we are working on a hardware interface for a tensor processor where the front-end would generate the proper schedule to execute a tensor decomposition. So in what you describe we are fighting over who 'executes' the MoA machinery.

Do you have any of this work online anywhere? I haven't used Halide at all, but just from looking at their homepage it looks like it could be possible to target it from Python, maybe with Python bindings for their C++ interface, similar to how llvmlite wraps LLVM's C++ interface. I think @teoliphant mentioned that they were looking at Halide when they were thinking about Numba.

womenflyplanes commented 6 years ago

Realize that there are many tensor decompositions: LU, QR, ... Such temporal indices can also be composed, and thus the optimization of a decomposition (or decompositions) together with other algebra, with things like Kronecker products, inner and outer products, etc., will look different.


teoliphant commented 6 years ago

I just found this discussion --- hard to keep track of all the threads. When we first started with Numba, we explored Halide as well and looked at it for inspiration for a Halide-like extension to Numba.

My current view is that Halide could be used as a target for generating code, but as @womenflyplanes suggests, this project is exploring integrating MoA and the Psi Calculus into the SciPy ecosystem.

My current understanding is that by using "views", NumPy accidentally provides a tantalizingly close run-time for a psi-like calculus (but does not fully implement it).
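A quick illustration of the "views" point (a NumPy sketch, not MoA code): basic slicing returns a view that shares memory with the original, so composing such selections is index bookkeeping rather than copying.

import numpy as np

a = np.arange(24).reshape(4, 6)
v = a[1:3, ::2]                  # a strided selection
assert np.shares_memory(a, v)    # no data was copied
v[0, 0] = -1
assert a[1, 0] == -1             # the view writes through to the original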

It seems to me that general, variable-length arrays are a significant complication, and not worth the extra distraction at this point in time --- unless a useful but more limited subset can be discovered.

I think combining MoA and Psi calculus with Sparse Arrays could be a very good approach for making a useful library in Python that would show-case the theory and get adoption.

womenflyplanes commented 6 years ago

Travis, I agree with you re:

I think combining MoA and Psi calculus with Sparse Arrays could be a very good approach for making a useful library in Python that would show-case the theory and get adoption.

The sparse array extensions to the psi function have been worked out.
