Lazy XND container - Githubissues

xnd-project / libxnd

Subsumed into xnd

https://xnd.io/

BSD 3-Clause "New" or "Revised" License

80 stars 12 forks source link

Lazy XND container #41

Closed hameerabbasi closed 5 years ago

hameerabbasi commented 5 years ago

It might be nice to have a lazy XND container (particularly for indexing) instead of an eager one. My idea is the following, given and xnd array:

x = xnd([[0, 1], [3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13, 14]], type='var * var * int64')

When we do the following: y = x[2:4, 1:2], we should index with the first part immediately but store the second slice as belonging to axis 1 in something like an XNDLazy object. Then, when we do something like the following:

y[1, 0] should "dynamically" realise that we're operating on a lazy object and return xnd(11, type='int64'), applying the slice when the element is needed or accessed.

dhirschfeld commented 5 years ago

Isn't this the goal of uarray?

hameerabbasi commented 5 years ago

The issue was that indexing a fixed dimension array usually produces a view, but that of a variable dimension array recomputes the offsets under some conditions. This was a proposed solution.

skrah commented 5 years ago

To me that looks exactly like the existing implementation:

https://github.com/plures/ndtypes/blob/4d810d0c4d54c81a7136f313f0ae6623853d574a/libndtypes/ndtypes.c#L1501

There's a slice stack that is applied on access, I guess that's what you mean by lazy. I'm currently adding indices to that because @dhirschfeld has a use case.

hameerabbasi commented 5 years ago

Is applying a slice stack better for cases when you don’t need complete slices?

skrah commented 5 years ago

You need the stack because multiple slices can be applied on top of each other.

y = x[2:10] # Push first slice onto the stack.
z = y[20:30] # Push second slice onto the stack.
z[3] # Apply each slice of the stack and return the element.

You always have to calculate start, stop, step even if just to see that an index is actually valid.

hameerabbasi commented 5 years ago

Can you do the following though:

y = x[1:30, 4:50]
z = y[2:10, 4:20]
z[5, 6]

skrah commented 5 years ago

Yes, of course:

>>> x = xnd([[1,2,3], [4,5,6,7], [8,9,10,11,12]])
>>> y = x[1:2, 0:3]
>>> z = y[:, 0:2]
>>> z[0, 1]
xnd(5, type='int64')

skrah commented 5 years ago

To others who are reading this: We are talking about ragged arrays, fixed arrays (like NumPy's ndarray or XND's fixed dimension) do not require this additional code.

skrah commented 5 years ago

The fundamental difference between the slice stack and the XNDLazy object seems to me that for multiple slices on top of each other one would need multiple XNDLazy types:

var * lazy * lazy * var * int64

This messes up type->ndim and complicates the code. The slice stack is attached to the variable dimensions with the correct type->ndim.

For indices the story is a bit different: Yesterday I added an Index type (privately) to implement @dhirschfeld's request and also address Eric Wieser's comments about mixed indexing and slicing.

But there can be at most one Index type in a row, because that dimension is eliminated. Even then, it's a bit of a hassle to adjust type->ndim, because now we have a shadow dimension.

skrah commented 5 years ago

Closing, this works now:

>>> x = xnd([[0], [1,2], [3,4,5]])
>>> x[::-1, 0]
xnd([3, 1, 0], type='var * int64')

The internal representation is the existing slice stack together with a new VarDimElem type that stores the index for eliminated dimensions that are preceded by a slice.