odlgroup / odl

Operator Discretization Library https://odlgroup.github.io/odl/
Mozilla Public License 2.0

Remove `Element` classes!? #1458

Open kohr-h opened 5 years ago

kohr-h commented 5 years ago

This is a somewhat more radical extension of #1457 to remove complexity in ODL. I'd love to have some feedback on it.


With a bit of distance from the weeds of the code, I've started thinking about decisions we made (implicitly or explicitly) early on, and how well they have served us so far. All in all I think we made good choices in many places, and the usefulness, flexibility and expressive power of ODL are indications of that.

At the same time, however, there are a couple of places where our solutions are more complex than they have to be. Structures like Discretization were simply mapped directly from mathematics, and sometimes not much consideration was put into the question "do we need this abstraction?" Rather, it was easy to do at the time and corresponded nicely to the math. I did that a lot.

Some of these rather innocent decisions have come back to hit us really hard. To me, the single most important case is the decision to introduce ...Element classes that wrap data containers. I can say from the experience of implementing __array_ufunc__ for the Tensor class that it was a huge pain to write and test the code for all corner cases, just to make sure that Tensor is almost indistinguishable from NumPy arrays (but still different enough to be annoying quite regularly, see below). Another thing I've grown increasingly annoyed with is the necessity to use .asarray() all the time, just to be sure that every array functionality works as expected and we're not subject to arbitrary limitations of Tensor, either because it's ambiguous (some cases with indexing) or just not yet implemented. And finally, all the tedious work with the CuPy backend, just to wrap it into a Tensor class and make it behave nicely, seems kind of pointless when the array itself already works as expected.
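To give an idea of the machinery involved, here is a toy sketch of the protocol (the Wrapper class is made up and vastly simpler than the real Tensor code):

import numpy as np

class Wrapper(object):
    # Toy wrapper illustrating the __array_ufunc__ protocol.

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap all Wrapper inputs (and any `out` arguments) ...
        inputs = tuple(x.data if isinstance(x, Wrapper) else x
                       for x in inputs)
        out = kwargs.get('out')
        if out is not None:
            kwargs['out'] = tuple(x.data if isinstance(x, Wrapper) else x
                                  for x in out)
        result = getattr(ufunc, method)(*inputs, **kwargs)
        # ... and re-wrap the result. Multiple outputs, the `at` method,
        # scalar results etc. all need separate handling and tests.
        if isinstance(result, np.ndarray):
            return Wrapper(result)
        return result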

So the big question is: Why don't we work with arrays directly?

Why don't we scrap all those element classes and use arrays directly? That would probably remove 1/4 of the whole code while touching almost no functionality. What exactly would we lose? I can't think of anything more important than these:

On the other hand, I think the benefits would be massive:

Of course, it would be quite a big change. But in my opinion, we'd benefit from it big time.

adler-j commented 5 years ago

Now this is a ballsy suggestion. I like it.

The main issue, by far, is of course the lack of x.space as you mention. But perhaps that is not too bad.

My suggestion here would be that you make a small trial branch for this and simply try it out. Then we'll see where and how vector.space is used.

ozanoktem commented 5 years ago

This sounds like a large change to the core API for ODL. Wouldn't this break a lot of code? It obviously would for parts that query the underlying space of a vector space element, right?

adler-j commented 5 years ago

It is a major change (to say the least) but it would probably make ODL much more accessible, especially to more engineering-y audiences.

kohr-h commented 5 years ago

It would be a rather large change, but my gut feeling is that it would make a lot of code simpler, while only requiring moderate adaptations to existing code.

Of course, gut feeling isn't proof. I'll have a go at it and see where the caveats lie. (In fact, I have a branch that attempts this, but at some point I lost focus and changed a lot of other things, so it isn't representative any more.)

One interesting aspect will be product spaces, in particular product space elements. Here my go-to solution would be np.array(..., dtype=object) for heterogeneous elements and an array of the same type with extra dimensions for homogeneous elements.
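Roughly what I mean, just to make the two representations concrete (nothing of this is implemented):

import numpy as np

# Heterogeneous element, e.g. of rn(3) x rn((2, 2)): an object array
# holding the individual component arrays.
x_hetero = np.empty(2, dtype=object)
x_hetero[0] = np.zeros(3)
x_hetero[1] = np.zeros((2, 2))

# Homogeneous element, e.g. of rn((2, 3)) ** 4: a single array with an
# extra leading axis enumerating the components.
x_homog = np.zeros((4, 2, 3))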

adler-j commented 5 years ago

One interesting aspect will be product spaces, in particular product space elements. Here my go-to solution would be np.array(..., dtype=object) for heterogeneous elements and an array of the same type with extra dimensions for homogeneous elements.

It's not obvious that this is preferable to e.g. lists. We have to consider that.

ozanoktem commented 5 years ago

From a conceptual point of view, it is advantageous to let the vector space element carry information about its space. This e.g. makes it easy to define a noise model that takes values on the data space: we can just ask the data which space it belongs to and use that when defining the noise model. If I have not misunderstood things, with the suggested changes one suddenly needs to remember the name of the data space that was once created.

kohr-h commented 5 years ago

It's not obvious that this is preferable to e.g. lists. We have to consider that.

True, although one advantage is support for multi-indexing, which we wouldn't have for nested lists.
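For the homogeneous representation, for instance:

>>> import numpy as np
>>> x = np.zeros((4, 2, 3))  # 4 components of shape (2, 3)
>>> x[1, 0, 2]  # one indexing expression across component and entry
0.0
>>> xs = [np.zeros((2, 3)) for _ in range(4)]
>>> xs[1][0, 2]  # nested lists need chained indexing
0.0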

kohr-h commented 5 years ago

From a conceptual point of view, it is advantageous to let the vector space element carry information about its space. This e.g. makes it easy to define a noise model that takes values on the data space: we can just ask the data which space it belongs to and use that when defining the noise model. If I have not misunderstood things, with the suggested changes one suddenly needs to remember the name of the data space that was once created.

A valid point. However, as of now, we always generate noise in a given space, not from a given element. See https://github.com/odlgroup/odl/blob/master/odl/phantom/noise.py The only exception is Poisson noise, where you might need an extra space argument, although it would also work without.
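For illustration, with the current API (signatures as in the linked module):

>>> space = odl.rn(3)
>>> noise = odl.phantom.white_noise(space)  # generated from the space
>>> noisy = odl.phantom.poisson_noise(space.one())  # the exception: takes an element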

adler-j commented 5 years ago

The places where the space is implicitly passed along with a vector are e.g. show, where we use the space to give context to the values. Fixing this would not be trivial (well, except for simply adding a space.show(array)), but it might be doable.

kohr-h commented 5 years ago

Right, show would have to go into the space.

olivierverdier commented 5 years ago

I agree with the proposal of @kohr-h. I have the same gut feeling that it is a good, simplifying change, and that it will probably not break that much code. x.space was nice, but its price (in terms of complexity) is probably too high.

JevgenijaAksjonova commented 5 years ago

I think it is generally bad to break backwards compatibility. Using numpy arrays directly is nice, but isn't it possible to add such a possibility without removing the old classes?

adler-j commented 5 years ago

I spent some time thinking about this @JevgenijaAksjonova, but frankly I cannot come up with a solution. Of course we could try to keep the old classes, but we'd have ambiguity in what we return. E.g. what should operator(array) return? Right now it returns an element, but in this case it should probably return an array.

ozanoktem commented 5 years ago

The operator(array) returning an element of its range is very natural, but having it return a NumPy array is not really good. Maybe I am misunderstanding something.

kohr-h commented 5 years ago

@JevgenijaAksjonova I totally agree with your general sentiment. There needs to be a good justification for a big change like this. I think once there is a working implementation of the proposal, we can see and judge, and then decide if we like it or not.

In general, I see this as a long-term investment into making ODL more maintainable and easy to take up for newcomers. We would move out of business that we should never have dealt with. It's part of the "let data just be data" philosophy from functional programming, as opposed to the "make data active by adding behavior" philosophy of object-oriented programming.

kohr-h commented 5 years ago

@ozanoktem It's up to us to decide what "element of its range" means. We can (and would) consider arrays of correct shape and data type to be elements of a certain tensor space. There's nothing unnatural about that, and in that scenario, operators returning arrays would be perfectly okay.

ozanoktem commented 5 years ago

@kohr-h, one of the very early design goals in ODL was to make sure we kept some information on what the array represented. Just because two arrays have the same size and same datatype does not mean they necessarily represent the same vector space element. The main reason was to ensure one does not make logical errors just because they are permitted from a pure array-algebraic point of view. I suggest we do not abandon that design principle unless there are very strong reasons to do so.

sbanert commented 5 years ago

What I'm concerned about is that not having an abstract Element class makes extending ODL harder when it comes to other discretisations that do not allow representation as a single array (like regions of interest in CT, where certain areas have a finer grid, or Galerkin methods). This would lead to several potentially incompatible interfaces and would make it harder to write general solvers that work with all of them. And here I have not even mentioned things that happen in non-linear spaces like Lie groups or manifolds.

And it would also mean that any change of the implementation (like the one of ProductSpace elements) would potentially introduce backwards incompatibility.

For these reasons, I think that this change should also be seen together with a longer-term perspective of what ODL should become (Operator Discretisation Library might not only include the type of discretisations we have now).

kohr-h commented 5 years ago

Just because two arrays have the same size and same datatype does not mean they necessarily represent the same vector space element.

Why does an object that represents data have to represent more than that? Why does it have to know about spaces at all? In my view, it's sufficient that spaces know about what types of objects they can work with. The opposite direction is not necessary.

Your position reflects to a high degree how things are handled in statically typed languages. You say once and upfront what type your data is, and that's that. If you want to interpret it differently, you're forced to cast explicitly. Contrast that to the dynamic view of languages like Python. It is determined at runtime what the type of your data is, and then decided if an operation is allowed or not. That's the principle that I'm applying here as well.

If you look at how things evolved over time, we started with a very rigid type system where the boundaries between different spaces were very clearly drawn. The big downside of that is usability: if you get errors thrown in your face just because of tiny differences, you find yourself casting everywhere, which is not only potentially inefficient, but also defeats the purpose of the type system in the first place. So we started tearing down the boundaries, and now you can give an array to an operator directly, for instance (in other places it's not possible, we're not very consistent there). That was a big usability plus.

Long story short, I think the principle of "this element belongs to this space and no other space" is impractical, and we should abandon it.

The main reason was to ensure one does not make logical errors just because they are permitted from a pure array-algebraic point of view.

In my experience, the vast majority of logical errors can be avoided based on the shape of the data alone, and the next (much smaller) portion of errors by data type. Any further "help" from a type system is mainly a source of frustration because it forces manual casting.

I suggest we do not abandon that design principle unless there are very strong reasons to do so.

I have one: Wrapping data in a custom class and trying hard to make it look almost like the wrapped thing is a huge maintenance burden, and quite error prone. I can tell, since I implemented a big portion of it.

kohr-h commented 5 years ago

@sbanert I'm not suggesting to completely disallow custom element types. If there is a case where existing types don't do the job, then sure, make a custom one.

What I mean is that none of the currently implemented scenarios require such a custom type.

Regarding backwards compatibility, sure, code that does x.inner(y) or x.space or x.asarray() will have to be ported. But those are easy changes, and we can write a migration guide if necessary. I still consider the benefit to be worth the effort.
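Roughly, the porting would look like this (X standing for whatever space the surrounding code already has at hand):

# before                        after
# x.inner(y)            ->      X.inner(x, y)
# x.norm()              ->      X.norm(x)
# x.asarray()           ->      x            (x is already an array)
# x.space               ->      X            (spaces are passed along explicitly)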

ozanoktem commented 5 years ago

Why does an object that represents data have to represent more than that? Why does it have to know about spaces at all? In my view, it's sufficient that spaces know about what types of objects they can work with. The opposite direction is not necessary.

Because certain mathematical operations do not make sense if the "vector" does not represent the same type of element. As an example, adding two 10^6-vectors where one holds Taylor series coefficients and the other wavelet coefficients doesn't make sense.

Your position reflects to a high degree how things are handled in statically typed languages. You say once and upfront what type your data is, and that's that. If you want to interpret it differently, you're forced to cast explicitly. Contrast that to the dynamic view of languages like Python. It is determined at runtime what the type of your data is, and then decided if an operation is allowed or not. That's the principle that I'm applying here as well.

My position reflects the mathematical view on things, i.e., to use "typing" to ensure that mathematical operations make sense. A "dynamic view" without any data types does not care about mathematical consistency, or how would it decide whether an operation is allowed or not?

Now, mathematicians also frequently make use of the fact that a vector space element belongs to multiple spaces. As an example, one may view a continuous function with compact support as an element of L^2 at one stage, then at some other stage view it as an element of C. Certain arguments may use the L^2 topology, other arguments may be based on other topologies related to C. Obviously we cannot bookkeep such stuff, but completely forgetting what an array represents is, in my opinion, not a good choice either.

My final remark concerns the community we primarily target with ODL, which in my opinion is the math community and not the engineering community. In that case, I am not convinced that the suggested changes put forward by @kohr-h are ideal.

kohr-h commented 5 years ago

Because certain mathematical operations do not make sense if the "vector" does not represent the same type of element. As an example, adding two 10^6-vectors where one holds Taylor series coefficients and the other wavelet coefficients doesn't make sense.

That's too broad a statement, IMO. If the two vectors have the same size and the entry type allows addition, then addition makes perfect sense. You may not want to add them because the result is not useful, but we're not here to decide for users what they ought to do. We can and should only catch obvious errors. Apart from that, there is no element in ODL that would even try to catch that type of usage error. If both operators map to rn(1e6), addition of their results is (and should be) allowed.

My position reflects the mathematical view on things, i.e., to use "typing" to ensure that mathematical operations make sense. A "dynamic view" without any data types does not care about mathematical consistency, or how would it decide whether an operation is allowed or not?

I think you're conflating "formally correct" with "meaningful". Just saying "makes sense" or "makes no sense" can be understood in both ways, and I think we need to disentangle the two. ODL never made an attempt to decide for users what operations are meaningful. We only ever checked "formal correctness", and I think it's the only thing we should be doing. Of course, one can start to push things from the "not meaningful" into the "formally incorrect" category by introducing element types. But in my view, the past has shown that this is a poor way of dealing with the issue, since it introduces a usability and maintainability nightmare.

My final remark concerns the community we primarily target with ODL, which in my opinion is the math community and not the engineering community. In that case, I am not convinced that the suggested changes put forward by @kohr-h are ideal.

I'll give an example so we're not just imagining what the consequences would be. Consider the CG method. Here's the current version (without distracting details):

def conjugate_gradient(op, x, rhs, niter):
    if op.domain != op.range:
        raise ValueError('operator needs to be self-adjoint')
    if x not in op.domain:
        raise TypeError('`x` {!r} is not in the domain of `op` {!r}'
                        ''.format(x, op.domain))

    r = op(x)
    r.lincomb(1, rhs, -1, r)       # r = rhs - A x
    p = r.copy()
    d = op.domain.element()  # Extra storage for A x

    sqnorm_r_old = r.norm() ** 2  # Only recalculate norm after update
    if sqnorm_r_old == 0:  # Return if no step forward
        return

    for _ in range(niter):
        op(p, out=d)  # d = A p
        inner_p_d = p.inner(d)
        if inner_p_d == 0.0:  # Return if step is 0
            return
        alpha = sqnorm_r_old / inner_p_d
        x.lincomb(1, x, alpha, p)            # x = x + alpha*p
        r.lincomb(1, r, -alpha, d)           # r = r - alpha*d
        sqnorm_r_new = r.norm() ** 2
        beta = sqnorm_r_new / sqnorm_r_old
        sqnorm_r_old = sqnorm_r_new
        p.lincomb(1, r, beta, p)                       # p = r + beta * p

This is what it would look like afterwards:

def conjugate_gradient(op, x, rhs, niter):
    if op.domain != op.range:
        raise ValueError('operator needs to be self-adjoint')
    if x not in op.domain:
        raise TypeError('`x` {!r} is not in the domain of `op` {!r}'
                        ''.format(x, op.domain))

    X = op.domain
    r = op(x)
    X.lincomb(1, rhs, -1, r, out=r)          # was r.lincomb(1, rhs, -1, r)
    p = r.copy()
    d = op.domain.element()

    sqnorm_r_old = X.norm(r) ** 2            # was r.norm() ** 2
    if sqnorm_r_old == 0:
        return

    for _ in range(niter):
        op(p, out=d)
        inner_p_d = X.inner(p, d)            # was p.inner(d)
        if inner_p_d == 0.0:
            return
        alpha = sqnorm_r_old / inner_p_d
        X.lincomb(1, x, alpha, p, out=x)     # was x.lincomb(1, x, alpha, p)
        X.lincomb(1, r, -alpha, d, out=r)    # was r.lincomb(1, r, -alpha, d)
        sqnorm_r_new = X.norm(r) ** 2        # was r.norm() ** 2
        beta = sqnorm_r_new / sqnorm_r_old
        sqnorm_r_old = sqnorm_r_new
        X.lincomb(1, r, beta, p, out=p)      # was p.lincomb(1, r, beta, p)

I think this code actually makes it clearer in which spaces we're operating. Here, domain and range are identical, but when they're not, the code shows much more clearly which operation happens in which space.

JevgenijaAksjonova commented 5 years ago

I spent some time thinking about this @JevgenijaAksjonova, but frankly I cannot come up with a solution. Of course we could try to keep the old classes, but we'd have ambiguity in what we return. E.g. what should operator(array) return? Right now it returns an element, but in this case it should probably return an array.

What about returning an array if an array is given as input, and a space element otherwise?

adler-j commented 5 years ago

What about returning an array if an array is given as input, and a space element otherwise?

The problem is that this would break current code, where we always return a space element, regardless of whether the input is an array or a space element.

JevgenijaAksjonova commented 5 years ago

What about returning an array if an array is given as input, and a space element otherwise?

The problem is that this would break current code, where we always return a space element, regardless of whether the input is an array or a space element.

Sure, there would be some damage, but less. So it could still be a good compromise. The biggest problem I see is that it will complicate future development, as one will have to think about these two options. =/

kohr-h commented 5 years ago

Sure, there would be some damage, but less. So it could still be a good compromise. The biggest problem I see is that it will complicate future development, as one will have to think about these two options. =/

To me, the main issue with this would be to guarantee that op(x) in op.range is always True. If we don't guarantee this, we'll be causing a lot of confusion. But guaranteeing it would mean extending x in space to arrays x, and then we'd basically be very close to my proposal.
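A standalone sketch of what such an extended membership test could look like (toy class, not the real NumpyTensorSpace):

import numpy as np

class ToyTensorSpace(object):

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    def __contains__(self, x):
        # Structural membership: any array-like with matching shape and
        # dtype counts as an element of this space.
        try:
            arr = np.asarray(x)
        except Exception:
            return False
        return arr.shape == self.shape and arr.dtype == self.dtype

# np.zeros(3) in ToyTensorSpace((3,), float)  ->  True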

aringh commented 5 years ago

I am torn in this discussion: from one perspective it is good to reduce the amount of code that needs to be maintained. On the other hand, I tend to agree with @ozanoktem and @sbanert: I like the idea that the elements "know what they are", for example if one wants to implement a discretization.

You can kind of read it between the lines in some places, but I would like to know: what was the principle in the original design? Why was all this mathematical abstraction implemented? What purpose was it actually supposed to fill?

From the write-ups by @kohr-h, it seems to currently fill very little purpose. Is that because we are not utilizing all the potential that it gives? Or simply because it does not make much of a difference if you have it or not?

adler-j commented 5 years ago

In response to @aringh, we initially envisioned the vector spaces and elements to be quite lightweight, e.g. that they should basically satisfy the vector space axioms (scalar multiplication, addition, possibly an inner product).

However, we quickly found that a significant part of the implementation goes beyond this. Users want to be able to work with the elements as arrays in several regards, e.g. pointwise multiplication, the pointwise exponential, slicing, etc. In the end we've spent A LOT of time adding these features, and meanwhile the objects have become more and more like their underlying implementations.

Frankly, I do not doubt that the elements in themselves carry a lot of value, but the implementation is very cumbersome as is, given that we basically need to write passthroughs for every single operation that our users want to do.

Perhaps we should look into subclassing instead, e.g. https://docs.scipy.org/doc/numpy-1.13.0/user/basics.subclassing.html

adler-j commented 5 years ago

Frankly, the more I think about it, the more I wonder: why did we not go with subclassing from the start? I seem to remember that we did some initial forays into this but stopped, although I cannot remember why.

ozanoktem commented 5 years ago

In response to @aringh, there were two aims with ODL: (a) Enable users to write code that directly corresponds to the mathematical specification of an algorithm, and (b) offer means for users to share software components and/or implementations. A key aspect of code sharing was to stress compartmentalisation with associated documentation and unit tests, another was to separate out the discretisation from the continuum representation.

Another aspect was to offer consistency checks to prevent erroneous mathematical operations even though they are permitted from a computer science point of view. This is the source of the current discussions concerning what information a vector space element is supposed to carry. The few times I have used ODL, it has been quite helpful to query an operator regarding its range to ensure it is properly set up. If this ability is lost, i.e., if the range is just some NumPy array, then I think we have lost important functionality.

adler-j commented 5 years ago

If this ability is lost, i.e., the range is just some NumPy array

This would not happen under this suggestion. The range would still be a space just as it is right now (e.g. with extents and all that), but the elements of the space would simply be numpy arrays.

kohr-h commented 5 years ago

Frankly, the more I think about it, the more I wonder: why did we not go with subclassing from the start? I seem to remember that we did some initial forays into this but stopped, although I cannot remember why.

I think at the time, subclassing was largely discouraged due to annoyances like conversions to raw Numpy arrays by certain functions. Maybe the status has changed in the meanwhile.

kohr-h commented 5 years ago

but the elements of the space would simply be numpy arrays.

Or any other type of array, for that matter.

adler-j commented 5 years ago

I think at the time, subclassing was largely discouraged due to annoyances like conversions to raw Numpy arrays by certain functions. Maybe the status has changed in the meanwhile.

It seems __array_ufunc__ is able to solve this for some things (basically the ones we've currently solved it for). So maybe it's easier now.

We might want to look into this option as a form of "middle ground", so to speak.

kohr-h commented 5 years ago

It seems __array_ufunc__ is able to solve this for some things (basically the ones we've currently solved it for). So maybe it's easier now.

We might want to look into this option as a form of "middle ground", so to speak.

There's a broader __array_function__ that will land in Numpy 1.17. It would also work for functions like np.sum that currently don't respect subclasses or custom types. See, for instance, here: https://github.com/numpy/numpy/pull/11189
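The protocol itself is small; a toy sketch of how a custom type would hook into it (not ODL code):

import numpy as np

HANDLED = {}

def implements(np_func):
    # Register a custom implementation of a NumPy function.
    def decorator(func):
        HANDLED[np_func] = func
        return func
    return decorator

class MyArray(object):

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

@implements(np.sum)
def mysum(a, **kwargs):
    # With the protocol active, np.sum(MyArray(...)) dispatches here
    # instead of coercing to a plain ndarray.
    return MyArray(np.sum(a.data, **kwargs))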

kohr-h commented 5 years ago

I like the idea that the elements "know what they are".

From the write-ups by @kohr-h, it seems to currently fill very little purpose. Is that because we are not utilizing all the potential that it gives? Or simply because it does not make much of a difference if you have it or not?

I would say that currently, elements only know what they are in a very superficial sense. They know that they live in some vector space with a bunch of properties (shape, data type, maybe weighting). That's it. They have no way of knowing what they actually represent, for instance the result of a wavelet transform. Keeping track of that is, and always was, on the user.

My argument is that this tiny amount of extra information stands in no relation to the vast amount of code needed to make it work transparently with e.g. Numpy. In some sense, the middle ground is not a good place here.

There are two ways out (in my view): either make elements much more powerful or abandon them altogether. (I'm open to discussing a reasonable third way.)

Making elements more powerful would deliver what @ozanoktem was arguing for with the wavelet coefficient example. An element would somehow know that it came from a wavelet transform, so that it cannot be added to an element of the same size and data type that came from, say, a shearlet transform. This scenario would probably require implementing the range of each operator as a separate space. Some may share spaces, but who knows.

Removing elements is simpler to explain: they'd just be gone.

What would be the consequences?

I think the first scenario with elements on steroids would be absolutely terrible. First of all, we as developers would set ourselves up as judges over what's allowed and what's not in the world of operators. And it'd be so much work that we wouldn't do anything else anymore (well, I would, because I wouldn't help build such a monster). As a bonus, ODL would become virtually unusable due to its extreme rigidity.

For the other scenario, I think I've discussed the likely consequences in earlier posts, and I'm not gonna repeat myself. After all, I think only a working implementation can sort this out properly.

adler-j commented 5 years ago

(I'm open to discussing a reasonable third way.)

I'll once again raise the option of subclassing e.g. ndarray. Perhaps we should start out by trying that on a branch and see how it looks. It feels like it should solve our major issues here:

kohr-h commented 5 years ago

I've started looking into the subclassing option, out of curiosity. Here's again the reference documentation:

https://docs.scipy.org/doc/numpy/user/basics.subclassing.html

TL;DR


A naive implementation looks like this (similar to the RealisticInfoArray case):

class NumpyTensor(Tensor, np.ndarray):

    def __new__(cls, data, space=None):
        arr = np.asarray(data).view(cls)
        arr.__space = space
        return arr

    # To avoid calling the single-argument __init__ of LinearSpaceElement
    def __init__(self, data, space):
        pass

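    # Called for explicit construction, view casting and new-from-template
    # (e.g. slicing); `obj` is the array being viewed or templated, or
    # None for explicit construction.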
    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.__space = getattr(obj, "space", None)

    @property
    def space(self):
        return self.__space

The rest of the code in the class is basically gone, along with 90 % of the contents of Tensor (everything except impl and show).

With some minor adaptations in NumpyTensorSpace.element, element creation and casting to and from Numpy arrays work, which is a good start. :+1:

>>> X = odl.rn(3)
>>> x = X.element([1, 2, 3])
>>> x
NumpyTensor([ 1.,  2.,  3.])
>>> x.space
rn(3)
>>> a = np.ones(4)
>>> y = a.view(X.element_type)
>>> y
NumpyTensor([ 1.,  1.,  1.,  1.])
>>> y.space  # None, oops

Note the little annoyance with printing. It kind of works, but doesn't convey any information about the space, in contrast to before. Also the output of repr is not runnable, which is not a must-have, but a nice-to-have.

The bigger issue with the missing space, though, can be fixed by some guessing:

class NumpyTensor(Tensor, np.ndarray):
    ...

    def __array_finalize__(self, obj):
        if obj is None:
            return

        if hasattr(obj, "space"):
            self.__space = obj.space
        else:
            self.__space = NumpyTensorSpace(obj.shape, obj.dtype)

Now we get this:

>>> X = odl.rn(3)
>>> y = np.ones(4).view(X.element_type)
>>> y.space
rn(4)

Looks good, apart from the fact that we can only use the very basic properties shape and dtype. That may pose problems downstream.

Now that we have a proper ndarray, there are lots of new methods to try. And that's where the trouble starts:

>>> X = odl.rn(3)
>>> y = np.arange(4).view(X.element_type)
>>> y.space
tensor_space(4, dtype=int)
>>> y_float = y.astype(float)
>>> y_float  # looks ok
NumpyTensor([ 0.,  1.,  2.,  3.])
>>> y_float.space  # but isn't
tensor_space(4, dtype=int)

>>> X = odl.rn((2, 3))
>>> x = X.one()
>>> x_flat = x.ravel()
>>> x_flat
NumpyTensor([ 1.,  1.,  1.,  1.,  1.,  1.])
>>> x_flat.space  # uh, oh
rn((2, 3))

That's unfortunate, although not very surprising, given the little effort spent. We can make a small change to sort this issue out:

class NumpyTensor(Tensor, np.ndarray):
    ...

    def __new__(cls, data, space=None):
        arr = np.asarray(data).view(cls)
        # Use the given space if provided, otherwise derive it from the
        # array properties (shape and dtype)
        if space is None:
            arr.__space = NumpyTensorSpace(arr.shape, arr.dtype)
        else:
            arr.__space = space
        return arr

    def __init__(self, data, space):
        pass

    def __array_finalize__(self, obj):
        self.__space = NumpyTensorSpace(self.shape, self.dtype)

Now we have this:

>>> X = odl.rn(3)
>>> y = np.arange(4).view(X.element_type)
>>> y.space
tensor_space(4, dtype=int)
>>> y_float = y.astype(float)
>>> y_float
NumpyTensor([ 0.,  1.,  2.,  3.])
>>> y_float.space  # fixed
rn(4)

>>> X = odl.rn((2, 3))
>>> x = X.one()
>>> x_flat = x.ravel()
>>> x_flat
NumpyTensor([ 1.,  1.,  1.,  1.,  1.,  1.])
>>> x_flat.space  # fixed
rn(6)

This now looks okay, and like something one could work with. It's slightly worrying that we have to rely purely on numpy.ndarray properties to determine the space, but with a bit more code, this could be controlled in a more fine-grained way. For instance, what happens with weightings after ravel()? There is some code in another branch for that kind of case.

Another slight red flag is the multiple inheritance of NumpyTensor from Tensor and numpy.ndarray. (Removing Tensor wouldn't help since we'd still have LinearSpaceElement as parent.) That may just lead to "interesting" behavior down the line.

And finally, the biggest worry: this is all good for elements based on Numpy arrays, but it doesn't help at all for other array types (Numba arrays, CuPy arrays etc.). I'm pretty sure their subclassing APIs are less sophisticated than Numpy's and will blow up in our faces much more often.

The only thing I can imagine working there is to make a custom subclass of numpy.ndarray for those cases as well, where virtually everything is swapped out behind the scenes (the buffer, routines for indexing, catching of incompatible operations, etc.). So for those cases we probably don't save any work and may, on the contrary, have to do more, because the numpy.ndarray API surface is much bigger than that of our element classes.

For array types whose API is not compatible with Numpy's, some of that work has to be done anyway, but with a numpy.ndarray subclass we basically go "all in" without much chance for gradual support of operations. A subclass should be expected to support the complete API of the parent, and offering a narrower API will be annoying for users, who then no longer know what works and what doesn't. In particular, all the methods will be there, just not working, unless we do some black magic to remove them. (That won't be easy either, since numpy.ndarray is implemented in C.)

So on the surface, this approach looks okay, but I still have big reservations.

adler-j commented 5 years ago

First, a somewhat hacky-yet-useful proposal would be to not expose NumpyTensor at all. Instead we make it a hidden class inside of NumpyTensorSpace that can only be accessed through X.element_type. As a short example:

class NumpyTensorSpace(LinearSpace):

    ...  # stuff

    @property
    def element_type(self):
        space = self
        # should probably memoize here
        class NumpyTensor(Tensor, np.ndarray):

            ...  # stuff

            @property
            def space(self):
                return space

            def __repr__(self):
                return repr(space) + '.element_type'

        return NumpyTensor

This way, there would be no need to provide space to the constructor, and it should solve most (all?) of the implementation-side problems you mentioned above.

With that said, this does not solve the problem of e.g. CuPy arrays, which are not as sophisticated; I'll have to agree that they pose a much harder problem.

kohr-h commented 5 years ago

This way, there would be no need to provide space to the constructor, and it should solve most (all?) of the implementation-side problems you mentioned above.

That's a nice idea. To properly support ravel() and the like, you'll still need a dynamic property that can be modified by __array_finalize__:

    @property
    def element_type(self):
        """Type of elements in this space: `NumpyTensor`."""
        space = self

        class NumpyTensor(Tensor, np.ndarray):

            """Representation of a `NumpyTensorSpace` element."""

            def __new__(cls, data):
                arr = np.asarray(data).view(cls)
                arr.__space = space
                return arr

            def __init__(self, data):
                pass

            def __array_finalize__(self, obj):
                self.__space = NumpyTensorSpace(self.shape, self.dtype)

            @property
            def space(self):
                return self.__space

        return NumpyTensor

With that said, this does not solve the problem of e.g. CuPy arrays, which are not as sophisticated; I'll have to agree that they pose a much harder problem.

I'll look around to see whether other people have found more or less elegant solutions for subclassing with non-trivial changes to the internal structure.

olivierverdier commented 5 years ago

Alternative proposal: use xarray.DataArray. This is a thin wrapper around a numpy array, which still retains some of the metadata that odl spaces usually encode. Here is a simple possible implementation (that I use all the time now, because it's so easy to get a plot from an xarray):

import xarray as xr

def coords_from_space(space):
    coords = [axis.points().squeeze(axis=-1) for axis in space.partition.byaxis]
    return coords

def xarray_from_element(element):
    coords = coords_from_space(element.space)
    xarr = xr.DataArray(element, coords=coords, dims=element.space.axis_labels, name='Intensity')
    return xarr
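With a 2D element x, for instance, the result can then be plotted directly (DataArray.plot picks a suitable plot type, e.g. pcolormesh for 2D data):

xarr = xarray_from_element(x)
xarr.plot()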

Once this is done, you can also basically scrap all the (huge, kludgy) code for the show method and plot the DataArray directly using various methods outside odl.

kohr-h commented 5 years ago

Hm, interesting suggestion. I'm hesitant though to add an extra dependency. What's also a bit unfortunate is the lack of support for array types other than numpy.ndarray and pandas.DataFrame. I still have hopes that the CuPy backend will land at some point. :wink:

For plotting, xarray seems like a more mature solution and functionally at least as powerful, compared to our homemade code. But then, that's too little to make it a core dependency, which means the old plotting code would have to stick around...

olivierverdier commented 5 years ago

It's not my intention to push further for xarray, but I'd like to add some clarifications:

xarray supports both numpy and dask as backends, but not cupy, indeed. However, there are general endeavours to facilitate code that uses any NumPy-ish kind of backend (just like you would like to use both cupy and numpy within odl): http://www.numpy.org/neps/nep-0018-array-function-protocol.html Once this NEP is properly implemented, nothing will prevent either ODL or xarray from using cupy as a backend instead of numpy.

For plotting, xarray seems like a more mature solution and functionally at least as powerful, compared to our homemade code.

Why, that is quite an understatement... As a DataArray is a well-established array type, with coordinates, dimension names, etc., it can be plotted using various plotting libraries without any further effort (for instance visualised interactively or not, accessed by indices or coordinates, with automatic 2D slices if it's in 3D, etc.). I still think that odl's show method should be removed in favour of this.

That being said, I understand the reluctance to depend on an external library.

kohr-h commented 5 years ago

xarray supports both numpy and dask as backends, but not cupy, indeed.

Oh yes, I forgot about dask. I also saw at least one open issue on the xarray issue tracker about generalization of the array backend. The developers acknowledged that many operations currently hard-code the officially supported backends (likely with if...else), and that changing that requires a non-trivial effort. It's hard to judge how big of a priority this is for them. Probably xarray is one more of those super-useful libraries with a very low bus factor.

Why, that is quite an understatement...

:smile: I need to somehow shed a positive light on our own code.

I still think that odl's show method should be removed in favour of this.

That would certainly be an option if we become (largely) array-type-agnostic. Some prominent and easy-to-find examples of how to display data would be a natural addition in that case. Another option would be -- since show anyhow only works with the extra matplotlib dependency -- to add a codepath that is used if xarray is installed. (A conversion to numpy.ndarray would be required, though.)

kohr-h commented 5 years ago

Once this NEP is properly implemented, nothing will prevent either ODL or xarray from using cupy as a backend instead of numpy.

True, and probably the CuPy people are among the first to adopt that new interface (I vaguely remember that they already have a PR on it.) On the other hand, it might not be the be-all-end-all solution. In the past, CuPy would by default disallow implicit CPU <-> GPU data transfers since they can lead to hard-to-find performance bottlenecks (which I think is a reasonable position).

olivierverdier commented 5 years ago

The developers acknowledged that many operations currently hard-code the officially supported backends (likely with if...else), and that changing that requires a non-trivial effort. It's hard to judge how big of a priority this is for them. Probably xarray is one more of those super-useful libraries with a very low bus factor.

It is not necessarily that bad. They have an interface to handle dask and numpy simultaneously. (But I haven't looked into the details.)

True, and probably the CuPy people are among the first to adopt that new interface (I vaguely remember that they already have a PR on it.)

Exactly. Actually, it's already merged: https://github.com/cupy/cupy/pull/1650