sunpy / ndcube

A base package for multi-dimensional contiguous and non-contiguous coordinate-aware arrays. "Maintainers": @danryanirish & @Cadair
http://docs.sunpy.org/projects/ndcube/
BSD 2-Clause "Simplified" License
44 stars 47 forks source link

Adds the functionality of Ufunc to NDCube #666

Open ViciousEagle03 opened 6 months ago

ViciousEagle03 commented 6 months ago

PR Description This PR address the issue: #591

I've tried to add the support of Arithmetic operations to NDCube using Numpy Ufunc I have removed the test cases in ndcube/tests/test_ndcube.py in the arithmetic section which doesn't support __array_ufunc__

Note: Before proceeding to add the remaining test cases and Changelog, I wanted to ensure that the proof of concept is correct.

DanRyanIrish commented 6 months ago

Thanks @ViciousEagle03 for this PR and for opening it earlier rather than later so we can discuss the approach.

Firstly, I'll say that I'm not an expert in ufuncs so it's possible I've misunderstood some details in your implementation. Secondly, despite my concerns below, I think supporting ufuncs is a useful feature for users. The question is how to do this, and this PR is a great chance to discuss it.

My understanding is that the API you're implementing for ufuncs is different to that for the arithmetic operators. The ufuncs basically extract the data array and then proceed as normal, whereas the arithmetic operations try to take the uncertainties and mask into account, where appropriate. For example, cube + np.ones(cube.data.shape) will return an NDCube where all elements in the data array have been incremented by 1 (same WCS, uncertainties, mask, etc.), whereas np.add(cube, np.ones(cube.data.shape)) will return a numpy array giving the result of cube.data + np.ones(cube.data.shape). Is this correct?

Assuming that it is, this opens an important discussion. My intuition as a user would be that np.add and + should behave the same when adding an array to an NDCube. By this I mean both should consider the WCS, uncertainties and mask if and where appropriate, and return an NDCube. If users are only interested in the data array without consideration of the WCS, uncertainty and mask, a simple and explicit API is already available, namely np.add(cube.data, np.ones(cube.data.shape)). I would argue the same applies to np.subtract and np.multiply. What do you think? Is there another perspective I'm not considering?

If we continue to assume that ufuncs should also consider the wcs, mask and uncertainty, it has implications for np.maximum, np.equal as well. Specifically to your question about whether np.equal should take into account the wcs, I would argue yes. In fact this is a major reason it has not yet been implemented. Determining whether two APE-14 wcs's are equivalent is not always trivial, or it can be computationally expensive. Another consideration is whether values in the data array should be the same, or just have overlapping uncertainty ranges. Furthermore, should masked values be excluded from the calculation? I would argue yes. But these are questions that we haven't agreed a consensus to.

As for np.maximum, I gave this some thought a while back and asked around to get a sense of what users' intuitions would be. Again, the handling of uncertainties means that the behaviour is ambiguous. Should the maximum be the maximum value in the data array? Or the maximum of data + uncertainty? Or even the maximum of data - uncertainty? Perhaps it should return both the maximum value and the uncertainty associated with that. And the uncertainty could be asymmetric if drawn from multiple elements in the NDCube. I realised that there wasn't an obvious, unified expectation from users, and so we decided not to implement maximum. But that isn't to say that it shouldn't be. But it is more complicated than simply np.maximum(cube.data)).

Fundamentally, the issue lies in how to treat the wcs, mask and uncertainty, whether it be arithmetic operators or ufuncs. If these are going to be ignored and a numpy array returned, users can easily perform their operation on cube.data, and there's no real need to implement operators on NDCube.

What are your thoughts? And have I misunderstood your implementation?

ViciousEagle03 commented 6 months ago

Hey @DanRyanIrish, Thanks for the feedback I think both the ufunc and arithmetic operators APIs should provide the same implementation for the ufuncs and arithmetic operators and so I have changed the __array_ufunc__ method to take in account the same.The ufuncs now also considers the uncertainties and mask where necessary. Regarding the implementation of np.equal ufunc, I think it is ambiguous to define the implementation of it Given the computational expense of comparing the WCSes of two NDCube ,if one is only comparing the data array of the cubes then it can be achieved by just np.equal(cube1.data, cube2.data) or if equal means comparing the reference of both the cubes then we can check it by is.So, you are right to first establish what the np.equal should achieve and if you have any particular implementation you would like me to add to np.equal, I would be happy to change the patch accordingly. I'm a bit uncertain about np.maximum, As far as I understand it, np.maximum typically operates as a binary ufunc, changing the convention of np.maximum could be misleading or confusing, Or did I misunderstood the above? And if we implement the np.maximum, we are returning the two things below, right?

ViciousEagle03 commented 6 months ago

Hey @DanRyanIrish, Apologies for the delay, I have introduced the kwarg dunder to handle the operator overloading properly and to leverage the existing python operators. Since there is little documentation available regarding the implementation of array_ufunc functionality for complex custom objects so the way i have implemented it might be a little confusing so i will try to explain the API I have implemented here, If array2 supports array_ufunc • Calling np.add(cube1, array2) would return an NDCube. • Attempting np.add(cube1, cube2) would return NotImplented, triggering a TypeError. • Calling np.add(array1, cube2) would also return an NDCube. By default, ufuncs triggers the array_ufunc mechanism, while arithmetic operators invoke the respective dunder methods. To ensure both ufuncs and arithmetic operators account for masks and uncertainties appropriately and follow the same API, I've used dunder to specify when an arithmetic operator is being used. For example in case of, np.add(cube1, array2), the control proceeds as below: array_ufunc()[dunder=false] --> __add__() --> array_ufunc() [dunder=True] --> now it proceeds similarly as + operator. All numpy ufuncs leverage the respective existing dunder methods, which ensures the implementation is consistent.