radio-astro-tools / spectral-cube

Library for reading and analyzing astrophysical spectral data cubes
http://spectral-cube.rtfd.org
BSD 3-Clause "New" or "Revised" License
96 stars 63 forks source link

Fixed Improper Method Call: Replaced `NotImplementedError` #896

Closed fazledyn-or closed 10 months ago

fazledyn-or commented 10 months ago

Details

While triaging your project, our bug fixing tool generated the following message(s)-

In file: spectral_cube.py, class: BaseSpectralCube, there is a special method __floordiv__ that raises a NotImplementedError. If a special method supporting a binary operation is not implemented it should return NotImplemented. On the other hand, NotImplementedError should be raised from abstract methods inside user defined base classes to indicate that derived classes should override those methods. iCR suggested that the special method __floordiv__ should return NotImplemented instead of raising an exception. An example of how NotImplemented helps the interpreter support a binary operation is here.

Related Documentation

Changes

Previously Found & Fixed

CLA Requirements

This section is only relevant if your project requires contributors to sign a Contributor License Agreement (CLA) for external contributions.

All contributed commits are already automatically signed off.

The meaning of a signoff depends on the project, but it typically certifies that committer has the rights to submit this work under the same license and agrees to a Developer Certificate of Origin (see https://developercertificate.org/ for more information). - Git Commit SignOff documentation

Sponsorship and Support

This work is done by the security researchers from OpenRefactory and is supported by the Open Source Security Foundation (OpenSSF): Project Alpha-Omega. Alpha-Omega is a project partnering with open source software project maintainers to systematically find new, as-yet-undiscovered vulnerabilities in open source code - and get them fixed – to improve global software supply chain security.

The bug is found by running the Intelligent Code Repair (iCR) tool by OpenRefactory and then manually triaging the results.

keflavich commented 10 months ago

I'm a bit confused by this recommendation. Why isn't NotImplentedError acceptable? I don't see an explanation.

Also, though, in this case, the recommendation they give is:

Note It should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to [None](https://docs.python.org/3/library/constants.html#None).

i.e., __floordiv__ should be set to None, not return NotImplemented. The latter is if there is a type-specific NotImplemented.

fazledyn-or commented 10 months ago

The reason behind using NotImplemented instead of NotImplementedError is according to the documentation.

Let's say we have __eq__ method in a MyClass class that raises NotImplementedError. In that case, the following code would result in an exception -

a = MyClass()
b = a
print(a == b)

Whereas, the correct output should've been True. If we change NotImplementedError to NotImplemented, the Python interpreter does the comparison for us.

Also, using NotImplemented results in TypeError if the operation isn't supported at all. Whereas using None results in None, with no extra messages that helps the developer.

Running the following program gives us the output -

class Foo:
    def __floordiv__(self, other):
        raise NotImplementedError("Floor-division is not supported")

class Bar:
    def __floordiv__(self, other):
        None     # or pass

class Tone:
    def __floordiv__(self, other):
        return NotImplemented

try:
    a = Foo()
    b = Foo()
    print(a // b)
except Exception as e:
    print(e)

try:
    x = Bar()
    y = Bar()
    print(x // y)
except Exception as e:
    print(e)

try:
    m = Tone()
    n = Tone()
    print(m // n)
except Exception as e:
    print(e)
Floor-division is not supported
None
unsupported operand type(s) for //: 'Tone' and 'Tone'

Which is why, I think using NotImplemented rather than NotImplementedError or None is a better option.

keflavich commented 10 months ago

Alright, I agree with your view on NotImplented vs None: NotImplemented is better than None, and the recommendation in the python docs is confusing (why do they suggest None?).

I'm still not convinced I see the case for NotImplemented over raising an exception; even in the case you gave above, it's not clear that a==b should be True (I think it is True because you're actually doing NotImplemented == NotImplemented?).

In any case, I'm happy enough to switch to NotImplemented here, but the reasoning still eludes me.

fazledyn-or commented 10 months ago

I'm still not convinced I see the case for NotImplemented over raising an exception; even in the case you gave above, it's not clear that a==b should be True (I think it is True because you're actually doing NotImplemented == NotImplemented?).

Umm, I don't think so.

Putting return NotImplemented for __eq__ method for MyClass hands over the comparison to the Python interpreter. Then the interpreter compares it (a and b are the same variable, just different identifiers). For example, we can run the following code -

class Foo:
        def __eq__(self, other):
                return NotImplemented

class Bar:
        def __eq__(self, other):
                return NotImplemented

a = Foo()
b = Bar()
print(a == b)
False

Even though both class are returning NotImplemented, they aren't equal at all. Because the underlying Python interpreter knows that they're of different types. Which isn't actually what you thought NotImplemented does.

keflavich commented 10 months ago

OK, that's helpful.

I suppose the best default is for an object to equal itself. There are exceptional cases where you might not want that (e.g., nan), but the example above isn't one of them.

keflavich commented 10 months ago

The tests need to be updated to accommodate the change in behavior, since we were testing for NotImplementedError before.

fazledyn-or commented 10 months ago

Okay, I'll update the tests ASAP and push.

fazledyn-or commented 10 months ago

I've updated the tests. However, the test test_floordiv_fails kept failing, as a result- I had to remove the pytest context for it. Please have a look and let me know if it's correct or not.

fazledyn-or commented 10 months ago

I found the problem. Let's discuss it using an example-

import numpy as np
from astropy.units.quantity import Quantity
from astropy.wcs import WCS
from astropy import units as u
from spectral_cube.spectral_cube import BaseSpectralCube

data = np.arange(4)[:, None, None, None] * np.ones((5, 20, 30))
wcs = WCS(naxis=3)
wcs.wcs.ctype = ['RA---TAN', 'DEC--TAN', 'FREQ']

cube = BaseSpectralCube(data=data[0], wcs=wcs)
quant = Quantity(value=1.0)
out = cube // quant
print(type(out))

After running this program, we get output, meaning that the __floordiv__ operation for BaseSpectralCube is sucessful. The same happens for SpectralCube anyway.

After using pdb like below-

# ...
quant = Quantity(value=1.0)
import pdb
pdb.set_trace()

out = cube // quant
print(type(out))

We can see that after getting returned NotImplemented from __floordiv__ method, the operation is being handled by __array_ufunc__ method of astropy.units.quantity.py at line 621.

> /home/ataf/BFAS/Source/spectral-cube/floordiv_test.py(22)<module>()
-> out = cube // quant
(Pdb) s
--Call--
> /home/ataf/BFAS/Source/spectral-cube/spectral_cube/spectral_cube.py(2298)__floordiv__()
-> def __floordiv__(self, value):
(Pdb) s
> /home/ataf/BFAS/Source/spectral-cube/spectral_cube/spectral_cube.py(2299)__floordiv__()
-> print("return NotImplemented")
(Pdb) s
return NotImplemented
> /home/ataf/BFAS/Source/spectral-cube/spectral_cube/spectral_cube.py(2300)__floordiv__()
-> return NotImplemented
(Pdb) s
--Return--
> /home/ataf/BFAS/Source/spectral-cube/spectral_cube/spectral_cube.py(2300)__floordiv__()->NotImplemented
-> return NotImplemented
(Pdb) s
--Call--
> /home/ataf/BFAS/Source/spectral-cube/env/lib/python3.8/site-packages/astropy/units/quantity.py(621)__array_ufunc__()
-> def __array_ufunc__(self, function, method, *inputs, **kwargs):
(Pdb) s
> /home/ataf/BFAS/Source/spectral-cube/env/lib/python3.8/site-packages/astropy/units/quantity.py(640)__array_ufunc__()

I think it has something to do with Numpy's Behavior in combination with Python’s binary operations. After putting print statement inside __array_ufunc__ method of astropy.units.quantity.py, we can see that function = <ufunc 'floor_divide'> and method = __call__ are being passed. Since the Quantity class (more like numpy) is taking care of the __floordiv__ operation here, even putting NotImplemented in SpectralCube isn't stopping the operation.

So, why was it working before?

Before these changes, the __floordiv__ method of BaseSpectralCube used to "raise" NotImplementedError exception which stopped the operation there. By putting NotImplemented instead, we allowed the Python interpreter to try out more options.

Walkthrough Video: WT - spectral-cube - NotImplemented - Test Error - pytest.mp4

keflavich commented 10 months ago

Thanks for the deep investigation @fazledyn-or . What do you recommend as the next step? As long as SpectralCube inherits from Quantity, it looks like return NotImplemented is not safely guaranteeing a true NotImplemented response. The safest thing, therefore, seems to be to return to the original raise NotImplementedError. Is there an alternative?

The return value being a Quantity is something we explicitly don't want. First, it drops all of the important metadata. It also carries the risk of loading the whole cube's data into memory, which is explicitly counter to the main aim of this package.

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (9847cf1) 79.95% compared to head (08f2df0) 79.95%. Report is 3 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #896 +/- ## ======================================= Coverage 79.95% 79.95% ======================================= Files 24 24 Lines 6016 6016 ======================================= Hits 4810 4810 Misses 1206 1206 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

fazledyn-or commented 10 months ago

I suggest that the changes proposed by me should be ignored as a "special case". Even I had no idea about the '__arrayufunc_\' method and its effect on Python's special method. Since this project requires extreme precision over data and operations, we should go back to "raise NotImpledmentError".

TLDR: This PR should be closed since it's of no use now.

Thank you for your kind support.

keflavich commented 10 months ago

Thank you for the deep dive @fazledyn-or! Sorry this turned out not to be the right solution, but I learned something and this is a useful record of specifically why we need to use this nonstandard approach.