yt-project / unyt

Handle, manipulate, and convert data with units in Python
https://unyt.readthedocs.io
BSD 3-Clause "New" or "Revised" License
364 stars 48 forks source link

Issue with unyt comparison methods via numpy 1.25.0 #431

Open CalCraven opened 1 year ago

CalCraven commented 1 year ago

Description

Comparing unyt arrays to strings has been deprecated by 1.25.0 release of numpy. According to the patch notes, universal functions for numpy arrays can be overwritten by a subclass, causing differences in comparison. This comes into play by raising an error if you try to compare a unyt_array object to a null string, which works with numpy 1.24.2 and below as far as we've tested.

What I Did

import numpy as np
print(np.__version__)
import unyt as u
x = 1.0 * u.kJ
print(assert x != "")

Outputs

1.25.0
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

  self = unyt_quantity(1.0, 'kJ'), ufunc = <ufunc 'not_equal'>
  method = '__call__', inputs = (unyt_quantity(1.0, 'kJ'), ''), kwargs = {}
  func = <method-wrapper '__call__' of numpy.ufunc object at 0x7f43cbce0b40>
  out = None, out_func = None, new_dtype = dtype('float32')

                      new_dtype = np.dtype("f" + str(inp1.dtype.itemsize))
                      conv = new_dtype.type(conv)
                      if offset is not None:
                          raise InvalidUnitOperation(
                              "Quantities with units of Fahrenheit or Celsius "
                              "cannot by multiplied, divided, subtracted or "
                              "added with data that has different units."
                          )
  >                   inp1 = np.asarray(inp1, dtype=new_dtype) * conv
  E                   ValueError: could not convert string to float: ''

Working code

import numpy as np
print(np.__version__)
import unyt as u
x = 1.0 * u.kJ
print(assert x != "")

Outputs

1.24.2
True
neutrinoceros commented 1 year ago

I don't understand the purpose of comparing a unyt_array to a string. What context does this pop into ?

I'm also not sure this is something we can or should fix, because it was never documented as reliable or tested.

CalCraven commented 1 year ago

Yes that's a fair point. We have a package that has used it previously to compare many attributes of an object, some of which are defaulted to a null string, and only write out ones that have been modified. It's easy enough for us to change this comparison to something a little more rigorous, just thought it should be brought to everyone attention in case anyone else has the same bug coming up.

ngoldbaum commented 1 year ago

I think it would probably be ok to just universally return False for equality comparisons with types that aren’t instances of unyt_array. Seems sensible to me.

neutrinoceros commented 1 year ago

I think it would probably be ok to just universally return False for equality comparisons with types that aren’t instances of unyt_array.

this would break any existing comparison between an unyt_array (a) and a np.ndarray (b), which currently works the same as comparing a.ndview with b.

ngoldbaum commented 1 year ago

Of course we’d need to treat ndarrays differently, I meant any random python type like string or other unrelated object.

neutrinoceros commented 1 year ago

Makes sense to me. How about comparing scalar arrays (unyt_quantity) with stuff like int or float ?

ngoldbaum commented 1 year ago

Probably just needs a call to np.asanyarray in __eq__ to handle all array-likes, then anything that ends up with a string or object dtype gets rejected.

neutrinoceros commented 1 year ago

strings may need extra care too

>>> import numpy as np
>>> np.asanyarray("hello")
array('hello', dtype='<U5')
ngoldbaum commented 1 year ago

Right, that's why I said,

then anything that ends up with a string or object dtype gets rejected

I guess also datetimes too. U5 is a string dtype.