popsim-consortium / demes-python

Tools for describing and manipulating demographic models.
https://popsim-consortium.github.io/demes-docs/
ISC License
18 stars 6 forks source link

Crash on dict conversion if using numpy.str_ #495

Closed terhorst closed 1 year ago

terhorst commented 1 year ago

MRE using 0.2.2:

import numpy as np
from demes import Builder
b = Builder(defaults=dict(epoch=dict(start_size=1)))
b.add_deme(np.str_("a"))
b.resolve().asdict()
# ValueError

What's happening: numpy.str_ defines a __float__ method, but float conversion throws for non-numeric strings. This causes https://github.com/popsim-consortium/demes-python/blob/1c8a9d9c5f82aa24276f8858ba39a9a9a6be6746/demes/demes.py#L2350-L2351 to fail.

Suggested fix: probably just wrap coerce_numbers() in a try ... except and return a string if the conversion fails. I have a PR for this if you're interested.

How this arose: During randomized testing. We randomly assign source and dest populations using something like

source, dest = np.random.choice(["A", "B", "C", "D"], 2, replace=False)

This returns a Numpy string even if the input array is pure Python.

terhorst commented 1 year ago

Thinking about it more, the suggested fix would still convert a deme named "1.0" to a float, which may not be good.

molpopgen commented 1 year ago

Yeah, your could would give type hinting errors (via mypy, for example) for your case. There, neither source nor dest is an instance of str, which is the type hint of the function returning the error. @grahamgower is methodical about his type hints, so I'll assume excluding numpy string was intentional.

terhorst commented 1 year ago

Actually np.str_ subclasses str, at least insofar as isinstance(np.str_("x"), str) == True. So I'm not sure type hinting would catch it.

molpopgen commented 1 year ago

Interesting, I made a np.string_ and it didn't pass.

>>> import numpy as np
>>> x = np.string_("x")
>>> y = np.str_("y")
>>> isinstance(x, str)
False
>>> isinstance(y, str)
True
>>> 
molpopgen commented 1 year ago

Fortunately, 1.0 and "1.0" will both fail validation as deme names. The former comes in as a float and the latter fails to pass the regex check on names.

grahamgower commented 1 year ago

Thanks for the report. Doing the right thing with number polymorphism here is a best-effort kind of thing in demes - so any change that doesn't make things worse is ok with me. The docs for __float__() don't provide any guidance. np.string_ seems to exist only for historical compatibility, so we should focus on np.str_. That said, I think something like your suggested fix should work with both str_ and string_. I'd defintely accept a pr like this, accompanied by an appropriate test case.

diff --git a/demes/demes.py b/demes/demes.py
index 8032891..7d67d9f 100644
--- a/demes/demes.py
+++ b/demes/demes.py
@@ -2347,8 +2347,15 @@ class Graph:
             # of the numeric tower, but provide a __float__() method.
             if isinstance(value, numbers.Integral):
                 value = int(value)
-            elif isinstance(value, numbers.Real) or hasattr(value, "__float__"):
+            elif isinstance(value, numbers.Real):
                 value = float(value)
+            elif hasattr(value, "__float__"):
+                try:
+                    value = float(value)
+                except ValueError:
+                    # numpy.str_ advertises a __float__() method, but conversion
+                    # will fail if the string isn't recognisable as a number.
+                    return value
             return value

         data = attr.asdict(self, filter=filt, value_serializer=coerce_numbers)
grahamgower commented 1 year ago

Hi @terhorst, were you still interested in submitting a PR for this? If not, let me know and I can take care of it.

terhorst commented 1 year ago

Hi Graham, sorry, I did not get a chance to wrap this up. Do you want to go ahead and commit it since you've already got the patch written?