Closed terhorst closed 1 year ago
Thinking about it more, the suggested fix would still convert a deme named "1.0" to a float, which may not be good.
Yeah, your could would give type hinting errors (via mypy, for example) for your case. There, neither source nor dest is an instance of str
, which is the type hint of the function returning the error. @grahamgower is methodical about his type hints, so I'll assume excluding numpy string was intentional.
Actually np.str_
subclasses str
, at least insofar as isinstance(np.str_("x"), str) == True
. So I'm not sure type hinting would catch it.
Interesting, I made a np.string_
and it didn't pass.
>>> import numpy as np
>>> x = np.string_("x")
>>> y = np.str_("y")
>>> isinstance(x, str)
False
>>> isinstance(y, str)
True
>>>
Fortunately, 1.0 and "1.0" will both fail validation as deme names. The former comes in as a float and the latter fails to pass the regex check on names.
Thanks for the report. Doing the right thing with number polymorphism here is a best-effort kind of thing in demes
- so any change that doesn't make things worse is ok with me. The docs for __float__()
don't provide any guidance. np.string_
seems to exist only for historical compatibility, so we should focus on np.str_
. That said, I think something like your suggested fix should work with both str_
and string_
. I'd defintely accept a pr like this, accompanied by an appropriate test case.
diff --git a/demes/demes.py b/demes/demes.py
index 8032891..7d67d9f 100644
--- a/demes/demes.py
+++ b/demes/demes.py
@@ -2347,8 +2347,15 @@ class Graph:
# of the numeric tower, but provide a __float__() method.
if isinstance(value, numbers.Integral):
value = int(value)
- elif isinstance(value, numbers.Real) or hasattr(value, "__float__"):
+ elif isinstance(value, numbers.Real):
value = float(value)
+ elif hasattr(value, "__float__"):
+ try:
+ value = float(value)
+ except ValueError:
+ # numpy.str_ advertises a __float__() method, but conversion
+ # will fail if the string isn't recognisable as a number.
+ return value
return value
data = attr.asdict(self, filter=filt, value_serializer=coerce_numbers)
Hi @terhorst, were you still interested in submitting a PR for this? If not, let me know and I can take care of it.
Hi Graham, sorry, I did not get a chance to wrap this up. Do you want to go ahead and commit it since you've already got the patch written?
MRE using 0.2.2:
What's happening:
numpy.str_
defines a__float__
method, but float conversion throws for non-numeric strings. This causes https://github.com/popsim-consortium/demes-python/blob/1c8a9d9c5f82aa24276f8858ba39a9a9a6be6746/demes/demes.py#L2350-L2351 to fail.Suggested fix: probably just wrap
coerce_numbers()
in atry ... except
and return a string if the conversion fails. I have a PR for this if you're interested.How this arose: During randomized testing. We randomly assign source and dest populations using something like
source, dest = np.random.choice(["A", "B", "C", "D"], 2, replace=False)
This returns a Numpy string even if the input array is pure Python.