Open victorlin opened 2 years ago
The culprit code is from my rewrite of get_groups_for_subsampling
in #794:
This function is assuming that everything is in a potentially ambiguous/incomplete YYYY-MM-DD
format.
However, it looks like this was already the case before the PR:
So I'm not sure that this was ever a supported use-case, likely unintentionally.
As for expected behavior, one possibility is that Augur is smart enough to extract year/month/day from numeric dates, or at least year/month which are supported --group-by
values.
This could leverage treetime.utils.datetime_from_numeric
which converts a numeric date into a datetime.datetime
:
import treetime
treetime.utils.datetime_from_numeric(2018.23)
# datetime.datetime(2018, 3, 25, 0, 0)
However, this won't work for non-positive dates:
treetime.utils.datetime_from_numeric(-2018.23)
# ...
# ValueError: year -2018 is out of range
Which is part of a larger discussion of how to support various date formats in Augur: #882
Current Behavior
Having any floating-point numeric date crashes subsampling if grouping by
year
oryear month
.Expected behavior
Not sure, but not a crash.
How to reproduce
Output:
Specifically, this is coming from
augur.filter.get_groups_for_subsampling
.Click to expand Python example
```py import augur import pandas as pd columns = ['strain', 'date', 'country'] data = [ ("SEQ_1","2018.23","A"), ("SEQ_2","2020-03-25","A"), ] metadata = pd.DataFrame.from_records(data, columns=columns).set_index('strain') strains = metadata.index.tolist() groups = ['country', 'year', 'month'] augur.filter.get_groups_for_subsampling(strains, metadata, group_by=groups) ``` Output: ``` ... 125 try: --> 126 return values.astype(dtype, casting="safe", copy=copy) 127 except TypeError as err: TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe' ```Your environment: if running Nextstrain locally
auspice 2.7.0
): augur 13.1.2