nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Augur filter does not support the integer.float input date format for --min/max-date filtering #748

Open corneliusroemer opened 3 years ago

corneliusroemer commented 3 years ago

Current Behavior
Although the documentation says that Augur-style numeric dates in integer.float form (e.g. 2019.2413) are accepted as values for --min-date/--max-date, this format is not supported as an input date format in the data passed to augur filter. When filtering a dataset whose date field uses these numeric dates, a warning such as the following is printed:

WARNING: MT343860 has an invalid data string: 2020.1338797814208

Expected behavior
I would expect numeric (integer.float) dates to be accepted as an input date format, because the same format is already accepted for the --min-date/--max-date parameters.
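To make the expected behavior concrete, here is a minimal sketch of a date parser that accepts either an ISO date or a numeric date and returns a decimal year in both cases. This is not Augur's implementation: `numeric_date` and `parse_metadata_date` are hypothetical names, and the decimal-year formula (days elapsed since January 1 divided by the length of the year) is an assumption that may differ from the convention Augur actually uses.

```python
from datetime import date, datetime


def numeric_date(d: date) -> float:
    """Convert a calendar date to a decimal year.

    Assumed convention: year + (days elapsed since Jan 1) / (days in that year).
    """
    start = date(d.year, 1, 1).toordinal()
    end = date(d.year + 1, 1, 1).toordinal()
    return d.year + (d.toordinal() - start) / (end - start)


def parse_metadata_date(value: str) -> float:
    """Hypothetical helper: accept 'YYYY-MM-DD' or a numeric date like '2019.2413'."""
    try:
        # First try the ISO calendar format.
        return numeric_date(datetime.strptime(value, "%Y-%m-%d").date())
    except ValueError:
        pass
    # Fall back to a numeric date; float() raises ValueError for anything else.
    return float(value)


if __name__ == "__main__":
    for raw in ["2020-02-19", "2020.1338797814208"]:
        print(raw, "->", parse_metadata_date(raw))
```

With a helper along these lines, the input dates and the --min-date/--max-date values could be compared on the same numeric scale, regardless of which of the two formats each was written in.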

Related to this discussion: https://github.com/nextstrain/augur/issues/662

huddlej commented 3 years ago

@corneliusroemer, do you have an example dataset that uses these floating point dates in the metadata?

@rneher pointed out that the use case for these dates is analyses of ancient DNA/RNA sequencing, where dates like -11220-04-12 cannot be represented by the standard Python datetime library (it only supports years 1 through 9999).
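The datetime limitation can be reproduced in a few lines. The sketch below shows that `datetime.date` rejects the year -11220, and then converts the same date string to a decimal year without using datetime at all. `can_represent` and `ancient_to_numeric` are hypothetical helpers, not part of Augur, and the conversion assumes astronomical year numbering and a fixed 365-day year with no leap days, which is a simplification.

```python
import datetime
import re

# Cumulative days before each month in a non-leap year (assumption: no leap days).
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def can_represent(year: int, month: int, day: int) -> bool:
    """Return True if datetime.date can hold this calendar date."""
    try:
        datetime.date(year, month, day)
        return True
    except ValueError:
        return False


def ancient_to_numeric(value: str) -> float:
    """Hypothetical helper: turn '[-]YYYY...-MM-DD' into a decimal year."""
    match = re.match(r"^(-?\d+)-(\d{2})-(\d{2})$", value)
    if match is None:
        raise ValueError(f"unrecognized date string: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {value!r}")
    days_elapsed = DAYS_BEFORE_MONTH[month - 1] + (day - 1)
    return year + days_elapsed / 365


if __name__ == "__main__":
    print(datetime.MINYEAR, datetime.MAXYEAR)   # supported year range
    print(can_represent(-11220, 4, 12))         # datetime cannot hold this date
    print(ancient_to_numeric("-11220-04-12"))   # a plain float can
```

A plain float has no such range limit, which is why numeric dates are a natural fit for ancient samples.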

I wonder whether we didn't catch this issue earlier because no one has tried to use Augur for these types of analyses, or because they tried and gave up when it immediately failed. If this type of input is more of an edge case (relative to the primary use cases), we might treat this as a feature request instead of a bug.