nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

augur curate format-dates should recognize masked dates internally #1496

Closed joverlee521 closed 6 days ago

joverlee521 commented 1 week ago

Context

Originally brought up by @emmahodcroft on Slack.

Since augur curate format-dates produces masked dates (e.g. 1997-XX-XX), if you pass outputs through the command again, it will error on the masked dates.

echo '{"date": "1997"}' \
    | augur curate format-dates \
        --date-fields date \
        --expected-date-formats '%Y' \
    | augur curate format-dates \
        --date-fields date \
        --expected-date-formats '%Y-%m-%d'
ERROR: Unable to format date string '1997-XX-XX' in field 'date' of record 0.

It is currently possible to work around this issue by explicitly passing the %Y-XX-XX format to the command:

echo '{"date": "1997"}' \
    | augur curate format-dates \
        --date-fields date \
        --expected-date-formats '%Y' \
    | augur curate format-dates \
        --date-fields date \
        --expected-date-formats '%Y-%m-%d' '%Y-XX-XX'
{"date": "1997-XX-XX"}

Description

Requiring users to pass the masked formats to get around this seems like an extra burden on the user, especially considering they would have to account for all masked formats (i.e. XXXX-XX-XX, %Y-XX-XX, %Y-%m-XX).

The command should just be able to recognize this format internally and pass through the date unchanged if it's already masked.
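
To make the proposal concrete, here is a rough Python sketch of the pass-through check. It is not augur's actual code: `MASKED_DATE`, `is_masked_date`, and `format_date` are hypothetical names, and the masking logic is deliberately simplified (it only looks for `%m` and `%d` in the matched format).

```python
import re
from datetime import datetime

# Hypothetical pattern covering the three masked forms named above:
# XXXX-XX-XX, %Y-XX-XX, and %Y-%m-XX.
MASKED_DATE = re.compile(r"(?:XXXX-XX-XX|\d{4}-XX-XX|\d{4}-\d{2}-XX)")


def is_masked_date(value):
    """Return True if the whole string is already a masked ISO-style date."""
    return MASKED_DATE.fullmatch(value) is not None


def format_date(value, expected_formats):
    """Format a date string as YYYY-MM-DD, masking missing parts with X.

    Simplified illustration: already-masked dates are returned unchanged
    before any of the expected formats are tried.
    """
    if is_masked_date(value):
        return value

    for fmt in expected_formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Assumption: a part is known only if its directive is in the format.
        year = f"{parsed.year:04d}"
        month = f"{parsed.month:02d}" if "%m" in fmt else "XX"
        day = f"{parsed.day:02d}" if "%d" in fmt else "XX"
        return f"{year}-{month}-{day}"

    raise ValueError(f"Unable to format date string {value!r}")


# Mirrors the two-step pipeline from the Context section.
first_pass = format_date("1997", ["%Y"])
second_pass = format_date(first_pass, ["%Y-%m-%d"])
print(second_pass)
```

With a check like this, the second invocation would no longer need '%Y-XX-XX' in its expected formats.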