Open anna-parker opened 6 days ago
Hi @anna-parker, sorry if the documentation is not clear here!
You will need to provide the --expected-date-formats option for custom date formats that do not match the current defaults ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX'].
If you update the command to include --expected-date-formats '%Y' '%Y-%m', it should parse the dates as expected.
Using your example data, I was able to run the following without any warnings:
augur curate format-dates \
--metadata example.tsv \
--date-fields "date" \
--expected-date-formats '%Y' '%Y-%m' \
--output-metadata example_output.tsv \
--failure-reporting warn
Please let us know if this doesn't work for you and we can investigate further.
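For anyone who wants to try the command above end to end, here is a minimal sketch. The original example.tsv is not shown in this thread, so the "strain" column and the sample names are assumptions; only the two date values (1987 and 1999-05) come from the report. The augur invocation is the one quoted above and is skipped if augur is not on the PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input file: the "strain" column and sample names are assumptions,
# the two date values are the ones mentioned in the original report.
printf 'strain\tdate\n'      >  example.tsv
printf 'sample_1\t1987\n'    >> example.tsv
printf 'sample_2\t1999-05\n' >> example.tsv

# Same command as quoted in the thread; only run when augur is installed.
if command -v augur >/dev/null 2>&1; then
  augur curate format-dates \
    --metadata example.tsv \
    --date-fields "date" \
    --expected-date-formats '%Y' '%Y-%m' \
    --output-metadata example_output.tsv \
    --failure-reporting warn
  cat example_output.tsv
else
  echo "augur not found on PATH; skipping format-dates" >&2
fi
```

Inspecting example_output.tsv afterwards shows whether the year-only and year-month values were picked up by the extra formats.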
Thanks @joverlee521 for explaining where the issue lies. I hope you don't mind if I reopen this issue: while it's not a bug, it's a documentation issue that hasn't yet been resolved afaict:
The first line of augur curate docs makes users believe that 2023 -> 2023-XX-XX works by default:
https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/curate/format-dates.html
Looking further down the docs, it's strange that --expected-date-formats is listed under the "REQUIRED" header if it has defaults set and hence isn't actually required. It's only required in the sense that it's a no-op if the argument isn't passed.
So given how it's documented, one should either remove the defaults so that it's actually required, or move it to optional, or change the defaults.
There's one more oddity in the docs: Default being stated twice, once as true and once as false
It might be nice to mention a standard usage as well, something like this which was generated by ChatGPT after reading the docs:
augur curate format-dates --metadata metadata.tsv --output curated_metadata.tsv --date-fields date_column --expected-date-formats "%Y" "%Y-%m" "%Y-%m-%d"
> The first line of augur curate docs makes users believe that 2023 -> 2023-XX-XX works by default:
Ah, I see that now. Is it clearer if updated to
> So given how it's documented, one should either remove the defaults so that it's actually required, or move it to optional, or change the defaults.
I missed this when I added the default values in https://github.com/nextstrain/augur/pull/1501. I'll move --expected-date-formats down to the optional section.
> Default being stated twice, once as true and once as false
The --no-mask-failure oddity is noted in https://github.com/nextstrain/augur/issues/1585 and is an Augur-wide issue that will be resolved separately.
Thanks @joverlee521! Those edits are great! What I don't yet understand is what format-dates does if run with default values. Does it error if any of the dates are not one of the expected default formats?
> What I don't yet understand is what format-dates does if run with default values. Does it error if any of the dates are not one of the expected default formats?
If run with only the default values, it will just be a no-op, and it will error if the dates do not match the default formats.
Hmm, maybe it would be better to just mark the --expected-date-formats option as explicitly required even though it has default values.
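One way to check the default-only behaviour on a given Augur version is to run the command without --expected-date-formats. This is a sketch, not a statement of what the output will be: the file names, the "strain" column and the sample values are assumptions made for illustration. The first two dates are written in default formats quoted earlier in this thread, the third (a bare year) is not, and --failure-reporting warn is taken from the earlier command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input: file name, "strain" column and values are assumptions.
# 2023-05-17 follows '%Y-%m-%d', 2023-05-XX follows '%Y-%m-XX',
# and 1987 follows none of the defaults listed in this thread.
printf 'strain\tdate\n'         >  defaults_example.tsv
printf 'sample_1\t2023-05-17\n' >> defaults_example.tsv
printf 'sample_2\t2023-05-XX\n' >> defaults_example.tsv
printf 'sample_3\t1987\n'       >> defaults_example.tsv

if command -v augur >/dev/null 2>&1; then
  # No --expected-date-formats, so only the default formats apply.
  augur curate format-dates \
    --metadata defaults_example.tsv \
    --date-fields "date" \
    --output-metadata defaults_output.tsv \
    --failure-reporting warn
  cat defaults_output.tsv
else
  echo "augur not found on PATH; skipping format-dates" >&2
fi
```

Comparing defaults_example.tsv with defaults_output.tsv, along with any warnings printed, shows how the installed version treats matching and non-matching dates.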
@joverlee521 thanks for looking into this! I think the screenshot you provided already makes this much clearer
Current Behavior
Hi! I am running augur curate format-dates, but augur is unable to format any of the dates.
Expected behavior
I would expect augur to format the dates as per documentation: 1987 should become 1987-XX-XX, '1999-05' should become 1999-05-XX.
How to reproduce
Steps to reproduce the current behavior:
In example_output.tsv, both dates have been formatted as XXXX-XX-XX.
Your environment: if running Nextstrain locally