nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

Expand augur parse #990

Closed by joverlee521 1 year ago

joverlee521 commented 2 years ago

Context

In internal discussions on how to implement #860, and through my prototype in monkeypox/ingest, we seem to have agreed to use NDJSON as the format for streaming records between augur curate subcommands. However, users are unlikely to have their data in this format, so we would need to expand augur parse to support converting to and from NDJSON.

This idea was originally proposed by @trvrb in Slack.
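
For readers unfamiliar with NDJSON (one JSON object per line), here is a minimal sketch of the kind of conversion described above, going from tabular metadata to NDJSON and back. This is not augur's implementation; the function names `tsv_to_ndjson` and `ndjson_to_records` and the example columns are hypothetical and only for illustration.

```python
import csv
import io
import json


def tsv_to_ndjson(tsv_text):
    """Yield one JSON-encoded line per TSV row, keyed by the header row.

    Hypothetical helper for illustration; not part of augur.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        yield json.dumps(row)


def ndjson_to_records(lines):
    """Parse NDJSON lines back into dicts, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# Made-up example metadata with three columns.
example_tsv = "strain\tdate\tcountry\nA/1\t2022-05-01\tUSA\nB/2\t2022-06-15\tUK\n"

ndjson_lines = list(tsv_to_ndjson(example_tsv))
records = list(ndjson_to_records(ndjson_lines))
```

Because every record is a complete JSON object on its own line, a downstream command can process records one at a time from stdin without loading the whole dataset, which is what makes the format convenient for piping between subcommands.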

Description

Add new subcommands to augur parse to handle:

We should probably keep augur parse backwards compatible, as it is a widely used command (see search on cs.github.com).

joverlee521 commented 1 year ago

I've implemented this in the augur curate I/O framework in #1039.

I'm keeping augur parse as is, so it will continue to parse metadata from FASTA headers.
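
As a rough illustration of what parsing metadata from a FASTA header means here, the sketch below splits a delimited header into named fields. It is an assumption-laden toy, not augur parse's actual code: the function name `parse_fasta_header`, the `|` separator, and the field names are all hypothetical.

```python
def parse_fasta_header(header, fields, separator="|"):
    """Split a FASTA header line into a dict of named metadata fields.

    Hypothetical helper for illustration; not augur's implementation.
    Extra values or extra field names beyond the shorter list are dropped by zip.
    """
    values = header.lstrip(">").strip().split(separator)
    return dict(zip(fields, values))


# Made-up header with three separator-delimited values.
example_record = parse_fasta_header(
    ">A/1|2022-05-01|USA",
    fields=["strain", "date", "country"],
)
```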