nextstrain / mpox

Nextstrain build for mpox virus
https://nextstrain.org/mpox
MIT License

Ingest: remove `reverse` column from metadata TSV #209

Open: joverlee521 opened this issue 8 months ago

joverlee521 commented 8 months ago

(I originally flagged the obsolete `reverse` column in https://github.com/nextstrain/monkeypox/pull/207#discussion_r1349164763)

Reverse-complement sequences were initially flagged manually via the `reverse` column, which was added in https://github.com/nextstrain/monkeypox/pull/79.

Since Nextclade v2.2.0, there has been a built-in `--retry-reverse-complement` option that adds a new output column, `isReverseComplement`. The ingest pipeline started using this feature in https://github.com/nextstrain/monkeypox/pull/89. Then, in https://github.com/nextstrain/monkeypox/pull/94, the `ingest/bin/reverse_reversed_sequences.py` script was also replaced with the built-in Nextclade functionality.

In https://github.com/nextstrain/monkeypox/pull/191, the phylogenetic pipeline switched from using the `reverse` column to the `is_reverse_complement` column derived from Nextclade's output. This appears to make the `reverse` column obsolete. When I checked the latest metadata TSV (2023-10-13), the `reverse` column was completely empty.
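For anyone who wants to repeat the emptiness check on a local copy of the metadata TSV, here is a minimal sketch using only Python's standard `csv` module. The function name `column_is_empty` is hypothetical (it is not part of the mpox repo), and the rows in the example are made up rather than taken from real mpox metadata.

```python
import csv
import io
from typing import Iterable


def column_is_empty(lines: Iterable[str], column: str) -> bool:
    """Return True if every data row has a blank (or missing) value in `column`.

    Hypothetical helper for illustration; reads tab-separated text.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # Accessing fieldnames reads the header row; None means the input was empty.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    # Short rows yield None for missing fields, so treat None like "".
    return all(not (row.get(column) or "").strip() for row in reader)


# Toy input with made-up accessions (not real mpox metadata).
example = io.StringIO(
    "accession\treverse\tis_reverse_complement\n"
    "SEQ1\t\tfalse\n"
    "SEQ2\t\ttrue\n"
)
print(column_is_empty(example, "reverse"))  # True
```

Against a real file, something like `with open("metadata.tsv", newline="") as f: column_is_empty(f, "reverse")` would work, assuming the file is a plain tab-separated file with a header row.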

From my point of view, we can simply remove the `reverse` column from the `metadata.tsv` file, but I wanted to confirm with other users of the pipeline and the `metadata.tsv` file first (cc: @corneliusroemer, @chaoran-chen).
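As an illustration only, here is a minimal sketch of dropping a single column from an existing TSV with Python's standard `csv` module. This is not how the ingest workflow itself is configured (that change would go through the pipeline's own config/scripts), and the helper name `drop_column` is hypothetical. It assumes fields contain no embedded tabs or newlines, since the default `csv` dialect may otherwise add quoting on output.

```python
import csv
import io
from typing import TextIO


def drop_column(src: TextIO, dst: TextIO, column: str) -> None:
    """Copy tab-separated `src` to `dst`, omitting `column` (hypothetical helper)."""
    reader = csv.DictReader(src, delimiter="\t")
    if reader.fieldnames is None:  # empty input: nothing to write
        return
    fieldnames = [name for name in reader.fieldnames if name != column]
    # extrasaction="ignore" silently skips the dropped column's key in each row.
    writer = csv.DictWriter(
        dst,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Toy input with a made-up accession (not real mpox metadata).
src = io.StringIO("accession\treverse\tis_reverse_complement\nSEQ1\t\tfalse\n")
dst = io.StringIO()
drop_column(src, dst, "reverse")
print(dst.getvalue(), end="")
# accession	is_reverse_complement
# SEQ1	false
```

Streaming row by row keeps memory use flat, which matters little for a small table but avoids loading a large metadata file at once.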

chaoran-chen commented 8 months ago

Hi @joverlee521, thank you very much for pinging me! I don't use the `reverse` column.