nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

[ancestral] VCF and JSON outputs differ #1380

Closed: jameshadfield closed this issue 7 months ago

jameshadfield commented 8 months ago

Current Behavior

Given a site where the only alleles are the reference allele and an N (encoded in a VCF file), that site is not included in either the output JSON or the output VCF. This is the expected behaviour unless we are running with --keep-ambiguous.

However, if that site also carries another [ATGC] allele in addition to the N allele, then the N allele is included in the output VCF. The JSON output does not include the N allele.

This causes a subtle but sizeable bug: the VCF and JSON outputs produced by the same run disagree at such sites.
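To make the discrepancy concrete, here is a toy Python model of the behaviour described above. It is not Augur's code. The function names are hypothetical, and each site is simplified to its reference base plus the set of alleles observed at it.

```python
# Toy model (hypothetical, NOT Augur's implementation) of the reported behaviour.
# A site is reduced to its reference base and the set of alleles observed there.

def vcf_alt_alleles_reported(ref, observed):
    """Alternative alleles written to the VCF under the reported (buggy) behaviour."""
    alts = sorted(a for a in observed if a != ref)
    # If N is the only non-reference allele, the site is dropped entirely...
    if alts == ["N"]:
        return []
    # ...but otherwise every alternative allele, including N, is kept.
    return alts


def json_alt_alleles(ref, observed):
    """Alternative alleles reported in the JSON output: N is always excluded."""
    return sorted(a for a in observed if a not in (ref, "N"))


for observed in ({"A", "N"}, {"A", "G", "N"}):
    print(sorted(observed),
          vcf_alt_alleles_reported("A", observed),
          json_alt_alleles("A", observed))
```

For the site with alleles A, G and N, the modelled VCF keeps both G and N while the JSON keeps only G, which is the mismatch reported here.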

Expected behavior

The VCF output shouldn't include the N allele (unless running with --keep-ambiguous), so that it agrees with the JSON output.
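A minimal sketch of the expected behaviour follows. It relies on simplifying assumptions: the function names are hypothetical, a site is reduced to its reference base and observed alleles, only N is treated as ambiguous, and the `keep_ambiguous` parameter merely stands in for the --keep-ambiguous flag. It is not meant to show how the fix was actually implemented in Augur or TreeTime.

```python
# Sketch (hypothetical names) of consistent allele filtering for both outputs.
AMBIGUOUS = {"N"}  # assumption: only N is treated as ambiguous in this sketch


def alt_alleles(ref, observed, keep_ambiguous=False):
    """Alternative alleles at a site, shared by both the VCF and JSON writers."""
    return sorted(
        a for a in observed
        if a != ref and (keep_ambiguous or a not in AMBIGUOUS)
    )


def site_is_written(ref, observed, keep_ambiguous=False):
    """A site is written only if at least one alternative allele survives filtering."""
    return bool(alt_alleles(ref, observed, keep_ambiguous))


for observed in ({"A", "N"}, {"A", "G", "N"}):
    print(sorted(observed),
          alt_alleles("A", observed),
          alt_alleles("A", observed, keep_ambiguous=True))
```

Deriving both outputs from a single filter means the VCF and JSON cannot disagree about which alleles are reported at a site.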

jameshadfield commented 8 months ago

Fixed by Augur PR #1355 (commit cf531f6) together with TreeTime PR #263 (https://github.com/neherlab/treetime/pull/263; commit https://github.com/neherlab/treetime/commit/924d9ba37e9151ab27dbc5f87d7e50695f6439d4)

jameshadfield commented 7 months ago

Closed by #1355