Description of proposed changes
This PR adds collection dates to the ingest metadata output for six samples.
These samples were force-included in the Nextclade dataset tree to increase the representation of rare genotypes. However, their date fields are empty in the metadata output from NCBI Datasets, so the TreeTime clock filter removes them from the tree.
Fortunately, the NCBI metadata includes strain names for these six samples, and their collection dates can be extracted from those names.
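The dates for this PR were extracted by hand. For reference, here is a minimal sketch of how the same extraction could be scripted. It assumes (this is not confirmed by the PR) that the strain names follow the WHO measles naming convention, e.g. "MVs/London.GBR/12.19/[D8]", where "12.19" means epidemiological week 12 of 2019. The helper names, the "date" field name, the two-digit-year pivot, and the example accession are all hypothetical. The epi week is approximated by the Monday of the ISO week with the same number, and rows are emitted in the headerless three-column (id, field, value) layout assumed for the annotations file.

```python
import csv
import re
import sys
from datetime import date
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical pattern for WHO-style measles names, e.g. "MVs/London.GBR/12.19/[D8]".
# The third segment is assumed to be "<epi week>.<two-digit year>".
WHO_NAME = re.compile(r"^MV[si]/[^/]+/(\d{1,2})\.(\d{2})(?:/|$)")


def date_from_strain(strain: str, pivot: int = 50) -> Optional[str]:
    """Return an approximate collection date, or None if the name doesn't match."""
    match = WHO_NAME.match(strain)
    if match is None:
        return None
    week, yy = int(match.group(1)), int(match.group(2))
    # Assumed pivot: two-digit years below `pivot` map to 20xx, the rest to 19xx.
    year = 2000 + yy if yy < pivot else 1900 + yy
    try:
        # Approximate the epi week by the Monday of the ISO week with that number.
        return date.fromisocalendar(year, week, 1).isoformat()
    except ValueError:
        # Week out of range for that year (e.g. week 0 or 53): keep only the year.
        return f"{year}-XX-XX"


def annotation_rows(strains: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, field, value) rows; "date" is an assumed metadata field name."""
    for accession, strain in strains.items():
        value = date_from_strain(strain)
        if value is not None:
            yield accession, "date", value


if __name__ == "__main__":
    # Hypothetical accession -> strain mapping, written as headerless TSV to stdout.
    example = {"XX000000": "MVs/London.GBR/12.19/[D8]"}
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(annotation_rows(example))
```

Any row produced this way would still need a manual check against the strain name before being committed, since epi-week conventions differ slightly from ISO weeks.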
This PR adds the collection dates (extracted manually from the strain names) for the six samples to ingest/defaults/annotations.tsv. As a result, the collection dates are included in the ingest metadata output, and TreeTime keeps the samples in the Nextclade dataset tree.
Related issue(s)
https://github.com/nextstrain/measles/pull/28