jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-CoV-2 variation data
MIT License

Add reference sequence #22

Closed by jeromekelleher 1 week ago

jeromekelleher commented 1 year ago

Use the reference FASTA from, e.g., here: https://github.com/nextstrain/nextclade_data/blob/release/data/datasets/sars-cov-2/references/MN908947/versions/2022-10-27T12:00:00Z/files/reference.fasta
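A reference FASTA like the one linked holds a single record: a header line starting with ">" followed by the sequence wrapped over several lines. Below is a minimal sketch of turning such a file into one uppercase string. The function name `parse_single_fasta` is hypothetical and is not part of sc2ts; it assumes the file contains exactly one record.

```python
from __future__ import annotations


def parse_single_fasta(text: str) -> tuple[str, str]:
    """Hypothetical helper: return (header, sequence) for a one-record FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single FASTA record")
            header = line[1:].strip()
        else:
            if header is None:
                raise ValueError("sequence data before FASTA header")
            # Sequence lines are wrapped; join them and normalise case.
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


# Usage (assumes the linked file was saved locally as reference.fasta):
# with open("reference.fasta") as f:
#     header, reference = parse_single_fasta(f.read())
```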

jeromekelleher commented 1 year ago

Don't add the actual sequence, though, because this would make ts.alignments() do the wrong thing. Better to pad with Ns for the positions we've deliberately left out as wonky.

jeromekelleher commented 1 week ago

I have just added the reference sequence formally. Giving full alignments filled in from the reference seems less likely to cause problems than giving Ns at all non-variable sites.
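The two options discussed in this thread differ only in what fills positions that carry no site. The sketch below shows both with a toy 10-base reference. It is an illustration under assumptions, not sc2ts or tskit code: `build_alignment` is a hypothetical helper, and it uses 0-based positions.

```python
from __future__ import annotations


def build_alignment(
    sequence_length: int,
    site_positions: list[int],
    alleles: list[str],
    reference: str | None = None,
    missing: str = "N",
) -> str:
    """Hypothetical sketch: build one sample's full-length alignment.

    Positions that carry a site get the sample's allele; every other
    position is filled from `reference` if one is given, else with `missing`.
    Positions are assumed to be 0-based.
    """
    if len(site_positions) != len(alleles):
        raise ValueError("need one allele per site")
    if reference is not None and len(reference) != sequence_length:
        raise ValueError("reference length must equal sequence_length")
    # Start from the reference (option 2) or from all-missing (option 1).
    chars = list(reference) if reference is not None else [missing] * sequence_length
    for pos, allele in zip(site_positions, alleles):
        if not 0 <= pos < sequence_length:
            raise ValueError(f"site position {pos} out of range")
        chars[pos] = allele
    return "".join(chars)


toy_reference = "ACGTACGTAC"  # toy data, not the SARS-CoV-2 reference
print(build_alignment(10, [2, 7], ["T", "A"]))                           # NNTNNNNANN
print(build_alignment(10, [2, 7], ["T", "A"], reference=toy_reference))  # ACTTACGAAC
```

Filling from the reference makes invariant positions read as the reference base, which is what a full alignment normally looks like. The trade-off raised earlier in the thread is that positions deliberately left out also show the reference base instead of N, so a caller who wants those flagged would need to mask them separately. In recent tskit versions, TreeSequence.alignments() takes a reference_sequence argument that covers this same choice.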