Open armintoepfer opened 8 years ago
@armintoepfer are you familiar with pan-genomic alignment methods? Typically they model the reference as a kind of graph and allow you to align your reads to walks through the graph. They might use a kind of DAG (if a VCF file and reference FASTA file are used to model the population), or an assembly graph such as those produced by some of @lh3's other projects.
These systems attempt to solve exactly the kinds of problems that lead to the use of ambiguous reference bases. However, where ambiguous reference bases can only represent SNPs, these systems allow any kind of variation in the reference model. For instance, a graph can encode ambiguity about indels, which is very difficult to do with a linear reference unless we implicitly encode such a graph.
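To make the SNP-only limitation concrete, here is a minimal Python sketch of what "matching against an ambiguous reference base" could mean. It uses the standard IUPAC nucleotide codes. The function names `base_matches` and `count_mismatches` are made up for illustration and are not part of any aligner's API. Note that the comparison is strictly position by position, so there is no way to express an indel this way.

```python
# Standard IUPAC nucleotide codes mapped to the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(ref_base, read_base):
    """Return True if the read base is one of the bases the reference code allows.

    Hypothetical helper: the read base is assumed to be a concrete A/C/G/T.
    """
    return read_base.upper() in IUPAC[ref_base.upper()]


def count_mismatches(ref, read):
    """Count mismatches between equal-length sequences, honoring IUPAC codes in ref."""
    if len(ref) != len(read):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(not base_matches(r, q) for r, q in zip(ref, read))


# 'R' stands for A or G, so both reads below match the reference without penalty.
print(count_mismatches("ACRT", "ACGT"))
print(count_mismatches("ACRT", "ACAT"))
# 'C' is not allowed by 'R', so this read has one mismatch.
print(count_mismatches("ACRT", "ACCT"))
```

A graph reference generalizes this: each ambiguous base is a bubble with single-base branches, and an indel is a bubble whose branches have different lengths.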
These methods can also handle another problem that occurs in viral resequencing against a linear reference: a graph reference system can directly represent a circular genome.
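As a rough illustration of the circular-genome point, the toy Python sketch below models a genome as a graph with one node per base and an extra edge from the last base back to the first. This is only an assumption-laden sketch of the idea (exact matching, one successor per node, hypothetical function names), not how any particular graph aligner is implemented.

```python
def build_circular_graph(genome):
    """Map each base position to its successors; the last position links back to 0."""
    n = len(genome)
    return {i: [(i + 1) % n] for i in range(n)}


def find_walk_starts(genome, graph, read):
    """Return every start position from which the read spells an exact walk."""
    starts = []
    for start in graph:
        node = start
        matched = True
        for base in read:
            if genome[node] != base:
                matched = False
                break
            # Each node has exactly one successor in this toy circular graph.
            node = graph[node][0]
        if matched:
            starts.append(start)
    return starts


genome = "ACGTTG"
read = "TGAC"  # spans the origin: the last two bases followed by the first two

graph = build_circular_graph(genome)
print(read in genome)                         # plain linear substring search
print(find_walk_starts(genome, graph, read))  # walk search on the circular graph
```

With a linear reference the same read would have to be split or soft-clipped at the origin, or the reference would need to be padded with a copy of its own start.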
Would it be possible to handle ambiguous reference bases properly? That would increase the alignment quality for viral use cases.
Thank you.