Closed by ChriKub 6 years ago
Hi Chris, yes, this is the intended behaviour. With reveal I aim to align only sequence that (locally) matches uniquely, given the hierarchical decomposition of the alignment. As a result, you should always end up with a directed acyclic graph. See the preprint for more details. For now I don't foresee any efforts to allow self-alignments (or loops) in the graph. Maybe a de Bruijn graph approach would be better suited to your use case?
Cheers, Jasper
I do understand the basic idea behind that approach. The downside is that you lose every element that has a very recent duplication history. Bit of a bummer when you look at a typical eukaryotic genome, no?
Hi Jasper, thanks for the quick response. I get your idea, but as @fbemm mentioned, you will lose information on duplication events. Dealing with repetitive sequences is tricky, but the additional information that can be obtained is worthwhile (at least in my opinion).
Cheers, Chris
Well, I don't think you lose any information. I think it depends on the biological question. If you multi-align 7 genomes, reveal will give you a graph that describes, in the form of bubbles, how those genomes differ from each other. For instance, your duplication event should show up as an insertion bubble within the graph. With the sequence contained in these bubbles you can perform subsequent alignments to figure out how a set of paralogous genes differ from each other. Maybe we can think of a graph-based representation that tries to incorporate all this information, but I don't think that's an easy problem...
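To make the bubble idea a bit more concrete, here is a minimal Python sketch. It is not reveal's code or output format: the `paths` dictionary, the segment names and the `bubble_segments` helper are all hypothetical, and it only assumes that each genome can be written as an ordered list of segment IDs through the graph. Segments that are not traversed by every genome are reported as candidates for the follow-up alignments mentioned above.

```python
from typing import Dict, List, Set


def bubble_segments(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each segment NOT shared by all genomes to the genomes that carry it.

    `paths` is a hypothetical structure: genome name -> ordered segment IDs.
    """
    carriers: Dict[str, Set[str]] = {}
    for genome, segments in paths.items():
        for seg in segments:
            carriers.setdefault(seg, set()).add(genome)
    n_genomes = len(paths)
    # Segments seen in fewer than all genomes sit inside some bubble.
    return {seg: sorted(g) for seg, g in carriers.items() if len(g) < n_genomes}


# Toy example (made-up data): genome C carries an extra copy, s2b,
# which appears as an insertion bubble between s2 and s3.
paths = {
    "A": ["s1", "s2", "s3"],
    "B": ["s1", "s2", "s3"],
    "C": ["s1", "s2", "s2b", "s3"],
}
print(bubble_segments(paths))  # {'s2b': ['C']}
```

The sequence of `s2b` could then be aligned against `s2` in a separate step to see how the two copies differ.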
Hi, I'm aligning 7 genomes. Each genome contains repetitive elements, which I would expect to align with themselves within the same genome, so I would end up with segments in the GFA that are traversed more than 7 times. In my alignments, the maximum number of traversals through a segment is 7. Is this the intended behaviour, and is there any way to allow self-alignments?
Thanks, Chris
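For anyone who wants to check this on their own output, here is a rough Python sketch of how the traversal count per segment could be computed. It assumes the genomes are stored as GFA1 `P` (path) lines with comma-separated oriented segment names in the third field; whether a particular reveal output uses `P` lines is an assumption here, and the `traversal_counts` helper and the example GFA text are made up for illustration.

```python
from collections import Counter


def traversal_counts(gfa_text: str) -> Counter:
    """Count how often each segment is visited across all GFA1 P lines."""
    counts: Counter = Counter()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "P" or len(fields) < 3:
            continue  # only path lines carry traversals
        for step in fields[2].split(","):
            if step and step[-1] in "+-":
                counts[step[:-1]] += 1  # drop the orientation sign
    return counts


# Made-up example with 2 genomes. Path B visits s2 twice, which is what a
# self-alignment (loop) would look like and cannot happen in a DAG.
example_gfa = "\n".join([
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tTTGA",
    "P\tA\ts1+,s2+\t*",
    "P\tB\ts1+,s2+,s2+\t*",
])
counts = traversal_counts(example_gfa)
print(counts["s1"], counts["s2"])  # 2 3
print(max(counts.values()))        # 3, i.e. more than the number of genomes
```

In a graph without self-alignments, the maximum should never exceed the number of paths, which matches the maximum of 7 seen for 7 genomes.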