mcfrith / last-genome-alignments

asymmetry in split #19

Closed (gaboentropy closed this issue 7 months ago)

gaboentropy commented 7 months ago

When I use last-split -r, I expected that comparing genome1 to genome2 would give the same result as genome2 to genome1, but that's not the case. Did I assume incorrectly?

Also, if I understand correctly, the split function "cleans" the results to give one-to-one genome alignments, meaning no aligned segment is reused in another alignment. Is this right? Does it still hold when the sequence is broken into several pieces (for example, several chromosomes), or can each chromosome get realigned to a different chromosome?

mcfrith commented 7 months ago

Good questions. I'm afraid you did assume incorrectly. I agree it would be nice if comparing genome1 to genome2 gave the same result as 2 to 1, but it doesn't. And I think it would be difficult to "fix" that.

"One-to-one" means here that each base-pair in one genome is aligned to at most one base-pair in the other genome, and vice-versa. Doesn't matter if sequences are broken into pieces etc.
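To make that definition concrete, here is a minimal sketch of how one might check the property on a set of alignments. It is not part of LAST: the tuple layout `(name1, start1, end1, name2, start2, end2)`, the half-open forward-strand coordinates, and the function names are all assumptions for illustration, and each alignment is treated as one ungapped span.

```python
from collections import defaultdict

def has_overlap(intervals):
    """Return True if any two half-open (start, end) intervals overlap."""
    furthest_end = None
    for start, end in sorted(intervals):
        if furthest_end is not None and start < furthest_end:
            return True
        furthest_end = end if furthest_end is None else max(furthest_end, end)
    return False

def is_one_to_one(alignments):
    """Check that no base in either genome is covered by two alignments.

    Each alignment is a hypothetical tuple:
    (name1, start1, end1, name2, start2, end2), half-open coordinates.
    """
    spans = defaultdict(list)
    for name1, start1, end1, name2, start2, end2 in alignments:
        spans[("genome1", name1)].append((start1, end1))
        spans[("genome2", name2)].append((start2, end2))
    return not any(has_overlap(v) for v in spans.values())

ok = [("chr1", 0, 100, "chrA", 0, 100), ("chr2", 0, 50, "chrA", 200, 250)]
bad = [("chr1", 0, 100, "chrA", 0, 100), ("chr2", 0, 50, "chrA", 90, 140)]
print(is_one_to_one(ok), is_one_to_one(bad))
```

In the second list, two different chromosomes of genome1 both claim bases 90 to 100 of `chrA`, so the check fails even though the pieces come from separate sequences.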

mcfrith commented 7 months ago

The "one-to-one" guarantee holds for everything that is given to last-split -r. For example, if you run last-split -r on each "query" chromosome separately, then it can't "see" all the chromosomes at once.

gaboentropy commented 7 months ago

Thanks for the quick answer. So, would using -K 0 give something similar to running last-split? (Sorry, I'm trying to get quick full-genome alignments while preserving as much sensitivity as possible.) (It seems like -C would also work, but I'm not sure. I guess I need to experiment.)

mcfrith commented 7 months ago

For genome-genome alignments, I recommend not using -K 0: I think it's better to use split (--split-f=MAF+ in the recipes). They are a bit similar, but -K 0 is much cruder: it just discards any alignment that overlaps a higher-scoring alignment (on the same DNA strand).
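As a rough illustration of the "crude" rule just described (discard anything that overlaps a higher-scoring alignment on the same strand), the sketch below applies it literally to a list of records. This is not LAST's implementation: the dictionary keys, the choice to compare query ranges, and the function name are assumptions made for the example.

```python
def crude_overlap_filter(alignments):
    """Keep an alignment only if no higher-scoring alignment on the same
    query sequence and strand overlaps its (half-open) query range.

    Each alignment is a hypothetical dict with keys:
    query, strand, start, end, score.
    """
    kept = []
    for a in alignments:
        beaten = any(
            b is not a
            and b["query"] == a["query"]
            and b["strand"] == a["strand"]
            and b["score"] > a["score"]
            and b["start"] < a["end"]
            and a["start"] < b["end"]
            for b in alignments
        )
        if not beaten:
            kept.append(a)
    return kept

example = [
    {"query": "q1", "strand": "+", "start": 0, "end": 100, "score": 100},
    {"query": "q1", "strand": "+", "start": 50, "end": 150, "score": 50},
    {"query": "q1", "strand": "-", "start": 50, "end": 150, "score": 40},
]
kept_scores = [a["score"] for a in crude_overlap_filter(example)]
print(kept_scores)
```

The second record is dropped whole, including its non-overlapping part (bases 100 to 150), which is the sense in which this kind of filter is cruder than splitting an alignment at the best breakpoint.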

If your genomes are closely related (e.g. human/chimp), you can make it much faster, with probably ok sensitivity, by using a lastdb -uRY option (for example -uRY32).