mcfrith / last-genome-alignments


one to one alignment with interspecific reads #6

Closed LipengKang closed 4 years ago

LipengKang commented 4 years ago

Hi @mcfrith, in whole-genome alignment of two genomes, last-split is run twice to get one-to-one alignments. Can I instead align reads from one species to the genome of another species (with ) to find orthologs rather than paralogs? That is, will running last-split twice yield more paralogs when the query is reads rather than a chromosome-scale assembly (following the pipeline of last-genome-alignments)? In my simulation, aligning the chromosome-scale genome of A to the B reference genome covers 40% of exon bases, but aligning simulated 5x reads of A to the B reference genome covers 60% of exon bases. Any advice?

mcfrith commented 4 years ago

Hi

for aligning reads to a genome (even of a different species), I'm pretty sure you don't want one-to-one alignment, especially if the reads have > 1-fold coverage. So I think last-split should be run just once, not twice.

Have a nice day, Martin

mcfrith commented 4 years ago

Maybe I was too hasty...

Perhaps one-to-one reads-versus-genome alignment could be useful. I've never tried it (or thought of it). It will lose 4x out of your 5x read coverage, but perhaps that's OK for you. It should indeed reduce paralog alignments.

"whether last-split 2 times will contain more paralogs when using reads not chromosome": I wouldn't expect massively more paralogs (but this is new for me). Orthology is not really one-to-one, so the extra coverage that you see might be non-one-to-one orthologs.

LipengKang commented 4 years ago

Yes, it's also new for me.
I'll test this approach. If it works, we can run last-split twice for ortholog calling without an assembly. I mean that long reads at very low redundancy would be enough for ortholog calling.

LipengKang commented 4 years ago

Dear @mcfrith, sorry to bother you again. I cut a large query genome into 20 kb "short contigs". Alignments were made following this GitHub guidance.

way 1: genome-versus-genome alignment
way 2: contigs-versus-genome alignment

However, only ~65% of the matched bases in genes plus 10 kb flanking regions from way 2 overlap with way 1. Changing the last-split parameter from -m1 to -m0.001 does not improve this strange result much either. Is split alignment unsuitable for a draft query genome? I don't think so, but then what destroys the reproducibility of the alignments?
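For readers who want to reproduce this kind of comparison, here is a minimal sketch of one way an overlap figure like the ~65% could be computed. It is an assumption, not necessarily the calculation used above: it takes the aligned reference intervals of each way on one chromosome as half-open `(start, end)` pairs and reports the fraction of way 2's covered reference bases that way 1 also covers. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one reference chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shared_bases(a: List[Interval], b: List[Interval]) -> int:
    """Count bases covered by both lists (each must be sorted and disjoint)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_recovered(way1: Iterable[Interval], way2: Iterable[Interval]) -> float:
    """Fraction of reference bases covered by way 2 that way 1 also covers."""
    w1, w2 = merge(way1), merge(way2)
    covered = sum(end - start for start, end in w2)
    return shared_bases(w1, w2) / covered if covered else 0.0
```

Note that this only compares which reference bases are covered. A stricter check would also require each reference base to be paired with the same query base in both ways, which needs the contig coordinates mapped back to genome coordinates.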

Have a nice day, lipeng

mcfrith commented 4 years ago

Sorry for this slow response. Short answer: I don't know! How exactly did you cut it: overlapping contigs? Precisely adjacent contigs?
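To make the two cutting schemes concrete, here is a small sketch, not the script actually used in this thread. The function name `cut_contigs`, its parameters, and the `name:start-end` naming scheme are all hypothetical. With `step` equal to `size` the pieces are precisely adjacent; with `step` smaller than `size` they overlap.

```python
from typing import Iterator, Optional, Tuple


def cut_contigs(
    name: str, seq: str, size: int = 20_000, step: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (contig_name, contig_sequence) pieces of one sequence.

    step=None (or step == size) gives precisely adjacent pieces;
    step < size gives pieces that overlap by size - step bases.
    """
    if step is None:
        step = size
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    start = 0
    while start < len(seq):
        end = min(start + size, len(seq))
        # Hypothetical naming: keep the 0-based, half-open source coordinates.
        yield f"{name}:{start}-{end}", seq[start:end]
        if end == len(seq):
            break  # last piece reached; avoid emitting redundant tail pieces
        start += step
```

Keeping the source coordinates in each contig name makes it possible to map contig-based alignments back to genome coordinates before comparing them with the genome-versus-genome result.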

One idea is to add -j4 to the lastal options: this does not change the alignments, but it annotates the unambiguity of each alignment column (and makes lastal slower). Maybe the unambiguous columns agree between your two ways, and the ambiguous ones don't?

Have a nice day, Martin