mcfrith / last-genome-alignments


Repeat-(soft) masked genomes as input? #15

Closed: ohdongha closed this issue 1 year ago

ohdongha commented 1 year ago

Dear @mcfrith, I wonder whether the LAST genome alignment pipeline accepts or requires repeat-soft-masked genomes as input (as many other genome aligners do) to reduce the time spent aligning repetitive regions. It appears that there is a step (last-postmask) that filters out alignments from repeats, but I am curious whether using repeat-soft-masked genomes would speed up the alignment step itself (performed by lastal?).

Thanks! Dong-Ha

mcfrith commented 1 year ago

By default, last ignores any soft-masking in the input. (It converts the input to all uppercase, then does its own lowercasing of simple repeats.)

It's possible to use soft-masked input to increase alignment speed, by adding options -R11 -c to lastdb.
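As a rough sketch of what that could look like (the file names `genome.fa` and `query.fa`, the database name `mydb`, and the output name `out.maf` are placeholders, not taken from this thread): as I understand the lastdb documentation, `-R11` keeps the lowercase already present in the input and additionally lowercases simple repeats, and `-c` excludes lowercase letters from initial matches.

```shell
#!/bin/sh
# Placeholder names (assumptions for illustration only).
genome=genome.fa   # soft-masked reference genome
query=query.fa     # genome to align against the reference
db=mydb            # name of the lastdb database to create

# -R11: keep existing lowercase in the input and also lowercase simple repeats
# -c:   soft-mask lowercase letters (exclude them from initial matches)
lastdb_cmd="lastdb -R11 -c $db $genome"
lastal_cmd="lastal $db $query"

# Show the commands that would be run.
echo "$lastdb_cmd"
echo "$lastal_cmd > out.maf"

# Run them only if LAST is installed and both input files exist.
if command -v lastdb >/dev/null 2>&1 && [ -f "$genome" ] && [ -f "$query" ]; then
    $lastdb_cmd
    $lastal_cmd > out.maf
fi
```

The masking options go on lastdb, not lastal, so the database has to be rebuilt to switch between masked and unmasked behaviour.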

There are several ways to increase speed, and I'm not sure which are best. One is to use one of the -uRY options for lastdb. And it's probably best to use the -C2 option of lastal.
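A minimal sketch of those two options together, under assumptions: `RY32` is picked arbitrarily from the `-uRY` family (the comment above does not say which one to use), and the file, database, and output names are placeholders.

```shell
#!/bin/sh
# Placeholder names (assumptions for illustration only).
genome=genome.fa
query=query.fa
db=mydb

# -uRY32: one of the RY seeding schemes; which one suits a given genome pair
#         is not settled here, so treat this choice as an example only.
# -C2:    the lastal option recommended above.
lastdb_cmd="lastdb -uRY32 $db $genome"
lastal_cmd="lastal -C2 $db $query"

# Show the commands that would be run.
echo "$lastdb_cmd"
echo "$lastal_cmd > out.maf"

# Run them only if LAST is installed and both input files exist.
if command -v lastdb >/dev/null 2>&1 && [ -f "$genome" ] && [ -f "$query" ]; then
    $lastdb_cmd
    $lastal_cmd > out.maf
fi
```

Timing a small test run with and without each option would be the safest way to see which combination actually helps for a particular pair of genomes.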