mcfrith / last-genome-alignments


fragment the reference genome #18

Closed AlisaGU closed 10 months ago

AlisaGU commented 10 months ago

Hi, mcfrith

Splitting the query genome into several files saves time. What about fragmenting both the reference and the query genome into subsequences of about 500M each, while still keeping each genome in one file?

My guess is that aligning a fragmented query genome is faster than aligning an unfragmented one, but that a fragmented reference genome takes about as long as an unfragmented one.

mcfrith commented 10 months ago

I would typically not do any such fragmentation.

I think the number of files makes no difference (except that separate files means you could use separate computers).

If you break one long query sequence into smaller sequences, that can improve parallelization. That's because each parallel thread deals with one whole query sequence (https://gitlab.com/mcfrith/last/-/blob/main/doc/last-parallel.rst).
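For illustration, here is a minimal sketch of breaking long query sequences into smaller ones while keeping everything in one FASTA file. It is not part of LAST: the function names, the `name:start-end` fragment naming (0-based start, exclusive end), and the chunk size are all assumptions made for this example.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            fields = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, parts = (fields[0] if fields else ""), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def fragment_records(records, chunk_size):
    """Split each sequence into consecutive pieces of at most chunk_size bases.

    Fragment names follow a hypothetical convention, name:start-end,
    so that coordinates can be mapped back to the original sequence.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for name, seq in records:
        for start in range(0, len(seq), chunk_size):
            piece = seq[start:start + chunk_size]
            yield f"{name}:{start}-{start + len(piece)}", piece


def write_fasta(records, handle, width=60):
    """Write (name, sequence) pairs as FASTA with fixed-width lines."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Tiny in-memory demonstration; a real genome would be read from a file.
demo = io.StringIO(">chr1 test\nACGTACGTAC\n>chr2\nGGCC\n")
fragments = list(fragment_records(read_fasta(demo), chunk_size=4))

out = io.StringIO()
write_fasta(fragments, out)
fragmented_fasta = out.getvalue()
```

An alignment cannot extend across the boundary between two fragments, and the reported coordinates are relative to each fragment, so the start offset in the fragment name has to be added back afterwards.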

I think you're right: it's not useful to fragment the reference.

AlisaGU commented 10 months ago

Thanks!