how to speed the last-train step

mcfrith / last-genome-alignments


how to speed the last-train step #12

Closed Aannaw closed 1 year ago

Aannaw commented 2 years ago

Hello, Professor. I want to align one genome (A) to another genome (B). I have indexed genome A and divided genome B into its individual sequences. I then run the step that determines substitution and gap frequencies, in parallel: last-train -P8 --revsym -E0.05 -C2 A $i.B.fa > $i.B.fa.train. I have 15764 sequences, and even running 20 of them at a time takes too long. Can you suggest any way to speed up this step? Looking forward to your reply.

mcfrith commented 2 years ago

No need to divide genome B: just give the whole of B to last-train. Internally, last-train gets a random sample of fragments from genome B, and just uses those fragments. So it shouldn't be too slow.
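As a rough sketch of the advice above, the per-sequence loop collapses into a single last-train call on the undivided genome. The helper name train_cmd and the file name B.fa are illustrative assumptions; the options are the ones from the original command in this thread. The helper only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch: train once on the whole of genome B instead of once per sequence.
# Assumptions: "A" is the lastdb index name and B.fa is the undivided genome B.
# train_cmd is a hypothetical helper, not part of LAST.
train_cmd() {
    # $1 = thread count, $2 = lastdb index name, $3 = query FASTA file
    printf 'last-train -P%s --revsym -E0.05 -C2 %s %s\n' "$1" "$2" "$3"
}

# Print the command for inspection. To actually run it:
#   train_cmd 8 A B.fa | sh > B.train
train_cmd 8 A B.fa
```

One run like this replaces all 15764 per-sequence runs, since last-train only samples fragments from its input anyway.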

Aannaw commented 2 years ago

Thanks for your reply. What does this passage in the README.md mean: "the above command was the slowest step (3 CPU-weeks). You can "easily" parallelize it, by processing each sequence within panTro5.fa separately (in parallel). But each process uses quite a lot of memory, so take care that multiple parallel runs don't exceed your memory"? Maybe I have misunderstood it?

mcfrith commented 2 years ago

That refers to lastal, not last-train. It's a bit out-of-date, please see here for the latest genome-genome alignment recommendations: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst

Aannaw commented 2 years ago

Thank you very much. After checking https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst, I ran the last-train command: last-train -P20 --revsym -E0.05 -C2 A B.fa > B.train. Then I ran the lastal command: lastal -E0.05 -C2 -p B.train A B.fa | last-split -fMAF+ > B.maf. Is this the fastest way to do it?

mcfrith commented 2 years ago

If that's not fast enough, you could try multi-threading lastal with -P and maybe -i: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-parallel.rst

Then there are ways to make it faster but less sensitive: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-tuning.rst The simplest might be lastal option (say) -k32.
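For illustration, the two suggestions above (multi-threading with -P, and the faster but less sensitive -k32) can be combined with the lastal command already used in this thread. This is only a sketch: align_cmd is a hypothetical helper, the thread count of 8 is an arbitrary example, and the file names are the ones from the earlier comment. It prints the pipeline instead of running it. A value for -i is deliberately left out; see last-parallel.rst for that.

```shell
#!/bin/sh
# Sketch: build the lastal | last-split pipeline with optional speed-ups.
# Assumptions: B.train comes from last-train, "A" is the lastdb index name,
# and B.fa is the whole of genome B. align_cmd is not part of LAST.
align_cmd() {
    # $1 = thread count for lastal -P
    # $2 = optional extra lastal option, e.g. -k32 (faster, less sensitive)
    printf 'lastal -P%s %s-E0.05 -C2 -p B.train A B.fa | last-split -fMAF+ > B.maf\n' \
        "$1" "${2:+$2 }"
}

# Multi-threaded only:
align_cmd 8
# Multi-threaded plus -k32:
align_cmd 8 -k32
# To actually run one of them:  align_cmd 8 -k32 | sh
```

Trying the multi-threaded form first keeps the sensitivity unchanged; -k32 trades some sensitivity for speed, so it is the second thing to reach for.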