Closed: Aannaw closed this issue 1 year ago
No need to divide genome B: just give the whole of B to last-train. Internally, last-train gets a random sample of fragments from genome B, and just uses those fragments. So it shouldn't be too slow.
Thanks for your reply. What does this passage in the README.md mean: "the above command was the slowest step (3 CPU-weeks). You can "easily" parallelize it, by processing each sequence within panTro5.fa separately (in parallel). But each process uses quite a lot of memory, so take care that multiple parallel runs don't exceed your memory"? Maybe I am misunderstanding something.
That refers to lastal, not last-train. It's a bit out-of-date, please see here for the latest genome-genome alignment recommendations: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst
Thank you very much.
After checking https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst, I ran the last-train command: last-train -P20 --revsym -E0.05 -C2 A B.fa > B.train. Then I ran the lastal command: lastal -E0.05 -C2 -p B.train A B.fa | last-split -fMAF+ > B.maf. Is this the fastest way to do it?
If that's not fast enough, you could try multi-threading lastal with -P and maybe -i: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-parallel.rst
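A rough sketch of the multi-threaded variant, reusing the file names from the commands in this thread (A as the lastdb index, B.fa, B.train). The thread count of 20 and the THREADS and RUN variables are assumptions made for illustration. The -i batch size is left out because a suitable value depends on the memory available:

```shell
# Assumed names: A (lastdb index), B.fa (query genome), B.train (last-train output).
threads=${THREADS:-20}   # illustrative thread count, adjust to your machine

# Build the pipeline as a string so it can be inspected before running.
# -P sets the number of lastal threads.
cmd="lastal -P$threads -E0.05 -C2 -p B.train A B.fa | last-split -fMAF+ > B.maf"
echo "$cmd"

# Run it only when explicitly requested, e.g. RUN=1 sh align.sh
if [ "${RUN:-0}" = 1 ]; then
    sh -c "$cmd"
fi
```

Printing the command first makes it easy to check the options before committing to a long run.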
Then there are ways to make it faster but less sensitive:
https://gitlab.com/mcfrith/last/-/blob/main/doc/last-tuning.rst
The simplest might be lastal option (say) -k32.
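As a sketch of that faster, less sensitive variant, again with the file names used in this thread: the value 32 is only the example from the sentence above, not a tuned setting, and the STEP and RUN variables are hypothetical conveniences:

```shell
# Assumed names: A (lastdb index), B.fa (query genome), B.train (last-train output).
step=${STEP:-32}   # example value only; larger steps trade sensitivity for speed

# -k makes lastal look for initial matches only at every step-th query position.
cmd="lastal -k$step -E0.05 -C2 -p B.train A B.fa | last-split -fMAF+ > B.maf"
echo "$cmd"

# Run it only when explicitly requested, e.g. RUN=1 sh align_fast.sh
if [ "${RUN:-0}" = 1 ]; then
    sh -c "$cmd"
fi
```

It may be worth comparing the output against a default run on a small subset of B.fa to see how much is lost at a given step size.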
Hello, Professor. I want to align one genome (A) to another genome (B). I have indexed genome A and split genome B into its individual sequences. I then run the step that determines substitution and gap frequencies in parallel: last-train -P8 --revsym -E0.05 -C2 A $i.B.fa > $i.B.fa.train. I have 15764 sequences, and running them 20 at a time takes too long. Can you give me any suggestions for speeding up this step? Looking forward to your reply.