SCQUchenyang closed this issue 5 years ago.
Hi Sir, I am very thankful for your work on LAST, and I have a question about using it. To save time, I aligned my 10 chromosomes to a reference in parallel and got 10 MAF files. How should I merge these MAF files? Would "cat" work? Best wishes!
Hi, many thanks for your interest in LAST. Yes, I think "cat" is just fine here. Have a nice day, Martin
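For anyone who would rather do the merge from a script (for example, to double-check that no alignment blocks went missing), here is a minimal Python sketch. It does nothing more than cat does: it copies the files byte-for-byte in the order given. The file names in the commented usage are hypothetical, and the block count simply counts lines starting with "a", which is how MAF alignment blocks begin.

```python
import shutil


def merge_maf_files(inputs, output):
    """Concatenate MAF files byte-for-byte, like: cat in1.maf in2.maf ... > output."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_alignment_blocks(path):
    """Count MAF alignment blocks; each block begins with a line starting with 'a'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith("a"))


# Hypothetical usage with per-chromosome outputs chr01.maf ... chr10.maf:
# parts = [f"chr{i:02d}.maf" for i in range(1, 11)]
# merge_maf_files(parts, "merged.maf")
# assert count_alignment_blocks("merged.maf") == sum(map(count_alignment_blocks, parts))
```

The merged block count should equal the sum of the per-file counts, since nothing is dropped or reordered within a file.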
Hi, did the commands run successfully if you just cat these MAF files? I am worried about multiple alignments to the same region of the reference, and about the order of the alignment blocks.
So the recipe uses last-split twice. Doing cat after the 1st last-split, and before the 2nd one, should be completely fine.
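Why is concatenating between the two steps safe? Here is a toy Python model of the argument, under a big simplification: each step just keeps the single highest-scoring alignment per sequence, per query for the 1st step and per reference for the 2nd (the 2nd step in the recipe, last-split -r, swaps the roles of the two genomes). This is NOT what last-split actually computes (it finds a best alignment for each part of a sequence), and the toy ignores block order. It only shares the property that matters here: a per-query step keeps the same alignments whether queries are processed together or in separate jobs whose outputs are concatenated, whereas a per-reference step has to see every alignment at once, so it must run after the cat.

```python
def best_per_key(alignments, key):
    """Toy step: keep the highest-scoring alignment for each value of `key`.

    Stand-in only; real last-split picks a best alignment for each part of a sequence.
    """
    best = {}
    for aln in alignments:
        k = aln[key]
        if k not in best or aln["score"] > best[k]["score"]:
            best[k] = aln
    return list(best.values())


def canon(alignments):
    """Order-independent form: compare which alignments survive, not their order."""
    return sorted(alignments, key=lambda a: (a["query"], a["ref"], a["score"]))


# Hypothetical alignments from two separately run query chromosomes, both hitting refA.
job1 = [
    {"query": "chr1", "ref": "refA", "score": 50},
    {"query": "chr1", "ref": "refA", "score": 80},
]
job2 = [{"query": "chr2", "ref": "refA", "score": 90}]

# 1st step (per query): run per job, then "cat" the outputs ...
step1_cat = best_per_key(job1, "query") + best_per_key(job2, "query")
# ... keeps exactly the same alignments as one run over everything.
assert canon(step1_cat) == canon(best_per_key(job1 + job2, "query"))

# 2nd step (per reference) runs once on the merged data: one alignment left for refA.
step2 = best_per_key(step1_cat, "ref")

# Running the 2nd step per job instead would wrongly leave two alignments for refA.
step2_wrong = best_per_key(best_per_key(job1, "query"), "ref") + best_per_key(
    best_per_key(job2, "query"), "ref"
)
assert len(step2) == 1 and len(step2_wrong) == 2
```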
Hi, do the 1st and 2nd last-split refer to the following example from the cookbook?
lastdb -P8 -uMAM8 myDB genome1.fa
last-train -P8 --revsym -D1e9 --sample-number=5000 myDB genome2.fa > my.train
lastal -P8 -D1e9 -m100 -p my.train myDB genome2.fa | last-split -fMAF+ > many-to-one.maf
last-split -r many-to-one.maf | last-postmask > out.maf
By the way, I have a big genome with 46,139,523,234 bases and 20,131 contigs. Here are the lengths (in bases) of the longest contigs:
ptg000004l 169819904
ptg000441l 158822330
ptg000279l 109104046
ptg000035l 107045360
ptg000669l 100503328
ptg000533l 90735505
ptg000066l 87495606
ptg000800l 85918877
ptg000855l 82319672
ptg000667l 80863498
Is LAST suitable for this very big genome if I split it by chromosome and then cat the results?
Yes, that's the 1st and 2nd last-split.
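To make the split-then-cat workflow concrete, here is a small Python helper that only builds the shell command lines (it does not run anything). It reuses the exact options from the cookbook example above; lastdb and last-train are assumed to have been run once, as in the recipe. The per-chunk file names (chr01.fa and so on) are hypothetical. Each step-3 line is independent, so the lines can be run as separate parallel jobs with whatever job runner you normally use; only the cat and the 2nd last-split have to wait for all of them.

```python
import shlex


def cookbook_commands(query_chunks, db="myDB", train="my.train"):
    """Build shell commands for the cookbook recipe with the query split into chunks.

    Step 3 (lastal | 1st last-split) runs once per chunk; the per-chunk outputs are
    joined with cat, and the 2nd last-split runs once on the merged file.
    """
    step3, parts = [], []
    for chunk in query_chunks:
        part = chunk.rsplit(".", 1)[0] + ".many-to-one.maf"  # e.g. chr01.many-to-one.maf
        parts.append(part)
        step3.append(
            f"lastal -P8 -D1e9 -m100 -p {shlex.quote(train)} {shlex.quote(db)} "
            f"{shlex.quote(chunk)} | last-split -fMAF+ > {shlex.quote(part)}"
        )
    merge = "cat " + " ".join(shlex.quote(p) for p in parts) + " > many-to-one.maf"
    final = "last-split -r many-to-one.maf | last-postmask > out.maf"
    return step3, merge, final


if __name__ == "__main__":
    step3, merge, final = cookbook_commands([f"chr{i:02d}.fa" for i in range(1, 11)])
    print("\n".join(step3 + [merge, final]))
```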
Wow, big genome! Is that "genome1" or "genome2"? How big is the other genome?
This big genome is the query. The target is a normal-sized genome.
I see.
I'm pretty sure LAST can be suitable, but you might want to tune the parameters for higher speed and not-quite-so-high sensitivity. So I would omit -m100 and -uMAM8. For higher speed, maybe replace -uMAM8 with -uRY4 or -uRY8. (The fastest one is -uRY32.) I would probably add -C2 to the lastal options.
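As a sketch of what those suggestions mean for the cookbook commands, the helper below rewrites the option lists: drop -m100, swap -uMAM8 for -uRY4 (or simply drop it), and add -C2 to lastal. The tune helper is a hypothetical convenience for illustration only; it assumes simple, whitespace-separated options and that placing new options right after the program name (before the database and query names) is fine.

```python
import shlex

# Commands copied from the cookbook example in this thread.
COOKBOOK_LASTDB = "lastdb -P8 -uMAM8 myDB genome1.fa"
COOKBOOK_LASTAL = "lastal -P8 -D1e9 -m100 -p my.train myDB genome2.fa"


def tune(command, drop=(), replace=None, add=()):
    """Return `command` with options dropped, replaced, or added after the program name."""
    replace = replace or {}
    program, *rest = shlex.split(command)
    rest = [replace.get(arg, arg) for arg in rest if arg not in drop]
    return shlex.join([program, *add, *rest])


if __name__ == "__main__":
    print(tune(COOKBOOK_LASTDB, replace={"-uMAM8": "-uRY4"}))  # faster seeding
    print(tune(COOKBOOK_LASTDB, drop={"-uMAM8"}))              # or just omit -uMAM8
    print(tune(COOKBOOK_LASTAL, drop={"-m100"}, add=["-C2"]))  # omit -m100, add -C2
```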
For step 3 (lastal ... | last-split), it's fine to run the query chromosomes separately and then cat the results of step 3. Whether or not you do that makes no difference to the results.
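With 20,131 contigs instead of 10 chromosomes, running the query "chromosomes" separately in practice means grouping contigs into a manageable number of batches. One simple option is the greedy longest-first heuristic sketched below (my choice for illustration; it is not part of LAST). It assigns each contig, longest first, to the batch that currently has the fewest bases. Since splitting makes no difference to the results, batching only affects how evenly the jobs are loaded. For scale, the ten longest contigs listed above add up to 1,072,628,126 bases, about 2.3% of the 46,139,523,234-base genome, so even the largest contig (about 170 Mb) is small compared with one tenth of the genome (about 4.6 Gb), and batches should come out close to even.

```python
import heapq


def balance_contigs(lengths, n_batches):
    """Greedy longest-first packing of contigs into batches of similar total length.

    lengths: dict mapping contig name -> length in bases.
    Returns a list of (total_bases, [contig names]) pairs, one per batch.
    """
    heap = [(0, i) for i in range(n_batches)]  # (bases so far, batch index)
    heapq.heapify(heap)
    batches = [[] for _ in range(n_batches)]
    totals = [0] * n_batches
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        total, i = heapq.heappop(heap)  # batch with the fewest bases so far
        batches[i].append(name)
        totals[i] = total + length
        heapq.heappush(heap, (totals[i], i))
    return list(zip(totals, batches))


# The ten longest contigs from the summary in this thread.
LONGEST = {
    "ptg000004l": 169819904, "ptg000441l": 158822330, "ptg000279l": 109104046,
    "ptg000035l": 107045360, "ptg000669l": 100503328, "ptg000533l": 90735505,
    "ptg000066l": 87495606, "ptg000800l": 85918877, "ptg000855l": 82319672,
    "ptg000667l": 80863498,
}

if __name__ == "__main__":
    for total, names in balance_contigs(LONGEST, 3):
        print(total, names)
```

Each resulting batch would then be written to its own FASTA file and aligned as one step-3 job.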
(By the way, --sample-number=5000 was used for highly diverged genomes, e.g. mammal versus reptile. It may not be necessary if your genomes are less diverged. By default, last-train uses a random sample of 500 2kb fragments from genome2.fa. I was worried that might not be enough for genomes with only a small fraction of alignable regions.)
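To put rough numbers on that worry, here is a back-of-envelope Python sketch. The 500 fragments of 2kb (default) versus --sample-number=5000 come from the reply above; the 1% alignable fraction is a made-up example, and treating fragments as independent draws is a simplification, not a description of how last-train actually samples. Under those assumptions, the default samples 1,000,000 bases (5 expected alignable fragments at 1%), while 5000 fragments sample 10,000,000 bases (50 expected), and the chance of sampling no alignable fragment at all drops sharply.

```python
def sample_stats(n_fragments, fragment_len, genome_len, alignable_fraction):
    """Rough numbers for n random fragments of fixed length.

    Assumes (simplification) each fragment independently falls in alignable
    sequence with probability alignable_fraction.
    """
    sampled_bases = n_fragments * fragment_len
    return {
        "sampled_bases": sampled_bases,
        "fraction_of_genome": sampled_bases / genome_len,
        "expected_alignable": n_fragments * alignable_fraction,
        "p_none_alignable": (1 - alignable_fraction) ** n_fragments,
    }


GENOME_LEN = 46_139_523_234  # the query genome (genome2.fa) from this thread
ALIGNABLE = 0.01  # hypothetical: 1% of the query aligns to the target

if __name__ == "__main__":
    for n in (500, 5000):  # last-train default vs --sample-number=5000
        print(n, sample_stats(n, 2000, GENOME_LEN, ALIGNABLE))
```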
Hi, I have run LAST successfully😀 and will tune the parameters according to your advice. Now I am wondering whether I should also do chain and net steps, like the UCSC method. Do you have any tips?
I believe chain-and-net is an alternative to last-split, and last-split is better (but I am biased). Here is a comparison: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0670-9
🥳🥳🥳Thanks, LAST is indeed very fast.