Closed AlisaGU closed 10 months ago
lastdb
is bigger than 4G, I would use lastdb5
.-RY32
makes it very high-speed and low-sensitivity, -uMAM8
the opposite. Try something in between, e.g. -uRY4
.-m100
.lastal
option -C2
.I hope that helps!
Thanks for your reply, I will try again ヾ(´▽‘)ノ
I used the alignment between xentro and human as an example.
$lastdb -P20 -uRY4 $reference_abbre $reference_genome
${last_train} -P20 --revsym -D1e9 --sample-number=5000 $reference_genome_dir/$reference_abbre $query_whole_genome_sequence >${train_outfile}
$lastal -P20 -D1e9 -C2 -p $train Xtro $genome2 | $last_split -fMAF+ >many-to-one.maf
$last_split -r many-to-one.maf | $last_postmask >out.maf
Data I used :
Xenopus tropicalis: GCF_000004195.4_UCB_Xtro_10.0_genomic.fna
hg38: GCF_000001405.39_GRCh38.p13_genomic.fna
After change lastdb parameter to uRY4, code run fastly but result is still small(27M) compared to you posted (66M). Is decreased size normal?
xentro (frog) and human genomes are extremely diverged from each other, so I suppose a much smaller result is not too surprising. Maybe you can try an intermediate level of speed-versus-sensitivity. For example, if you don't use any -u
option (so no -uRY4
or -uMAM8
), it will be more sensitive than -uRY4
. Or you could try -uMAM4
, which is not quite as slow as -uMAM8
.
ok, let me try
I find another situation: the one-to-one maf of whole query genome(38M) is larger than that of splited genome(split and cat, 27M). Is using the whole query genome the best way for pairwise genome alignment?
For "split and cat", the cat must be done before last-split -r
. Then the results should be identical to using the whole query.
This is my split and cat codes:
# make database for reference
$lastdb -P20 -uRY4 $reference_abbre $reference_genome
# create train file for whole query genome
${last_train} -P20 --revsym -D1e9 --sample-number=5000 $reference_genome_dir/$reference_abbre $query_whole_genome_sequence >${train_outfile}
# split whole query genome into 9 parts and run following code for each one.
$lastal -P20 -D1e9 -C2 -p $train_outfile $reference_db $query_sequence | ${last_split} -fMAF+ >${many_to_one_maf}
# cat 9 many-to-one mafs into a whole many-to-one maf
for i in $(seq 1 9); do
cat part${i}/${reference_abbre}${query_abbre}.${i}.maf >>$global_many_to_one_maf
done
# many-to-one to one-to one
$last_split -r -m1e-5 $global_many_to_one_maf| $last_postmask >$one_to_one_maf
It seems no error.
By the way, I tried no -u
option and uMAM4
. No -u
option achieves the best balance beteen sensitive and time. Compared to uMAM8
, result of no -u
option is 61M and that of uMAM4
is 63M. But the time cost of uMAM4
is almost 4 times.
Your split and cat seems correct, and it should give identical results to using the whole query genome. If it doesn't, that's strange... I can only suggest checking you didn't make any mistake.
By the way, you can do the cat without a loop:
cat part*/${reference_abbre}${query_abbre}.*.maf >$global_many_to_one_maf
(Could the problem be that there was already a $global_many_to_one_maf
file before your cat loop?)
I know what the problem is. It's my mistake. I split genome incompletely and sequences used for alignment just accout for only 75% of the whole genome. After correction, everything goes OK!
cat part*/${reference_abbre}${query_abbre}.*.maf >$global_many_to_one_maf
Thanks for this tips.
All problems are solved. Thanks for your timely help and LAST. 。:.゚ヽ(。◕‿◕。)ノ゚.:。+゚
Enmmm,although the many-to-one mafs' size of whole genome and split-cat are the same, but the final one-to-one maf is still different with whole genome(38M) larger than split-cat (32M). Is it right?
They should be identical...
Hi, I have several genomes whose divergence time is between 110 MYA and 295 MYA. These size are quite differerent from 3G to 27G.
At first, I use
but the result is so so small.
I later use the example (xentro vs hg38) posted in your github.
It's extremely slow(the code has run 3d+), and it's seems these parameter are not suitable for large genome.
I am confused to select these parameters. Could you give some tips?
All the best,