Open neptuneyt opened 3 years ago
Hi,
-l represents the minimum length of the seed contig. In the figure, the large blue circle is the seed contig and -l would determine its minimum length. In your highlighted example, the orange one is the seed contig and -l would determine its minimum length.
The description of -ml is a little off. -ml determines the minimum alignment length that will be included in the merging process. Any alignment shorter than -ml will not be merged.
I hope this helps. Let me know if you have any other questions.
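To make the distinction concrete, here is a toy sketch of the two cutoffs as described above. It is not quickmerge's actual code: the `Alignment` class, both function names, and the example contigs are hypothetical, and the real tool applies further criteria before merging.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    ref: str
    query: str
    length: int  # aligned length in bp


def eligible_seeds(contig_lengths, length_cutoff):
    """Contigs long enough to act as a seed (the idea behind -l)."""
    return {name for name, n in contig_lengths.items() if n >= length_cutoff}


def usable_alignments(alignments, merging_length_cutoff):
    """Alignments long enough to take part in merging (the idea behind -ml)."""
    return [a for a in alignments if a.length >= merging_length_cutoff]


# Hypothetical data: one long and one short contig, one long and one short alignment.
contigs = {"ctgA": 12000, "ctgB": 800}
alns = [Alignment("ctgA", "q1", 6000), Alignment("ctgA", "q2", 300)]

seeds = eligible_seeds(contigs, length_cutoff=1000)
kept = usable_alignments(alns, merging_length_cutoff=5000)
print(seeds, [a.query for a in kept])
```

The point of the sketch is only that -l filters contigs (by contig length) while -ml filters alignments (by alignment length), so the two options act on different objects.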
On Sun, Jan 24, 2021 at 12:12 AM neptuneyt notifications@github.com wrote:
Dear quickmerge team, I have installed the latest quickmerge, which supports MUMmer 4, but I am confused by the arguments -l and -ml. According to the manual:
-l LENGTH_CUTOFF, --length_cutoff LENGTH_CUTOFF: minimum seed contig length to be merged (default=0)
-ml MERGING_LENGTH_CUTOFF, --merging_length_cutoff MERGING_LENGTH_CUTOFF: set the merging length cutoff necessary for use in quickmerge (default 5000)
Do they mean the same as described in the picture below? [image: image] https://user-images.githubusercontent.com/39893798/105624681-a00bd980-5e5e-11eb-951b-cf8ba71b3926.png Thanks a lot!
-- Mahul Chakraborty Department of Ecology and Evolutionary Biology University of California-Irvine Phone: 949 824 9559 Fax: 949 824 9559 Website: https://mahulchakraborty.wordpress.com/ Github: https://github.com/mahulchak
Thanks for your kind and timely reply, but I still fail to understand -ml. I have tested -l and -ml, with the results below:
In the first table, I tested -l from 10 to 5000, but the merged sequences did not improve compared to the two raw contig sets. In the second table, I tested -ml from 1 to 5000, and the merged sequence quality was affected by the chosen length. How should I understand these results?
Looking forward to your reply, thanks a lot.
Sorry to disturb you. I have done another simple test: I extracted 10k contigs from each of the two assemblies, then ran the command:
nohup merge_wrapper.py -l 10 -ml 10 -t 50 -v C1_10k.fa C2_10k.fa &>log&
From param_summary_out.txt, I counted 205 pairs of overlapping contigs, so the overlap rate is only 0.0205 (205/10000).
| | REF | QUERY | REF_START | REF_END | Q_START | Q_END | ORIENTATION | INNIE(1/0) | OVERLAP_LEN | OVERLAP_PROP | NO_OVERLAP_AT_ENDS | OVERHANG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Cluster2_k141_213238 | Cluster1_k141_1058634 | 2107 | 2631 | 5332 | 4809 | R | 0 | 523 | 0.263343 | 1986 | 3787 |
| ... | Cluster2_k141_15754305 | Cluster1_k141_1064426 | 57 | 3426 | 4010 | 642 | L | 0 | 3368 | 4.82521 | 698 | 154 |
| 205 | Cluster2_k141_11926444 | Cluster1_k141_109005 | 2257 | 3942 | 2199 | 3893 | L | 1 | 1694 | 0.6776 | 2500 | 0 |
My two raw 10k contig sets total 113M (113248645 bp), but merged_out.fasta totals 59M (59258534 bp), which does not make sense given the low overlap rate (2%).
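For reference, the arithmetic behind these figures can be checked with a few lines of Python. The numbers are the ones quoted above; the variable names are just for illustration.

```python
overlapping_pairs = 205        # pairs counted in param_summary_out.txt
contigs_per_set = 10_000       # contigs extracted from each assembly
raw_total_bp = 113_248_645     # combined size of the two raw contig sets
merged_total_bp = 59_258_534   # size of merged_out.fasta

overlap_rate = overlapping_pairs / contigs_per_set
retained_fraction = merged_total_bp / raw_total_bp
missing_bp = raw_total_bp - merged_total_bp

print(f"overlap rate:      {overlap_rate:.4f}")
print(f"retained fraction: {retained_fraction:.4f}")
print(f"missing bases:     {missing_bp}")
```

So only about 2% of contigs overlap, yet a little under half of the input bases are absent from the merged output, which is the discrepancy in question.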
So I checked one of the overlapping pairs. The overlap relationship is as below:

| REF | QUERY | REF_START | REF_END | Q_START | Q_END | ORIENTATION | INNIE(1/0) | OVERLAP_LEN | OVERLAP_PROP | NO_OVERLAP_AT_ENDS | OVERHANG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster2_k141_6817205 | Cluster1_k141_1166759 | 1020 | 4705 | 1 | 3684 | R | 0 | 3683 | 3683 | 1 | 257 |
I found a sequence named Cluster2_k141_6817205 in merged_out.fasta. It seems the merged sequence takes the name of the larger of the two overlapping contigs, and it was merged correctly. So strange!
Then I checked the IDs in merged_out.fasta:

| Source | Numbers |
|---|---|
| from Cluster1 | 9941 |
| from Cluster2 | 58 |
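A count like this can be reproduced with a short script. The sketch below assumes that the source set is the part of each sequence ID before the first underscore (as in `Cluster1_k141_1025`); the function name `count_sources` is made up for this example.

```python
from collections import Counter


def count_sources(fasta_lines):
    """Count FASTA records per source prefix (text before the first '_')."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # header line with no ID
        counts[fields[0].split("_", 1)[0]] += 1
    return counts


# Small in-memory example; for a real file, pass the open file handle,
# e.g. `with open("merged_out.fasta") as fh: count_sources(fh)`.
example = [">Cluster1_k141_1025", "ACGT", ">Cluster2_k141_1067037 extra", "AC"]
print(count_sources(example))
```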
For the 9941 Cluster1-source sequences in merged_out.fasta, all merged contig lengths seem to be the same as the raw lengths:

| Source | contig_length |
|---|---|
| Raw.tsv:Cluster1_k141_1025 | 3698 |
| Merge.tsv:Cluster1_k141_1025 | 3698 |
| Raw.tsv:Cluster1_k141_1026 | 3852 |
| Merge.tsv:Cluster1_k141_1026 | 3852 |
| Raw.tsv:Cluster1_k141_1040 | 8359 |
| Merge.tsv:Cluster1_k141_1040 | 8359 |
| Raw.tsv:Cluster1_k141_1057577 | 8707 |
| Merge.tsv:Cluster1_k141_1057577 | 8707 |
| Raw.tsv:Cluster1_k141_1057886 | 3968 |
| Merge.tsv:Cluster1_k141_1057886 | 3968 |
| Raw.tsv:Cluster1_k141_1057955 | 3078 |
| Merge.tsv:Cluster1_k141_1057955 | 3078 |
| Raw.tsv:Cluster1_k141_1058039 | 3038 |
| Merge.tsv:Cluster1_k141_1058039 | 3038 |
| Raw.tsv:Cluster1_k141_1058096 | 4079 |
| Merge.tsv:Cluster1_k141_1058096 | 4079 |
| Raw.tsv:Cluster1_k141_1058151 | 3719 |
| Merge.tsv:Cluster1_k141_1058151 | 3719 |
| Raw.tsv:Cluster1_k141_1058269 | 3248 |
| Merge.tsv:Cluster1_k141_1058269 | 3248 |
| Raw.tsv:Cluster1_k141_1058399 | 7611 |
| Merge.tsv:Cluster1_k141_1058399 | 7611 |
For the 58 Cluster2-source sequences in merged_out.fasta, the merged contig lengths are larger than the raw lengths:

| Source | contig_length |
|---|---|
| Raw.tsv:Cluster2_k141_10429771 | 4993 |
| Merge.tsv:Cluster2_k141_10429771 | 5069 |
| Raw.tsv:Cluster2_k141_10436849 | 10727 |
| Merge.tsv:Cluster2_k141_10436849 | 12696 |
| Raw.tsv:Cluster2_k141_10643446 | 5615 |
| Merge.tsv:Cluster2_k141_10643446 | 7713 |
| Raw.tsv:Cluster2_k141_1067037 | 6430 |
| Merge.tsv:Cluster2_k141_1067037 | 6430 |
| Raw.tsv:Cluster2_k141_1067215 | 11431 |
| Merge.tsv:Cluster2_k141_1067215 | 20071 |
| Raw.tsv:Cluster2_k141_1067595 | 11140 |
| Merge.tsv:Cluster2_k141_1067595 | 11140 |
| Raw.tsv:Cluster2_k141_10859382 | 4492 |
| Merge.tsv:Cluster2_k141_10859382 | 4492 |
| Raw.tsv:Cluster2_k141_11711522 | 6219 |
| Merge.tsv:Cluster2_k141_11711522 | 7268 |
| Raw.tsv:Cluster2_k141_11713665 | 3653 |
| Merge.tsv:Cluster2_k141_11713665 | 5739 |
| Raw.tsv:Cluster2_k141_12137628 | 6638 |
| Merge.tsv:Cluster2_k141_12137628 | 7152 |
| Raw.tsv:Cluster2_k141_1279455 | 28667 |
| Merge.tsv:Cluster2_k141_1279455 | 29290 |
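A raw-versus-merged length comparison like the one above can be scripted directly from the FASTA files. This is a minimal sketch with hypothetical helper names (`fasta_lengths`, `length_changes`), using plain Python and assuming that merged contigs keep the ID of one of the input contigs, as observed above.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def length_changes(raw, merged):
    """IDs present in both sets whose length differs, as {id: (raw_len, merged_len)}."""
    return {
        name: (raw[name], merged[name])
        for name in merged
        if name in raw and merged[name] != raw[name]
    }


# Tiny in-memory example; real use would pass open file handles instead.
raw = fasta_lengths([">c1", "ACGT", ">c2", "ACGTAC"])
merged = fasta_lengths([">c1", "ACGT", ">c2", "ACGTACGG", "TT"])
print(length_changes(raw, merged))
```

Contigs that were extended by merging show up in the result, while contigs copied through unchanged do not.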
So how can I explain the above results? In my understanding, is quickmerge's final merged genome made up of the extended overlapping contig pairs plus the non-overlapping contigs from each set?
Looking forward to your reply. Thanks a lot!