voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

--min-contig-len confused #284

Open feihongloveworld opened 4 years ago

feihongloveworld commented 4 years ago

Hi, I want to keep contigs of all lengths, so I ran with --min-contig-len 0, but the max_len is much smaller than with the default parameters. The default result is below:

file               format  type  num_seqs     sum_len  min_len  avg_len  max_len
xxxxxx.contigs.fa  FASTA   DNA      8,148  16,813,516      200  2,063.5  114,231

The result with --min-contig-len 0 is below:

file                format  type  num_seqs     sum_len  min_len  avg_len  max_len
xxxxxxx.contigs.fa  FASTA   DNA     94,420  23,407,604       60    247.9   58,292

wangpeng407 commented 4 years ago

I have the same problem.

megahit --read non_nt.fastq.gz -o out1 --presets meta-large -t 8

Result:
                   Scaffold   Contig
Total Num              1955     1955
Total Length(bp)    6276121  6276121
N50 Length(bp)        58730    58730
N90 Length(bp)         4371     4371
Max Length(bp)       185356   185356
Min Length(bp)          200      200
Sequence GC(%)        55.68    55.68

megahit --read non_nt.fastq.gz -o out2 --presets meta-large -t 8 --min-contig-len 0

Result:
                   Scaffold    Contig
Total Num             46203     46203
Total Length(bp)   10338600  10338600
N50 Length(bp)          460       460
N90 Length(bp)           88        88
Max Length(bp)         8202      8202
Min Length(bp)           78        78
Sequence GC(%)        51.59     51.59

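So adding --min-contig-len 0 changes the result a lot: the maximum contig length drops from 185356 bp to 8202 bp, even though the option should only add shorter contigs. This has confused me for a long time.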
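For anyone who wants to reproduce this comparison, here is a minimal sketch of how the length statistics above could be recomputed from the two FASTA outputs, with an optional cutoff so that both runs are summarized over contigs of at least 200 bp. The function names (`read_fasta_lengths`, `nx`, `summarize`) are hypothetical helpers, not part of megahit, and the N50/N90 definition used here is the common one, which may differ slightly from the tool that produced the tables. The file paths are passed on the command line and are whatever your runs wrote; nothing here is specific to megahit.

```python
import sys
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: List[int], fraction: float) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches `fraction` of the total
    (fraction=0.5 gives N50, fraction=0.9 gives N90)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0


def summarize(lengths: List[int], min_len: int = 0) -> Dict[str, int]:
    """Summarize contig lengths, ignoring contigs shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        return {"num": 0, "total": 0, "n50": 0, "n90": 0, "max": 0, "min": 0}
    return {
        "num": len(kept),
        "total": sum(kept),
        "n50": nx(kept, 0.5),
        "n90": nx(kept, 0.9),
        "max": max(kept),
        "min": min(kept),
    }


if __name__ == "__main__":
    # Each argument is the path to a contig FASTA file from one run.
    for path in sys.argv[1:]:
        with open(path) as handle:
            lengths = read_fasta_lengths(handle)
        print(path, "all contigs:", summarize(lengths))
        print(path, "contigs >= 200 bp:", summarize(lengths, min_len=200))
```

Running this on the contig files from both runs shows whether the two assemblies still differ once contigs under 200 bp are ignored. A lower maximum length after that filter would mean the option changed the assembly itself, not just which contigs were written out.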

Could you please help us solve this issue?

feihongloveworld commented 4 years ago

@voutcn I need your help.