Open gooalzqshu opened 4 years ago
Hello, it is not uncommon for the genome size to increase with multiple Racon rounds. Maybe the increase here is a bit bigger than usual. Can you paste the number of contigs in your assembly and the average length?
Best regards, Robert
Hello, thank you for replying so quickly. In fact, I ran Racon polishing only once. To reduce memory usage, I first split the genome into about 50 parts and extracted the raw reads and the BAM file for each part. The example above is just one part of the genome. In total, my whole genome grew from 2,581 Mb to 3,528 Mb after correction with Racon, which seems strange to me. Below are the whole-genome statistics before and after polishing. Before polish:
======================================================================
scaffold contig
length(bp) number length(bp) number
max_len 3,031,338 3,031,338
N10 1,027,631 180 1,027,631 180
N20 666,029 496 666,029 496
N30 454,921 966 454,921 966
N40 326,057 1,638 326,057 1,638
N50 230,008 2,588 230,008 2,588
N60 162,325 3,934 162,325 3,934
N70 115,084 5,820 115,084 5,820
N80 75,078 8,603 75,078 8,603
N90 42,445 13,151 42,445 13,151
Total_length 2,581,092,116 2,581,092,116
number>=100bp 24,130 24,130
number>=2,000bp 23,993 23,993
======================================================================
GC_rate 0.396 0.396
======================================================================
Total N bases: 0 ## Min N: 0 ## Max N: 0
======================================================================
After polish:
======================================================================
scaffold contig
length(bp) number length(bp) number
max_len 4,162,086 4,162,086
N10 1,450,108 176 1,450,108 176
N20 933,220 483 933,220 483
N30 639,927 940 639,927 940
N40 459,758 1,592 459,758 1,592
N50 325,441 2,512 325,441 2,512
N60 229,513 3,811 229,513 3,811
N70 163,186 5,630 163,186 5,630
N80 107,296 8,299 107,296 8,299
N90 60,684 12,643 60,684 12,643
Total_length 3,528,214,396 3,528,214,396
number>=100bp 23,314 23,314
number>=2,000bp 23,259 23,259
======================================================================
GC_rate 0.353 0.353
======================================================================
Total N bases: 0 ## Min N: 0 ## Max N: 0
======================================================================
Below are the statistics for one of the genome parts after segmentation. Before polish:
======================================================================
scaffold contig
length(bp) number length(bp) number
max_len 2,066,707 2,066,707
N10 1,862,708 3 1,862,708 3
N20 1,263,559 7 1,263,559 7
N30 704,235 13 704,235 13
N40 354,171 23 354,171 23
N50 229,517 42 229,517 42
N60 165,650 68 165,650 68
N70 99,456 108 99,456 108
N80 60,617 173 60,617 173
N90 27,013 308 27,013 308
Total_length 51,392,084 51,392,084
number>=100bp 589 589
number>=2,000bp 588 588
======================================================================
GC_rate 0.396 0.396
======================================================================
Total N bases: 0 ## Min N: 0 ## Max N: 0
======================================================================
After polish:
======================================================================
scaffold contig
length(bp) number length(bp) number
max_len 2,870,983 2,870,983
N10 2,579,573 3 2,579,573 3
N20 1,753,175 7 1,753,175 7
N30 1,007,202 12 1,007,202 12
N40 537,959 22 537,959 22
N50 331,771 39 331,771 39
N60 242,000 64 242,000 64
N70 146,608 101 146,608 101
N80 92,793 161 92,793 161
N90 38,692 281 38,692 281
Total_length 69,930,989 69,930,989
number>=100bp 568 568
number>=2,000bp 568 568
======================================================================
GC_rate 0.354 0.354
======================================================================
Total N bases: 0 ## Min N: 0 ## Max N: 0
======================================================================
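For reference, the growth in both tables can be put side by side with a little arithmetic. The sketch below only reuses the totals and GC rates pasted above; the variable names are made up for illustration, and the "GC of the added bases" line is a rough estimate that assumes the original bases kept their composition and that the GC rates are accurate to the three decimals shown.

```shell
# Totals (bp) and GC rates copied from the statistics above.
whole_before=2581092116; whole_after=3528214396
part_before=51392084;    part_after=69930989
gc_before=0.396;         gc_after=0.353

# Growth factor = total length after polishing / total length before.
whole_ratio=$(awk -v a="$whole_after" -v b="$whole_before" \
    'BEGIN { printf "%.3f", a / b }')
part_ratio=$(awk -v a="$part_after" -v b="$part_before" \
    'BEGIN { printf "%.3f", a / b }')

# Rough GC fraction of the net added bases:
# (GC bases after - GC bases before) / (bases after - bases before).
added_gc=$(awk -v a="$whole_after" -v b="$whole_before" \
    -v ga="$gc_after" -v gb="$gc_before" \
    'BEGIN { printf "%.3f", (ga * a - gb * b) / (a - b) }')

echo "whole genome growth: ${whole_ratio}x"
echo "single part growth:  ${part_ratio}x"
echo "approx. GC of added bases: ${added_gc}"
```

The whole genome and the single part grow by nearly the same factor, so the increase does not look specific to one segment.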
Which tool did you use to create the SAM file? What is the coverage of your read set?
I used minimap2 for mapping. I am computing the read coverage of the genome with bedtools genomecov -d -split, which may take some time. Here is the mapping command:
minimap2 -t 15 -ax map-pb --secondary=no sample.contigs.fasta used_3row.part-056.fa | samtools view -@ 15 -bS -t sample.contigs.fasta.fai - -o minimap_56.bam
Thank you very much !
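While the per-base bedtools run is in progress, a quick estimate of mean read depth is total read bases divided by total assembly bases. This is only a sketch: the helper names `fasta_bases` and `mean_depth` are hypothetical, the inputs are assumed to be uncompressed FASTA files with Unix line endings, and the result ignores unmapped reads and uneven coverage.

```shell
# Hypothetical helpers (not from the thread).

# Sum the lengths of all sequence lines in an uncompressed FASTA file.
fasta_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Approximate mean depth = read bases / assembly bases.
# Usage: mean_depth reads.fa assembly.fa
mean_depth() {
    reads_bases=$(fasta_bases "$1")
    asm_bases=$(fasta_bases "$2")
    awk -v r="$reads_bases" -v a="$asm_bases" \
        'BEGIN { if (a > 0) printf "%.1f\n", r / a; else print "NA" }'
}

# Example call with the file names from the mapping command above:
# mean_depth used_3row.part-056.fa sample.contigs.fasta
```

This gives depth (how many times each base is covered on average), which is a different number from the fraction of the genome covered by at least one read.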
Well, I am not sure what to tell you. You could try passing overlaps in PAF format instead of SAM by dropping the -a parameter in minimap2, and see if the same happens. On the other hand, what is the expected genome size?
OK, thank you. I will try outputting a PAF file instead of a SAM file. In any case, I would expect the genome size not to change significantly after polishing. In addition, I calculated that the raw reads cover 98.9% of the genome. Thank you for your advice.
Best regards, Zqshu
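The PAF route suggested above could look roughly like the following. It is a sketch under assumptions: `polish_with_paf` is a hypothetical wrapper, the output naming is invented, and the minimap2 options are simply the ones from the mapping command in this thread with -a removed (without -a, minimap2 writes PAF to standard output). Racon takes its arguments in the order reads, overlaps, target, as in the original command.

```shell
# Hypothetical wrapper (not from the thread): map reads to PAF, then polish.
# Usage: polish_with_paf <reads.fa> <target.fa> <out_prefix> [threads]
polish_with_paf() {
    reads=$1
    target=$2
    out_prefix=$3
    threads=${4:-10}

    # Same preset and options as the SAM run, minus -a, so the output is PAF.
    minimap2 -t "$threads" -x map-pb --secondary=no "$target" "$reads" \
        > "$out_prefix.paf" &&
    # Racon argument order: reads, overlaps, target.
    racon -t "$threads" "$reads" "$out_prefix.paf" "$target" \
        > "$out_prefix.racon.fa"
}

# Example call with the file names from the original post:
# polish_with_paf 45_select.fa assembly.part-45.fa sample_45
```

Comparing the total length of the resulting FASTA with the SAM-based run shows whether the growth depends on the overlap format.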
Hello, I have a question. When I used Racon to polish, the genome grew from 50 Mb to 67 Mb. Is this normal? This is the command I ran:
racon -t 10 45_select.fa 45_select.sam assembly.part-45.fa > sample.racon_45.fa
The output looks like this (there appears to be no problem):