Open cement-head opened 3 years ago
Here is the preprint: https://www.biorxiv.org/content/10.1101/2020.08.07.242461v1. Note, however, that the version in the benchmark is 1.1.10, and versions 1.3.0 and later use far less memory. We should update the preprint soon. Answers:
Okay, we just did a 6.0 Gbp beastie, but RAVEN gave us just over 7.0 Gbp.
It took five days with 2 TB of ECC RAM, 124 threads, and two CUDA GPUs (RTX Titans, used for polishing; -c=100).
Given that the assembly is a little large, I'm wondering whether I should change any of these three parameters. Do you have any recommendations?
-m, --match <int>
    default: 3
    score for matching bases
-n, --mismatch <int>
    default: -5
    score for mismatching bases
-g, --gap <int>
    default: -4
    gap penalty (must be negative)
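To make the role of the three scores concrete, here is a minimal sketch of how match, mismatch, and gap values combine into an alignment score. It is a simplified pairwise global alignment with a linear gap penalty, written only for illustration; it is not Raven's or Racon's actual alignment code, and the function name is hypothetical. The defaults are the ones listed above.

```python
def global_alignment_score(a, b, match=3, mismatch=-5, gap=-4):
    """Best global alignment score of a and b with a linear gap penalty.

    Illustrative only: a plain Needleman-Wunsch recurrence, not the
    alignment routine used inside the assembler.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# With the defaults, one substitution is cheaper than two gaps.
print(global_alignment_score("ACGT", "ACCT"))
# With a harsher mismatch penalty, the gapped alignment wins instead.
print(global_alignment_score("ACGT", "ACCT", mismatch=-9))
```

The second call shows the kind of effect a stricter mismatch score has: the aligner starts preferring insertions and deletions over substitutions. It changes how individual bases are called, not how contigs are joined.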
Also, would increasing the rounds of polishing (RACON) drastically improve the assembly?
Okay, I got 0.1% Complete with a BUSCO analysis. Something is wrong. Would you suggest increasing the penalty for the mismatch score?
Can you print the assembly statistics (length/#contigs/NX/NGX)? Which sequencing technology are you using? What is the sequencing depth? The BUSCO score is abysmal, not sure if changing alignment parameters will help. Running more than 2 iterations of Racon will not increase the accuracy by much either.
Sorry for my late reply! Best regards, Robert
P.S. You can also paste here the log Raven created.
The technology is PacBio SII CLR, with a raw-read N50 of >36 kbp.
The coverage is about 70x.
Q: Would adjusting the -m, -n, -g parameters improve the assembly?
Which file is the RAVEN log file?
Here's the QUAST analysis; the # of contigs is good-ish, but the N50 isn't the greatest:
Assembly                      raven_asm
# contigs (>= 0 bp)           25505
# contigs (>= 1000 bp)        25505
# contigs (>= 5000 bp)        25505
# contigs (>= 10000 bp)       25504
# contigs (>= 25000 bp)       25504
# contigs (>= 50000 bp)       25473
Total length (>= 0 bp)        7048262437
Total length (>= 1000 bp)     7048262437
Total length (>= 5000 bp)     7048262437
Total length (>= 10000 bp)    7048257309
Total length (>= 25000 bp)    7048257309
Total length (>= 50000 bp)    7046876721
# contigs                     25505
Largest contig                3296975
Total length                  7048262437
GC (%)                        43.05
N50                           337254
N75                           208232
L50                           6350
L75                           13031
# N's per 100 kbp             0.00
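The report above gives NX values but not the NGX values asked for earlier, because QUAST only computes those when it knows a reference or estimated genome size. As a rough sketch, NX and NGX can be computed from the contig lengths directly. The function names are hypothetical, and the 6.0 Gbp figure is the genome size mentioned earlier in the thread, treated here as an estimate.

```python
def fasta_lengths(path):
    """Return the sequence length of every non-empty record in a FASTA file."""
    lengths, current = [], 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths


def nx(lengths, x=50, genome_size=None):
    """Return (NX, LX) for the given contig lengths.

    If genome_size is given, the threshold is x% of that size instead of
    x% of the assembly length, which yields (NGX, LGX). Returns None when
    the contigs never reach the threshold.
    """
    base = genome_size if genome_size is not None else sum(lengths)
    target = base * x / 100
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None


# Toy example; for a real assembly, pass fasta_lengths("assembly.fasta")
# (hypothetical file name) and genome_size=6_000_000_000.
toy = [10, 8, 6, 4, 2]
print(nx(toy))                  # N50 and L50
print(nx(toy, genome_size=40))  # NG50 and LG50 against an assumed genome size
```

For this assembly the total length (7,048,262,437 bp) is about 17% above the 6.0 Gbp estimate, so NG50 would be at least as large as the reported N50.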
The log is written to stderr. I am not sure that changing alignment parameters will help at all. The assembly is quite fragmented, which might be the reason for the bad BUSCO performance.
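Since the log goes to stderr rather than to a file, it has to be redirected when the assembler is launched. A minimal sketch of one way to do that from Python follows. The helper name and the file names are hypothetical, and it assumes the assembly itself is written to stdout.

```python
import subprocess


def run_with_log(cmd, out_path, log_path):
    """Run cmd, saving stdout to out_path and stderr to log_path.

    Returns the exit code of the process.
    """
    with open(out_path, "wb") as out, open(log_path, "wb") as log:
        return subprocess.run(cmd, stdout=out, stderr=log, check=False).returncode


# Hypothetical invocation (file names are placeholders):
# run_with_log(["raven", "-t", "124", "reads.fastq"],
#              "raven_asm.fasta", "raven.log")
```

The resulting log file can then be pasted into the thread alongside the assembly statistics.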