Open jp-jong opened 4 months ago
Hi @jp-jong
The configuration looks fine. demo/configs contains config templates for genomes of different sizes. What are the total size and N50 of the raw reads and of the corrected reads? PECAT may filter out or truncate very low-quality reads, which can fragment the assembly.
Are there any statistics on the PECAT assembly? They would help me adjust the parameters.
In the NECAT parameters, I think using `CNS_READ_LIST` instead of `ONT_READ_LIST` would be better, as it would skip the error correction step.
Hi @lemene. I haven't used PECAT's assembly options yet, but I will try them once I understand PECAT's correction step.
Here are the statistics (from seqkit stats) for my raw and corrected reads:
- Raw reads: total size 132,927,270 bp; N50 12,542
- Reads corrected with PECAT: total size 121,927,673 bp; N50 12,919
- Reads corrected with Canu: total size 123,492,474 bp; N50 13,275
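For reference, the totals above can be turned into approximate coverage and the fraction of raw bases kept by each corrector. This is only a rough sketch: the read totals and the genome size are the ones quoted in this thread, while the function names are made up for illustration and are not part of PECAT, Canu, or seqkit.

```python
# Totals copied from the seqkit stats figures reported in this thread.
GENOME_SIZE = 5_500_000  # genome_size from the PECAT config

read_sets = {
    "raw": 132_927_270,
    "pecat_corrected": 121_927_673,
    "canu_corrected": 123_492_474,
}


def coverage(total_bases: int, genome_size: int) -> float:
    """Approximate sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


def retained_fraction(corrected_bases: int, raw_bases: int) -> float:
    """Fraction of the raw bases still present after correction."""
    return corrected_bases / raw_bases


for name, bases in read_sets.items():
    depth = coverage(bases, GENOME_SIZE)
    kept = retained_fraction(bases, read_sets["raw"])
    print(f"{name}: {depth:.1f}x coverage, {kept:.1%} of raw bases")
```

With these figures the raw data comes to roughly 24x, well below the 80x requested in `prep_output_coverage` and `corr_output_coverage`. If those options act as upper limits on depth (an assumption on my part), they would not be what is discarding reads here.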
I also changed ONT_READ_LIST to CNS_READ_LIST when assembling with NECAT. Here are my statistics:
- Assembled PECAT-corrected reads: 5 contigs; total size 5,731,402 bp; N50 5,511,194; average length 1,146,280.40
- Assembled Canu-corrected reads: 4 contigs; total size 5,681,861 bp; N50 5,471,264; average length 1,420,465.30
Disabling the correction step in NECAT did make a difference in the assembly statistics, especially for the PECAT-corrected reads. Still, PECAT seems to correct more reads, and it left a slightly lower total read size after correction. Its assembly has a longer N50 but a somewhat lower average contig length than the assembly of the Canu-corrected reads.
@lemene What do you think? I'm quite satisfied with this comparison since there's not much of a difference, but I'd appreciate any input from you on how to improve our correction configuration further.
Hi!
I'm wondering whether there is a manual, documentation, or a rule of thumb that can help us set the configuration when using PECAT. Recently, we used PECAT to correct ONT reads from a bacterium with an estimated genome size of 5.5 Mb. I don't know whether my configuration is correct, so I have attached my correction settings here.
```
project= smarcescens
reads= smarcescens_simplex.filtered.fastq
genome_size= 5500000
threads=4
cleanup=1
grid=local

prep_min_length=3000
prep_output_coverage=80

corr_iterate_number=1
corr_block_size=4000000000
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
corr_rd2rd_options=-x ava-ont
corr_output_coverage=80
```
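As a quick sanity check on settings like these, a config in this `key=value` form can be read into a dictionary and the coverage target converted into bases. This is a minimal sketch: `parse_config` is a hypothetical helper, not part of PECAT, and it assumes that `prep_output_coverage` is a depth expressed as a multiple of `genome_size`.

```python
def parse_config(text):
    """Parse key=value lines into a dict.

    Splits on the first '=' only, because values such as
    corr_filter_options contain '=' themselves.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# A few lines taken from the config posted above.
example = """
genome_size= 5500000
prep_min_length=3000
prep_output_coverage=80
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
"""

cfg = parse_config(example)

# Assumption: the coverage target is a multiple of the genome size.
target_bases = int(cfg["genome_size"]) * int(cfg["prep_output_coverage"])
print(f"prep_output_coverage corresponds to {target_bases:,} bp")
```

Comparing that target with the total size of the input reads shows whether the coverage setting could limit the data at all.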
After correction, the read count dropped from 16k to 12k (with the N50 rising from 12.5 kb to 12.9 kb).
When I assembled the PECAT-corrected reads (without polishing) using a different assembler, NECAT (just to avoid assembler bias), I ended up with the following statistics:
- Contigs: 25
- Assembly size: 5.6 Mb
- Minimum length: 18 kb
- Maximum length: 1.3 Mb
- N50: 550 kb
These statistics seem quite far from those of the Canu-corrected reads, which are as follows:
- Contigs: 4
- Assembly size: 5.7 Mb
- Minimum length: 17 kb
- Maximum length: 5.5 Mb
- N50: 5.5 Mb
So I noticed that the assembly of the PECAT-corrected reads is highly fragmented compared to that of the Canu-corrected reads. I am aware that the statistics above don't entirely reflect the quality of the assembly; still, I feel that the PECAT-corrected reads weren't as "contiguous" as the Canu-corrected reads. That's why I'm wondering whether I'm setting the configuration file incorrectly.
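For anyone comparing the numbers above, this is how I understand N50: sort the lengths from longest to shortest and take the length at which the running total first reaches half of all bases. A minimal sketch follows; the contig lengths are toy values made up for illustration, not taken from either assembly.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Toy contig lengths (illustrative only): same total, different contiguity.
fragmented = [400, 300, 200, 100]
contiguous = [900, 100]

print("fragmented N50:", n50(fragmented))
print("contiguous N50:", n50(contiguous))
```

Both toy sets have the same total size, yet the one dominated by a single long contig has a much higher N50, which is the pattern seen between the two assemblies here.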
Here's my Canu command:
user/tools/canu-2.2/bin/canu -correct \
  -p smarcescens_canu_corrected \
  -d canu_correction_output \
  genomeSize=5.5m \
  correctedErrorRate=0.15 \
  useGrid=false \
  minReadLength=1000 \
  corThreads=4 \
  -nanopore-raw smarcescens_simplex.filtered.fastq 2>&1
And here's my NECAT configuration for assembling the reads from both Canu and PECAT:
PROJECT=necat_assembly
ONT_READ_LIST=
GENOME_SIZE=5500000
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1
I'd really appreciate it if you can give us ideas on how to set the parameters in PECAT.
Thanks!