lemene / PECAT

PECAT, a phased error correct and assembly tool
BSD 2-Clause "Simplified" License
36 stars 0 forks source link

Parameter settings for ONT reads that are shorter #35

Open melop opened 1 month ago

melop commented 1 month ago

Thank you again for developing this great tool. We have a fish ONT dataset (10.4.1) which we assembled with nextdenovo to produce a genome size of ~560Mb and an N50=~7Mb. I thought I should also try PECAT to get a phased assembly but initially had very poor N50=~100kb. We have about 23Gbp ONT data with an read N50 of only 3.6kb, this may be why the corrected reads had only 7Gb left due to the filtering options. sum = 23002802924, n = 12113629, ave = 1898.92, largest = 839594 N50 = 3623, n = 1229131 N60 = 2535, n = 1994606 N70 = 1811, n = 3073538 N80 = 1292, n = 4583019 N90 = 831, n = 6770892 N100 = 1, n = 12113629

I then relaxed the filtering criteria, but still only got 13Gb of corrected reads. And the program crashed due to float point exception (probably due to genome size estimate). Could you suggest parameter settings that would best work for our case? Thank you very much!

Our current config:

`project= ab reads= allreads.fasta genome_size= 560000000 threads=24 cleanup=1 grid=local prep_min_length=1000 prep_output_coverage=100 corr_iterate_number=1 corr_block_size=200000000 corr_filter_options=--filter0=l=1000:al=900:alr=0.5:aal=3000:oh=3000:ohr=0.3 corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01 --candidate n=600:f=30 corr_rd2rd_options=-x ava-ont -f 0.005 -I 10G corr_output_coverage=80

align_block_size=1000000000 align_rd2rd_options=-X -g3000 -w30 -k19 -m100 -r500 -I 10G -f 0.002 align_filter_options=--filter0=l=1000:aal=900:aal=3000:aalr=0.5:oh=3000:ohr=0.3 --task=extend --filter1=oh=300:ohr=0.03 asm1_assemble_options=--max_trivial_length 10000

phase_method=2 phase_rd2ctg_options=-x map-ont -c -p 0.5 -r 1000 phase_use_reads=1 phase_phase_options= --coverage lc=30 --phase_options icr=0.1:icc=8:sc=10 phase_filter_options = --threshold 1000

phase_clair3_use_reads=0 phase_clair3_command = singularity exec -B /fast3:/fast3 /public/software/clair3/clair3_v0.1-r12.sif /opt/bin/run_clair3.sh phase_clair3_options=--platform=ont --model_path=/opt/models/ont_guppy5/ --include_all_ctgs phase_clair3_rd2ctg_options=-x map-ont -c -p 0.5 -r 1000 phase_clair3_phase_options= --coverage lc=30 --phase_options icr=0.1:icc=3:sc=10 --filter i=70 phase_clair3_filter_options = --threshold 2500 --rate 0.05

asm2_assemble_options=--reducer0 "best:cmp=2,0.1,0.1|phase:sc=3" --contig_format dual,prialt --min_identity 0.98

polish_map_options = -x map-ont -w10 -k19 -I 10g polish_use_reads=0 polish_filter_options=--filter0 oh=2000:ohr=0.2:aalr=0.5 polish_cns_options =

polish_medaka = 1 polish_medaka_command = singularity exec -B /fast3:/fast3 /public/software/medaka/medaka_v1.7.2.sif medaka polish_medaka_map_options = -x map-ont -w10 -k19 polish_medaka_cns_options = --model r1041_e82_400bps_sup_g615 `

lemene commented 1 month ago

Hi @melop The N50 of the dataset is short. Some other parameters need to be added to the filter0 or filter option.

corr_correct_options=--filter0=l=1000:al=900:alr=0.5:aal=3000:oh=3000:ohr=0.3
asm1_assemble_options=--filter0=l=1000:al=900
asm2_assemble_options=--filter0=l=1000:al=900
phase_phase_options=--filter=l=1000:al=900
phase_clair3_phase_options=--filter=l=1000:al=900
melop commented 1 month ago

Hello, I tried adding the above parameters and reran pecat. Unfortunately the corrected reads was still only 13Gb. The N50 improved to 700kb in the primary assembly, but it is still much lower than 7.4Mb from nextdenovo. Here is our config file after adding those parameters. I'm wondering whether the "oh" and "ohr" also need to be changed to different values to allow shorter reads to pass through? Thank you! `project= ab reads= allreads.fasta genome_size= 560000000 threads=24 cleanup=1 grid=local prep_min_length=1000 prep_output_coverage=100 corr_iterate_number=1 corr_block_size=200000000 corr_filter_options=--filter0=l=1000:al=900:alr=0.5:aal=3000:oh=3000:ohr=0.3 corr_correct_options=--filter0=l=1000:al=900:alr=0.5:aal=3000:oh=3000:ohr=0.3 --score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01 --candidate n=600:f=30 corr_rd2rd_options=-x ava-ont -f 0.005 -I 10G corr_output_coverage=80

align_block_size=1000000000 align_rd2rd_options=-X -g3000 -w30 -k19 -m100 -r500 -I 10G -f 0.002 align_filter_options=--filter0=l=1000:aal=900:aal=3000:aalr=0.5:oh=3000:ohr=0.3 --task=extend --filter1=oh=300:ohr=0.03 asm1_assemble_options=--filter0=l=1000:al=900 --max_trivial_length 10000

phase_method=2 phase_rd2ctg_options=-x map-ont -c -p 0.5 -r 1000 phase_use_reads=1 phase_phase_options=--filter=l=1000:al=900 --coverage lc=30 --phase_options icr=0.1:icc=8:sc=10 phase_filter_options = --threshold 1000

phase_clair3_use_reads=0 phase_clair3_command = singularity exec -B /fast3:/fast3 /public/software/clair3/clair3_v0.1-r12.sif /opt/bin/run_clair3.sh phase_clair3_options=--platform=ont --model_path=/opt/models/ont_guppy5/ --include_all_ctgs phase_clair3_rd2ctg_options=-x map-ont -c -p 0.5 -r 1000 phase_clair3_phase_options=--filter=l=1000:al=900 --coverage lc=30 --phase_options icr=0.1:icc=3:sc=10 --filter i=70 phase_clair3_filter_options = --threshold 2500 --rate 0.05

asm2_assemble_options=--filter0=l=1000:al=900 --reducer0 "best:cmp=2,0.1,0.1|phase:sc=3" --contig_format dual,prialt --min_identity 0.98

polish_map_options = -x map-ont -w10 -k19 -I 10g polish_use_reads=0 polish_filter_options=--filter0 oh=2000:ohr=0.2:aalr=0.5 polish_cns_options =

polish_medaka = 1 polish_medaka_command = singularity exec -B /fast3:/fast3 /public/software/medaka/medaka_v1.7.2.sif medaka polish_medaka_map_options = -x map-ont -w10 -k19 polish_medaka_cns_options = --model r1041_e82_400bps_sup_g615

`

lemene commented 1 month ago

oh and ohr are thresholds used to check for overhangs in overlaps. Higher values for these thresholds allow more overlaps to pass the check. There are some thresholds that may be hard-coded in PECAT. So, the modified parameters did not output the expected results. You can try adding the following parameters.

corr_correct_options=--min_coverage 0
asm1_assemble_options=--min_coverage 1

You can run pecat.pl assemble cfg to check if the parameters are working, without the need to run the unzip step."