Bad quality threshold is "("

mourisl / Rcorrector

Error correction for Illumina RNA-seq reads

GNU General Public License v3.0


Bad quality threshold is "(" #14

Closed: mdgn15 closed this issue 5 years ago

mdgn15 commented 5 years ago

Hello,

I am trying to use Rcorrector, but I am having problems at the "Bad quality threshold" step.

Put the kmers into bloom filter
Count the kmers in the bloom filter
Dump the kmers
Error correction
Weak kmer threshold rate: 0.193073 (estimated from 0.950/1 of the chosen kmers)
Bad quality threshold is '('

Every step up to "Bad quality threshold" works fine. Some of my data sets run properly (reporting "Bad quality threshold is '5'" or "'!'"), but some jobs get stuck at this step for more than 24 hours. Do you know what is causing this and how I can solve it? I have RNA-seq data sets of 9-15 million reads from plant samples, and I am running Rcorrector on the raw data.

Thanks
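The threshold in this log line is printed as a single quality character. Assuming the reads use Phred+33 encoding (the usual Sanger / Illumina 1.8+ FASTQ convention; the thread does not say which encoding these files use), the characters mentioned here can be converted to Phred scores with a small sketch like the one below. `phred33` is a hypothetical helper, not part of Rcorrector.

```shell
#!/bin/bash
# phred33 CHAR: hypothetical helper that converts one FASTQ quality character
# to its Phred score, assuming Phred+33 encoding.
phred33() {
  local code
  # A leading single quote makes printf print the character's numeric code.
  code=$(printf '%d' "'$1")
  echo $(( code - 33 ))
}

# The three threshold characters reported in this thread.
for c in '!' '(' '5'; do
  echo "'$c' -> Q$(phred33 "$c")"
done
```

Under that assumption, '!' is Q0, '(' is Q7 and '5' is Q20.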

mourisl commented 5 years ago

Sorry for the late reply. What is the exact command you used for those data sets? And what number does the "Stored xxx kmers" line report? Thanks.

mdgn15 commented 5 years ago

Here is the exact batch script:

#!/bin/bash
#PBS -N rcorrector
#PBS -l select=1:ncpus=16:mem=32gb:scratch_local=40gb
#PBS -l walltime=48:00:00

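# Clean the scratch directory when the job ends (clean_scratch is a site-provided helper)
trap 'clean_scratch' TERM EXIT
cd "$SCRATCHDIR" || exit 1
cp /dir/data_R1.fq "$SCRATCHDIR"
cp /dir/data_R2.fq "$SCRATCHDIR"
export PATH="/softwares/rcorrector:$PATH"
export PATH="/softwares/rcorrector/jellyfish:$PATH"
# Paired-end correction with 16 threads; writes data_R1.cor.fq and data_R2.cor.fq
perl /softwares/rcorrector/run_rcorrector.pl -1 data_R1.fq -2 data_R2.fq -t 16
# Copy the results back; keep the scratch directory if the copy fails
cp data_R1.cor.fq data_R2.cor.fq /storage/rcorrector/ || export CLEAN_SCRATCH=false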

The logs report "Stored 181579397 kmers" and "Stored 180520606 kmers" for the two species. After 48 hours the last line is still "Bad quality threshold is '('". Read lengths range from 60 to 286 bp for both. Thank you.
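Read counts and length ranges like the ones quoted in this thread can be checked directly on the input files. The sketch below assumes plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); `fastq_length_stats` is a hypothetical helper name.

```shell
#!/bin/bash
# fastq_length_stats FILE: hypothetical helper that prints the read count and
# the minimum, maximum and mean read length of an uncompressed 4-line FASTQ file.
fastq_length_stats() {
  awk '
    # The sequence is the second line of every 4-line record.
    NR % 4 == 2 {
      n++
      len = length($0)
      sum += len
      if (n == 1 || len < min) min = len
      if (len > max) max = len
    }
    END {
      if (n == 0) { print "reads=0"; exit }
      printf "reads=%d min=%d max=%d mean=%.1f\n", n, min, max, sum / n
    }
  ' "$1"
}
```

For example, `fastq_length_stats data_R1.fq` prints one summary line for the first mate file.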

mourisl commented 5 years ago

Are the files data_R1.fq and data_R2.fq still empty?

mdgn15 commented 5 years ago

The .cor.fq files exist and their size kept increasing over time, but after 48 hours each is only about 2 GB (from 8 GB raw files), and FastQC reports that they are truncated.

For perspective, one data set half the size of the problematic ones finished in three hours, while the others have not finished even after two days.

I will keep it running for another 48 hours to see what happens, but there is clearly a problem.
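A partially written .cor.fq file, like the ones FastQC flags here, usually ends in an incomplete record. A quick structural check can confirm this and count how many records were written before the job stopped. This is a sketch assuming plain, uncompressed FASTQ with four lines per record; `check_fastq_complete` is a hypothetical helper, and it only checks the record layout, not the content.

```shell
#!/bin/bash
# check_fastq_complete FILE: hypothetical helper that returns 0 if every record
# has an "@" header, a "+" separator and a quality line as long as its sequence,
# and 1 if the file is truncated or malformed.
check_fastq_complete() {
  awk -v f="$1" '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
    NR % 4 == 0 && length($0) != seqlen { bad = 1 }
    END {
      # awk also counts an unterminated last line, so a cut-off record
      # leaves NR off a multiple of 4.
      if (NR % 4 != 0 || bad) {
        print f ": truncated or malformed after " int(NR / 4) " records"
        exit 1
      }
      print f ": " NR / 4 " complete records"
    }
  ' "$1"
}
```

For example, `check_fastq_complete data_R1.cor.fq` reports how far the corrected output got.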

mourisl commented 5 years ago

Rcorrector might be very slow on long, low-quality reads. If you don't want to wait that long, you can try a smaller value for "-maxcorK" or "-maxcor".
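Applied to the batch script above, the suggestion amounts to adding the option to the existing perl call. The sketch below wraps that call in a hypothetical function `run_rcorrector_capped` and passes a smaller -maxcorK. It assumes run_rcorrector.pl accepts -maxcorK as the maintainer's reply implies; the value 2 is only an illustration, not a recommendation from the thread, so check the script's usage message for the option's meaning and default. -maxcor is left out because its exact form in run_rcorrector.pl is not shown here.

```shell
#!/bin/bash
# run_rcorrector_capped [MAXCORK]: hypothetical wrapper around the perl call
# from the batch script, adding a smaller -maxcorK value.
# The default of 2 is illustrative only; tune it for your data.
run_rcorrector_capped() {
  local maxcork="${1:-2}"
  perl /softwares/rcorrector/run_rcorrector.pl \
    -1 data_R1.fq -2 data_R2.fq -t 16 \
    -maxcorK "$maxcork"
}
```

In the job script, `run_rcorrector_capped 2` would replace the original perl line; the surrounding copy and cleanup steps stay the same.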

mdgn15 commented 5 years ago

I understand; that answers my question. Thank you for your time.