Closed mdgn15 closed 5 years ago
Sorry for the late reply. What is the exact command you used for those data sets? And what is the number for "Stored xxx kmers"? Thanks.
The exact batch script is:
#!/bin/bash
#PBS -N rcorrector
#PBS -l select=1:ncpus=16:mem=32gb:scratch_local=40gb
#PBS -l walltime=48:00:00
trap 'clean_scratch' TERM EXIT
cd $SCRATCHDIR
cp /dir/data_R1.fq $SCRATCHDIR
cp /dir/data_R2.fq $SCRATCHDIR
export PATH='/softwares/rcorrector':$PATH
export PATH='/softwares/rcorrector/jellyfish':$PATH
perl /softwares/rcorrector/run_rcorrector.pl -1 data_R1.fq -2 data_R2.fq -t 16
cp data_R1.cor.fq data_R2.cor.fq /storage/rcorrector/ || export CLEAN_SCRATCH=false
The numbers are "Stored 181579397 kmers" and "Stored 180520606 kmers" for two different species. After 48 hours the last line of output is still "Bad quality threshold is '('". Read lengths are between 60 and 286 for both. Thank you.
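For anyone who wants to report the same read-length range for their own data, here is a minimal sketch. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the function name `fastq_length_range` is made up for illustration.

```shell
#!/bin/bash
# Print "min-max" read length for a FASTQ file.
# Assumption: each record is exactly 4 lines, so the sequence is line 2 of 4.
fastq_length_range() {
    awk 'NR % 4 == 2 {
             len = length($0)
             if (n == 0 || len < min) min = len
             if (n == 0 || len > max) max = len
             n++
         }
         END {
             if (n == 0) print "no reads"
             else print min "-" max
         }' "$1"
}
```

Calling `fastq_length_range data_R1.fq` prints a single line such as the "60-286" quoted above.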
Are the output files data_R1.cor.fq and data_R2.cor.fq still empty?
The .cor.fq files are there and their size was increasing over time. But after 48 hours they are only around 2 GB each (from 8 GB raw files), and FastQC reports that they are truncated.
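A quick way to confirm the truncation without FastQC is to check that the file consists of whole records. The sketch below is only an assumption about what "truncated" means here (a job killed mid-write); the function name `fastq_check` is hypothetical, and it again assumes four-line records.

```shell
#!/bin/bash
# Report whether a FASTQ file consists of complete 4-line records.
# Exit status is 0 for a well-formed file, 1 otherwise.
fastq_check() {
    awk 'NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # header must start with @
         NR % 4 == 2 { seqlen = length($0) }                  # remember sequence length
         NR % 4 == 0 && length($0) != seqlen { bad = 1 }      # quality must match it
         END {
             if (NR % 4 != 0 || bad) {
                 print "truncated or malformed"
                 exit 1
             }
             print NR / 4 " complete records"
         }' "$1"
}
```

Running `fastq_check data_R1.cor.fq` and `fastq_check data_R2.cor.fq` and comparing the record counts with the raw inputs shows how far the correction got before the walltime ran out.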
To give some perspective, one data set half the size of the ones I am having problems with finished in three hours. The others are not finishing even after two days.
I will keep it running for another 48h to see what will happen. But for sure there is a problem.
Rcorrector might be too slow on long, low-quality reads. If you don't want to wait that long, I think you can try a smaller value for "-maxcorK" or "-maxcor".
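As a sketch of that suggestion applied to the batch script above: the helper below only prints the command line rather than running it, so it can be checked first. The values 4 and 2 are purely illustrative, not recommendations; check the usage message of run_rcorrector.pl for the defaults in your version and pick something smaller. The function name `rcorrector_cmd` is hypothetical.

```shell
#!/bin/bash
# Build (but do not execute) an Rcorrector command with reduced correction limits.
# Arguments: R1 file, R2 file, optional -maxcor value, optional -maxcorK value.
# The fallback values 4 and 2 are illustrative assumptions only.
rcorrector_cmd() {
    local r1=$1 r2=$2 maxcor=${3:-4} maxcork=${4:-2}
    echo "perl /softwares/rcorrector/run_rcorrector.pl -1 $r1 -2 $r2 -t 16 -maxcor $maxcor -maxcorK $maxcork"
}

# Dry run: print the command that would replace the run_rcorrector.pl line.
rcorrector_cmd data_R1.fq data_R2.fq
```

Once the printed line looks right, it can be pasted into the batch script in place of the original run_rcorrector.pl call.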
I understand; that answers my question. Thank you for your time.
Hello,
I am trying to use Rcorrector, but I am having problems at the "Bad quality threshold" step.
Put the kmers into bloom filter
Count the kmers in the bloom filter
Dump the kmers
Error correction
Weak kmer threshold rate: 0.193073 (estimated from 0.950/1 of the chosen kmers)
Bad quality threshold is '('
Every step up to "Bad quality threshold" works fine. Some of my data sets run properly (those report Bad quality threshold is '5' or '!'), but other jobs get stuck at this step for more than 24 hours. Do you know what is causing this and how I can solve it? I have RNA-seq data sets of 9-15 million reads from plant samples, and I am running Rcorrector on the raw data.
Thanks
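To compare the threshold characters quoted above ('(', '5' and '!'), it can help to turn them into numbers. The sketch below assumes the FASTQ files use the common Phred+33 encoding, which the thread does not state; the function name `phred33_score` is made up for illustration.

```shell
#!/bin/bash
# Convert a single FASTQ quality character to its Phred score.
# Assumption: Phred+33 encoding (score = ASCII code - 33).
phred33_score() {
    local code
    # printf with a leading single quote yields the character's ASCII code.
    code=$(printf '%d' "'$1")
    echo $(( code - 33 ))
}

phred33_score '('   # prints 7
phred33_score '5'   # prints 20
phred33_score '!'   # prints 0
```

Under that assumption, the data sets that get stuck are the ones where the reported threshold character corresponds to a score of 7, compared with 20 or 0 for the ones that finish.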