Closed peflanag closed 3 years ago
No you cannot just add it to the command. You would need to open the respective script and add the command there to the gatk command. As I do not know at which step exactly you got the error I woud guess: TBrefine.pm line 81 add -fixMisencodedQuals
But first I would check why this happens. Do you really use old illumina data with the ASCII 64 encoding? If not something else is wrong here. Maybe you can check the respective fastqs with fastqc to see.
Hi,
A handful of the files are old Illumina reads that I want to add to a current cluster of samples. I have attached the text log file below.
I still recommend you check your fastqs with the program "fastqc" and you will see which encoding is used. You should then change the encoding of the fastq files to phred33. See: https://www.biostars.org/p/166062/ Afterwards you can run MTBseq as you do usually.
Or you manually add that command, as I showed you above, run your samples and change it back. But starting with a proper encoding is the better option.
Perfect, thanks so much. I will try that!
Turns out the encoding is 1.5 but when I try to change with the tools on the link you supplied terminal keeps crashing or finishing as shown in the screen shot below. So I have decided to go to the TBrefine.pm to add the command at line 81 but where in the line do I add it? I have tried a few locations but it doesn't seem to work. Anywhere there is an X I have added the command and tried. If it cant be done its fine, its only 14 samples out of over 300 so I can just exclude them since there where from a few years ago!
$commandline = X "$GATK_call X --analysis_type RealignerTargetCreator X --reference_sequence $VAR_dir/$ref --input_file $BAM_OUT/$fullID.bam --downsample_to_coverage 10000 --num_threads $threads --out $GATK_OUT/$fullID.gatk.intervals 2>> $GATK_OUT/$logfile" X;
Cheers
Thanks for the feedback. In our experience, the GATK toolkit is very sensitive to malformed data not fully confirming to specifications. From your experience of other tools failing, I would guess that the fastq files are not only encoded to Illumina specs 1.5, but there is likely also some other problem with them. For confirmation, you could check their format with a validation tool: https://genome.sph.umich.edu/wiki/FastQValidator Maybe try some tools and hope to find a program that will accept these files as input and is able to return valid fastq files. Or try contacting Illumina support, maybe they can offer a solution?
best wishes, Thomas
Cheers Thomas. I might contact illumina and see. However, Int he grand scheme of things I might just have to accept I can’t add them to part of my analysis which will be ok I reckon.
Hi, I got the following error;
ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@e584b2b appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 66. Please see https://software.broadinstitute.org/gatk/documentation/article?id=6470 for more details and options related to this error.
I went to the site and it says;
If this happens to you, you'll need to run again with the flag [ --fix_misencoded_quality_scores / -fixMisencodedQuals]. What will happen is that the engine will simply subtract 31 from every quality score as it is read in, and proceed with the corrected values. Output files will include the correct scores where applicable.
I'm just wondering, in my sbatch script can I just throw the commands in like this;
!/bin/sh
SLURM Commands
SBATCH --partition=ProdQ
SBATCH --nodes=1
SBATCH --time=24:00:00
SBATCH --job-name=C10
SBATCH --account=ndlif075c
SBATCH --output=TBfull_Log.txt
SBATCH --mail-user=xxxxxxxxxxxxxxxx
SBATCH --mail-type=BEGIN,END
cd $SLURM_SUBMIT_DIR
load the environment module
module load conda/2
load the conda environment
source activate MTBseq
BASH Commands
MTBseq --step TBfull --distance 5 ----fix_misencoded_quality_scores -fixMisencodedQuals --threads 8
Cheers,
P