trinityrnaseq / trinityrnaseq

Trinity RNA-Seq de novo transcriptome assembly
BSD 3-Clause "New" or "Revised" License

Variant calling error (Following Variant calling #388) #400

Closed by sofiapar265 6 years ago

sofiapar265 commented 6 years ago

Hi Brian, sorry, I was out of the office for a few days. Thank you for your support on this issue. I ran everything as you suggested, but I got an error again. In my case it didn't even reach the haplotype caller stage. When I run the script, it seems to crash at the step where dedupped.valid.bam is created. I don't know whether this is a problem with my samtools version or a path error. I tried putting the file in the same directory as the script, but it still doesn't work. My samtools version is 0.1.19.

Here is the error.

Running: touch /media/alan/Potsdam_Dennis/Sofia/variants_lib02_outdir/trinity_genes.fasta.gatk_chkpts/bam_validate.ok
Running: /home/alan/Documents/My_installed_programs/trinityrnaseq/Analysis/SuperTranscripts/AllelicVariants/util/clean_bam.pl dedupped.bam dedupped.bam.validation dedupped.valid.bam

Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

Options:
  -b output BAM
  -h print header for the SAM output
  -H print header only (no alignments)
  -S input is SAM
  -u uncompressed BAM output (force -b)
  -1 fast compression (force -b)
  -x output FLAG in HEX (samtools-C specific)
  -X output FLAG in string (samtools-C specific)
  -c print only the count of matching records
  -B collapse the backward CIGAR operation
  -@ INT number of BAM compression threads [0]
  -L FILE output alignments overlapping the input BED FILE [null]
  -t FILE list of reference names and lengths (force -S) [null]
  -T FILE reference sequence file (force -S) [null]
  -o FILE output file name [stdout]
  -R FILE list of read groups to be outputted [null]
  -f INT required flag, 0 for unset [0]
  -F INT filtering flag, 0 for unset [0]
  -q INT minimum mapping quality [0]
  -l STR only output reads in library STR [null]
  -r STR only output reads in read group STR [null]
  -s FLOAT fraction of templates to subsample; integer part as seed [-1]
  -? longer help

CMD: samtools index dedupped.valid.bam
open: No such file or directory
[bam_index_build2] fail to open the BAM file.
Running: touch /media/alan/Potsdam_Dennis/Sofia/variants_lib02_outdir/trinity_genes.fasta.gatk_chkpts/make_valid_dedupped_bam.ok
Running: java -jar /home/alan/Documents/My_installed_programs/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -T SplitNCigarReads -R /media/alan/Potsdam_Dennis/Sofia/trinity_genes.fasta -I dedupped.valid.bam -o splitNCigar.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --validation_strictness LENIENT
INFO 15:44:33,499 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:44:33,515 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 15:44:33,515 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 15:44:33,515 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 15:44:33,515 HelpFormatter - [Tue Feb 06 15:44:33 CET 2018] Executing on Linux 4.4.0-93-generic amd64
INFO 15:44:33,515 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01
INFO 15:44:33,517 HelpFormatter - Program Args: -T SplitNCigarReads -R /media/alan/Potsdam_Dennis/Sofia/trinity_genes.fasta -I dedupped.valid.bam -o splitNCigar.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --validation_strictness LENIENT
INFO 15:44:33,519 HelpFormatter - Executing as alan@enigma on Linux 4.4.0-93-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01.
INFO 15:44:33,519 HelpFormatter - Date/Time: 2018/02/06 15:44:33
INFO 15:44:33,519 HelpFormatter - ----------------------------------------------------------------------------------
INFO 15:44:33,519 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/alan/Documents/My_installed_programs/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 15:44:33,657 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 15:44:33,657 GenomeAnalysisEngine - Inflater: IntelInflater
INFO 15:44:33,657 GenomeAnalysisEngine - Strictness is LENIENT
INFO 15:46:26,435 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 15:46:26,439 SAMDataSource$SAMReaders - Initializing SAMRecords in serial

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Could not read file /media/alan/Potsdam_Dennis/Sofia/variants_lib02_outdir/dedupped.valid.bam because java.io.FileNotFoundException: dedupped.valid.bam (No such file or directory)
ERROR ------------------------------------------------------------------------------------------

Traceback (most recent call last):
  File "/home/alan/Documents/My_installed_programs/trinityrnaseq/Analysis/SuperTranscripts/AllelicVariants/run_variant_calling.py", line 228, in <module>
    main()
  File "/home/alan/Documents/My_installed_programs/trinityrnaseq/Analysis/SuperTranscripts/AllelicVariants/run_variant_calling.py", line 194, in main
    pipeliner.run()
  File "/home/alan/Documents/My_installed_programs/trinityrnaseq/Analysis/SuperTranscripts/AllelicVariants/../../../PyLib/Pipeliner.py", line 59, in run
    run_cmd(cmd.get_cmd(), cmd.get_ignore_error_setting())
  File "/home/alan/Documents/My_installed_programs/trinityrnaseq/Analysis/SuperTranscripts/AllelicVariants/../../../PyLib/Pipeliner.py", line 23, in run_cmd
    raise e
subprocess.CalledProcessError: Command 'java -jar /home/alan/Documents/My_installed_programs/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -T SplitNCigarReads -R /media/alan/Potsdam_Dennis/Sofia/trinity_genes.fasta -I dedupped.valid.bam -o splitNCigar.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --validation_strictness LENIENT' returned non-zero exit status 1

Thanks a lot in advance!

brianjohnhaas commented 6 years ago

It might be a samtools version issue. Do you have at least samtools v1.3 installed?
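For anyone who wants to check this before launching the pipeline, here is a rough sketch of a version check. It is not part of Trinity. The function names are made up for illustration, and it assumes that running `samtools` with no arguments prints a banner containing a `Version: ...` line, as the 0.1.19 and 1.x releases do. The 1.3 threshold is simply the one mentioned above.

```python
import re
import subprocess

MIN_VERSION = (1, 3)  # threshold taken from the reply above


def parse_samtools_version(banner):
    """Extract the version from a samtools banner as a tuple of ints, or None."""
    match = re.search(r"Version:\s*(\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(version, minimum=MIN_VERSION):
    """True if a parsed version tuple is at least the minimum."""
    return version is not None and version >= minimum


def installed_samtools_version(executable="samtools"):
    """Run samtools with no arguments and parse the banner it prints.

    Assumption: the banner goes to stdout or stderr and the exit status
    may be non-zero, so both streams are read and the status is ignored.
    """
    result = subprocess.run(
        [executable],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_samtools_version(result.stdout + result.stderr)


if __name__ == "__main__":
    try:
        version = installed_samtools_version()
    except FileNotFoundError:
        raise SystemExit("samtools was not found on the PATH")
    if is_new_enough(version):
        print("samtools version looks OK:", version)
    else:
        print("samtools is older than 1.3 or could not be identified:", version)
```

With the version reported in this thread, `parse_samtools_version("Version: 0.1.19-44428cd")` gives `(0, 1, 19)`, which compares as older than `(1, 3)`.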


sofiapar265 commented 6 years ago

Thanks Brian, updating the samtools version solved the issue.