dpcook closed this issue 1 year ago
Yes, there are some small errors due to external package updates, but these should not stop the pipeline or affect the relevant results. I think I fixed most of them in my dev-branch (https://github.com/roland-rad-lab/MoCaSeq/tree/human-pipeline). However, there is no pre-built docker container for that branch, just a Dockerfile which has to be built.
Sorry for the inactivity and for not sorting this out over the last two weeks.
The pipeline doesn't stop, and a handful of the relevant results do seem to get produced, but there are still some (perhaps related) issues with at least Manta and Strelka.
Full report.txt from the QC directory: MoCaSeq_Test.report.txt
If these seem like issues that would be resolved with the dev-branch, I can try it out (though I will have to figure out how to build the docker image without having access to docker on our cluster).
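For anyone triaging a report like MoCaSeq_Test.report.txt, a small script can collapse the repeated error lines so the first real failure is easier to spot. This is only a sketch: the patterns are copied from messages that appear in this thread, and the function name `summarize_errors` is made up for illustration.

```python
import re
from collections import Counter

# Patterns copied from messages seen in this thread; extend as needed.
ERROR_PATTERNS = [
    re.compile(r"^MSG: ERROR: .*"),
    re.compile(r"^ERROR: .*"),
    re.compile(r"^A USER ERROR has occurred: .*"),
    re.compile(r"^Error in .*"),
    re.compile(r"^rm: cannot remove .*"),
]

def summarize_errors(lines):
    """Count distinct error lines, keeping them in first-seen order."""
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if any(p.match(line) for p in ERROR_PATTERNS):
            counts[line] += 1
    return counts
```

To use it, open the report and pass the file handle, e.g. `summarize_errors(open("MoCaSeq_Test.report.txt", errors="replace"))`, then print the items. Since the counter keeps first-seen order, the earliest distinct error comes out first, which is usually the one worth chasing.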
End of the Manta run
[2023-05-24T14:46:28.121993Z] [galen2.localdomain] [2322829_1] [WorkflowRunner] Manta workflow successfully completed.
[2023-05-24T14:46:28.121993Z] [galen2.localdomain] [2322829_1] [WorkflowRunner]
[2023-05-24T14:46:28.121993Z] [galen2.localdomain] [2322829_1] [WorkflowRunner] workflow version: 1.6.0
[2023-05-24T14:46:28.133837Z] [galen2.localdomain] [2322829_1] [WorkflowRunner]
[2023-05-24T14:46:28.144625Z] [galen2.localdomain] [2322829_1] [WorkflowRunner] Workflow successfully completed all tasks
[2023-05-24T14:46:28.156931Z] [galen2.localdomain] [2322829_1] [WorkflowRunner] Elapsed time for full workflow: 66 sec
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Tumor.Manta.annotated.one.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Tumor.Manta.annotated.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Tumor.Manta.annotated.vcf.stats': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Tumor.Manta.annotated.vcf.stats.genes.txt': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Tumor.Manta.vcf.gz.tbi': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Normal.Manta.annotated.one.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Normal.Manta.annotated.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Normal.Manta.annotated.vcf.stats': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Normal.Manta.annotated.vcf.stats.genes.txt': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Manta/MoCaSeq_Test.Normal.Manta.vcf.gz.tbi': No such file or directory
rm: cannot remove 'Mutect2FilteringStats.tsv': No such file or directory
-------------------- EXCEPTION --------------------
MSG: ERROR: PolyPhen not available
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::check_sift_polyphen /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:168
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::new /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:121
STACK Bio::EnsEMBL::VEP::CacheDir::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:150
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:121
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep-96/./vep:218
Date (localtime) = Wed May 24 10:46:48 2023
Ensembl API version = 96
---------------------------------------------------
STATUS: Running VEP and writing to: MoCaSeq_Test/results/Manta/MoCaSeq_Test.Manta.vep.vcf
WARNING: No genotype column for MoCaSeq_Test in VCF!
WARNING: No genotype column for --vcf-tumor-id in VCF!
And, presumably related, Strelka:
--- Strelka Postprocessing I (Indel size selection, filtering) ----
Wed May 24 10:49:09 EDT 2023 timestamp: 1684939749
14:49:11.384 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 24, 2023 2:49:11 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:49:11.537 INFO SelectVariants - ------------------------------------------------------------
14:49:11.537 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.7.0
14:49:11.537 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
14:49:11.537 INFO SelectVariants - Executing as dcook@galen2 on Linux v4.18.0-425.3.1.el8.x86_64 amd64
14:49:11.537 INFO SelectVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
14:49:11.537 INFO SelectVariants - Start Date/Time: May 24, 2023 2:49:11 PM UTC
14:49:11.538 INFO SelectVariants - ------------------------------------------------------------
14:49:11.538 INFO SelectVariants - ------------------------------------------------------------
14:49:11.538 INFO SelectVariants - HTSJDK Version: 2.21.2
14:49:11.538 INFO SelectVariants - Picard Version: 2.21.9
14:49:11.538 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:49:11.538 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:49:11.538 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:49:11.538 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:49:11.538 INFO SelectVariants - Deflater: IntelDeflater
14:49:11.538 INFO SelectVariants - Inflater: IntelInflater
14:49:11.538 INFO SelectVariants - GCS max retries/reopens: 20
14:49:11.538 INFO SelectVariants - Requester pays: disabled
14:49:11.538 INFO SelectVariants - Initializing engine
14:49:11.757 INFO FeatureManager - Using codec VCFCodec to read file file:///home/wranalab/dcook/projects/mouse_cna/MoCaSeq_Test/results/Strelka/MoCaSeq_Test.str.indel.filtered.vcf
14:49:11.776 INFO SelectVariants - Done initializing engine
14:49:11.807 INFO ProgressMeter - Starting traversal
14:49:11.807 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:49:11.819 INFO ProgressMeter - unmapped 0.0 2 10000.0
14:49:11.819 INFO ProgressMeter - Traversal complete. Processed 2 total variants in 0.0 minutes.
14:49:11.883 INFO SelectVariants - Shutting down engine
[May 24, 2023 2:49:11 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2211971072
---- Strelka Postprocessing II (Filtering out known SNV/Indel using dbSNP or the Sanger Mouse database) ----
Wed May 24 10:49:11 EDT 2023 timestamp: 1684939751
---- Strelka Postprocessing III (Extracting allele frequencies) ----
Wed May 24 10:54:58 EDT 2023 timestamp: 1684940098
-------------------- EXCEPTION --------------------
MSG: ERROR: Cache directory /home/wranalab/dcook/.vep/mus_musculus not found
STACK Bio::EnsEMBL::VEP::CacheDir::dir /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:311
STACK Bio::EnsEMBL::VEP::CacheDir::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:227
STACK Bio::EnsEMBL::VEP::CacheDir::new /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:111
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:115
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep-96/./vep:218
Date (localtime) = Wed May 24 10:54:59 2023
Ensembl API version = 96
---------------------------------------------------
-------------------- EXCEPTION --------------------
MSG: ERROR: Cache directory /home/wranalab/dcook/.vep/mus_musculus not found
STACK Bio::EnsEMBL::VEP::CacheDir::dir /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:311
STACK Bio::EnsEMBL::VEP::CacheDir::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:227
STACK Bio::EnsEMBL::VEP::CacheDir::new /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:111
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:115
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep-96/./vep:218
Date (localtime) = Wed May 24 10:54:59 2023
Ensembl API version = 96
---------------------------------------------------
STATUS: Running VEP and writing to: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.snp.vep.vcf
STATUS: Running VEP and writing to: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.vep.vcf
Use of uninitialized value $lines in split at /opt/vcf2maf-1.6.17/maf2vcf.pl line 107.
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /var/pipeline/ref/GRCm38.p6/GRCm38.p6.fna
sed: can't read MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf: No such file or directory
[bgzip] No such file or directory: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf
tbx_index_build failed: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf.gz
Checking the headers and starting positions of 2 files
[E::hts_open_format] Failed to open file "MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf.gz" : No such file or directory
Failed to open: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf.gz
---- Strelka Postprocessing IV (Annotate calls) ----
Wed May 24 10:55:02 EDT 2023 timestamp: 1684940102
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann1.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann2.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann3.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann4.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann5.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann6.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.ann7.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf.gz': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.maf.vcf.gz.tbi': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.indel.vep.pairs.tsv': No such file or directory
rm: cannot remove 'Mutect2FilteringStats.tsv': No such file or directory
-------------------- EXCEPTION --------------------
MSG: ERROR: PolyPhen not available
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::check_sift_polyphen /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:168
STACK Bio::EnsEMBL::VEP::AnnotationSource::Cache::Transcript::new /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSource/Cache/Transcript.pm:121
STACK Bio::EnsEMBL::VEP::CacheDir::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/CacheDir.pm:150
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_from_cache /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:121
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep-96/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:91
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep-96/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep-96/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep-96/./vep:218
Date (localtime) = Wed May 24 10:55:18 2023
Ensembl API version = 96
---------------------------------------------------
ERROR: Provided --input-vcf is missing or empty: MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.vcf
awk: cannot open MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.vep.maf (No such file or directory)
awk: cannot open MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.vep.maf (No such file or directory)
awk: cannot open MoCaSeq_Test/results/Strelka/MoCaSeq_Test.Strelka.vep.maf (No such file or directory)
And later, something with the Mutect outputs:
15:18:33.125 INFO IntervalArgumentCollection - Processing 2725521370 bp from intervals
15:18:33.130 INFO VariantQC - Shutting down engine
[May 24, 2023 3:18:33 PM UTC] com.github.discvrseq.walkers.variantqc.VariantQC done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2058354688
***********************************************************************
A USER ERROR has occurred: Input MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Input MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile
at org.broadinstitute.hellbender.engine.FeatureDataSource.setIntervalsForTraversal(FeatureDataSource.java:420)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:47)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at com.github.discvrseq.Main.main(Main.java:50)
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.m2.filt.filtered.selected.vcf.idx': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.m2.filt.filtered.selected.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann1.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann2.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann3.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann4.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann5.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann6.vcf': No such file or directory
rm: cannot remove 'MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.ann7.vcf': No such file or directory
rm: cannot remove 'Mutect2FilteringStats.tsv': No such file or directory
Error in library(SomaticSignatures) :
there is no package called ‘SomaticSignatures’
Execution halted
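The GATK USER ERROR above asks for an index next to MoCaSeq_Test.Mutect2.vcf. If the VCF exists and is only missing its index, something like the following could confirm that. It is a sketch under assumptions: `has_index` and `index_command` are hypothetical helpers, the `.idx`/`.tbi` naming is the usual convention rather than something confirmed for this pipeline, and the `-I` flag should be checked against `gatk IndexFeatureFile --help` for GATK 4.1.7.0.

```python
from pathlib import Path

def has_index(vcf_path):
    """True if an index file sits next to the VCF (.idx, or .tbi for .gz)."""
    vcf = Path(vcf_path)
    suffix = ".tbi" if vcf.name.endswith(".gz") else ".idx"
    return Path(str(vcf) + suffix).exists()

def index_command(vcf_path):
    """Build (but do not run) a GATK command that would create the index.

    Assumption: the input flag is -I; verify for your GATK version.
    """
    return ["gatk", "IndexFeatureFile", "-I", str(vcf_path)]

vcf = "MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.vcf"
if not has_index(vcf):
    print("No index found; candidate fix:", " ".join(index_command(vcf)))
```

The command is only printed, not executed, so it can be reviewed and run inside the container by hand.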
I also ran the test and can confirm these errors and warnings. As I said, though, they only affect some negligible downstream analyses; you will still get the results that matter to most people, namely copy numbers (HMMCopy, Copywriter) and mutations (Mutect2).
I'm running the --test as a Singularity image. The pipeline runs to completion, but there are several errors in the report and presumably missing files in the output. Full report: MoCaSeq_Test.report.txt
I think the first error that pops up is:
And a few lines down:
And then a bit lower, there seem to be issues (not sure if they're related or not?):
And then something about a cache directory in the user home directory? Perhaps this is related to #17 in the end.
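The "Cache directory /home/wranalab/dcook/.vep/mus_musculus not found" message suggests VEP fell back to its default cache location, `$HOME/.vep`, which under Singularity is typically the host user's home directory rather than anything shipped inside the image. A quick way to check what VEP would see there is sketched below; the helper name `find_vep_cache` is hypothetical and the species directory name is taken from the error message.

```python
from pathlib import Path

def find_vep_cache(species="mus_musculus", cache_root=None):
    """Return the species cache directory if it exists, else None.

    cache_root defaults to ~/.vep, the location named in the error message.
    """
    root = Path(cache_root) if cache_root is not None else Path.home() / ".vep"
    candidate = root / species
    return candidate if candidate.is_dir() else None

cache = find_vep_cache()
if cache is None:
    print("No mus_musculus cache under ~/.vep")
else:
    # Versioned subdirectories normally live below the species directory.
    print("Found cache:", cache, sorted(p.name for p in cache.iterdir()))
```

If this reports no cache, the likely options are pointing VEP at the cache bundled with the pipeline's reference data or making that directory visible under `~/.vep`; which of these the pipeline expects is not stated in this thread.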