hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

purple error: " htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: java.io.IOException: Is a directory, for input source" #232

Closed: jamesdalg closed this issue 2 years ago

jamesdalg commented 2 years ago

I've run GRIDSS, GRIPSS (both steps), COBALT, and AMBER, and I get an odd error when running PURPLE. It seems to be trying to use my current directory as some type of input. Here are the commands I ran:

gridss -r /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa \
                   -o /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/gridss_PAUTWB_T_N_paired_output.vcf.gz \
                   -a /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_assembly.bam \
                   -w /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/ \
                   -t 64 \
                   /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_N.bam /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_T.bam

java -jar  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss-1.11.jar \
   -tumor PAUTWB_T \
   -reference PAUTWB_N \
   -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa \
   -breakend_pon /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed \
   -breakpoint_pon /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe \
   -breakpoint_hotspot /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe \
   -input_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/gridss_PAUTWB_T_N_paired_output.vcf.gz \
   -output_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz

java  -jar  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss-1.11.jar  com.hartwig.hmftools.gripss.GripssHardFilterApplicationKt \
-input_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz  \
-output_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz \
   -tumor PAUTWB_T \
   -reference PAUTWB_N \
   -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa \
   -breakend_pon /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed \
   -breakpoint_pon /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe \
   -breakpoint_hotspot /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe 
#COBALT
#module load R
# Heap size (-Xmx) is a JVM option and goes before -cp; the main class follows the classpath.
java -Xmx32G -cp /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/cobalt-1.11.jar com.hartwig.hmftools.cobalt.CobaltApplication \
   -reference PAUTWB_N \
   -reference_bam /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_N.bam \
   -tumor PAUTWB_T \
   -tumor_bam /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_T.bam \
   -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt \
   -threads 16 \
   -gc_profile /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp

#AMBER
#mkdir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/
# Heap size (-Xmx) is a JVM option and goes before -cp; the main class follows the classpath.
java -Xmx32G -cp /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/amber/amber-3.5.jar com.hartwig.hmftools.amber.AmberApplication \
   -reference PAUTWB_N \
   -reference_bam /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_N.bam \
   -tumor PAUTWB_T \
   -tumor_bam /data/CCRBioinfo/projects/TargetOsteo_WGS/bam_hg38/PAUTWB_T.bam \
   -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/ \
   -threads 16 \
   -loci /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/amber/GermlineHetPon.38.vcf.gz
#PURPLE:
mkdir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple
java -jar /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/purple/purple_v3.1.jar \
   -reference PAUTWB_N \
   -tumor PAUTWB_T \
   -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/ \
   -amber /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber \
   -cobalt /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt \
   -gc_profile /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp \
   -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa \
   -structural_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz \
   -sv_recovery_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz \
   -ref_genome_version V38
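
Since the error further down complains about a directory being read as a file, one cheap safeguard is to check, before launching PURPLE, that each path is the kind of object expected: a regular file for the VCFs, GC profile, and reference genome, and a directory for `-amber`, `-cobalt`, and `-output_dir`. The following is only a minimal sketch and not part of hmftools; `require_file` and `require_dir` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of hmftools).

# Succeeds only if the argument is a readable regular file.
require_file() {
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "not a readable file: $1" >&2
        return 1
    fi
}

# Succeeds only if the argument is an existing directory.
require_dir() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
}
```

A run could then be gated with something like `require_dir "$AMBER_DIR" && require_file "$SV_VCF" && java -jar "$PURPLE_JAR"` followed by the usual arguments, where the variable names are placeholders for the paths used above. Such a check only covers paths given on the command line; it would not catch a path that the tool derives internally.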

This is the error:

bash-4.2$ java -jar /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/purple/purple_v3.1.jar    -reference PAUTWB_N    -tumor PAUTWB_T    -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/    -amber /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber  -cobalt /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt  -gc_profile /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp    -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa -structural_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz -sv_recovery_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz -ref_genome_version V38
08:54:37 - [INFO ] - PURPLE version: 3.1
08:54:37 - [INFO ] - Reference Sample: PAUTWB_N, Tumor Sample: PAUTWB_T
08:54:37 - [INFO ] - Output Directory: /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/
08:54:38 - [INFO ] - Using ref genome: V38
08:54:40 - [INFO ] - Reading GC Profiles from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp
08:54:45 - [INFO ] - Processing sample(ref=PAUTWB_N tumor=PAUTWB_T)
08:54:45 - [INFO ] - Reading amber QC from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.qc
08:54:45 - [INFO ] - Reading amber bafs from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.baf.tsv
08:54:46 - [INFO ] - Reading amber pcfs from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.baf.pcf
08:54:46 - [INFO ] - Average amber tumor depth is 138 reads implying an ambiguous BAF of 0.537
08:54:46 - [INFO ] - Reading cobalt ratios from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_T.cobalt.ratio.tsv
08:54:51 - [INFO ] - Reading cobalt reference segments from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_N.cobalt.ratio.pcf
08:54:51 - [INFO ] - Reading cobalt tumor segments from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_T.cobalt.ratio.pcf
08:54:53 - [INFO ] - Loading structural variants from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz
08:54:57 - [INFO ] - Somatic variants support disabled.
08:54:57 - [INFO ] - Sample gender is male
08:54:57 - [INFO ] - Applying segmentation
08:54:57 - [INFO ] - Merging reference and tumor ratio break points
08:55:02 - [INFO ] - Fitting purity
08:55:41 - [INFO ] - Sample maxDiploidProportion(0.454) diploidCandidates(93) purityRange(0.470 - 0.700) hasTumor(true)
08:55:41 - [INFO ] - Calculating copy number
08:55:42 - [INFO ] - Loading recovery candidates from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz
08:55:42 - [INFO ] - Reapplying segmentation with 5 recovered structural variants
08:55:42 - [INFO ] - Merging reference and tumor ratio break points
08:55:47 - [INFO ] - Recalculating copy number
08:55:49 - [INFO ] - Calculating chromosome copy number arm
08:55:49 - [INFO ] - Generating QC Stats
08:55:49 - [INFO ] - Modelling somatic peaks
08:55:49 - [ERROR] - failed processing sample(PAUTWB_T): htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: java.io.IOException: Is a directory, for input source: file:///spin1/home/linux/dalgleishjl/
htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: java.io.IOException: Is a directory, for input source: file:///spin1/home/linux/dalgleishjl/
        at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
        at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
        at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
        at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:145)
        at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:95)
        at com.hartwig.hmftools.purple.somatic.SomaticPeakStream.somaticPeakModel(SomaticPeakStream.java:96)
        at com.hartwig.hmftools.purple.PurpleApplication.processSample(PurpleApplication.java:288)
        at com.hartwig.hmftools.purple.PurpleApplication.run(PurpleApplication.java:150)
        at com.hartwig.hmftools.purple.PurpleApplication.main(PurpleApplication.java:503)
Caused by: htsjdk.samtools.util.RuntimeIOException: java.io.IOException: Is a directory
        at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:53)
        at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
        at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
        at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:89)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
        at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
        ... 10 more
Caused by: java.io.IOException: Is a directory
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:48)
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:330)
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:296)
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:273)
        at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:229)
        at htsjdk.samtools.seekablestream.SeekablePathStream.read(SeekablePathStream.java:86)
        at java.base/java.io.InputStream.read(InputStream.java:218)
        at htsjdk.tribble.readers.PositionalBufferedStream.fill(PositionalBufferedStream.java:132)
        at htsjdk.tribble.readers.PositionalBufferedStream.read(PositionalBufferedStream.java:84)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177)
        at htsjdk.tribble.readers.LongLineBufferedReader.fill(LongLineBufferedReader.java:140)
        at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:300)
        at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
        at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
        ... 17 more
08:55:49 - [INFO ] - Complete
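
The `[ERROR]` line names the input source that htsjdk tried to open, and the trace shows the failure inside `SomaticPeakStream.somaticPeakModel` even though the log reports "Somatic variants support disabled". Reading that as PURPLE 3.1 opening a VCF reader on an unset path is an inference from the log, not something confirmed here. To check the "Is a directory" part directly, the path can be pulled out of a saved log and tested. This is a sketch that assumes the console output was saved to a file; the file name `purple.log` and the function `offending_source` are hypothetical.

```shell
# Print the first "for input source:" path found in a saved PURPLE log,
# with the leading file:// scheme stripped. Hypothetical helper.
offending_source() {
    grep -o 'for input source: .*' "$1" | head -n 1 \
        | sed -e 's/^for input source: //' -e 's|^file://||'
}
```

For example, `src=$(offending_source purple.log); [ -d "$src" ] && echo "directory: $src"` would confirm that the reported source is a directory rather than a VCF.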

jamesdalg commented 2 years ago

Figured it out! You fixed this issue earlier in https://github.com/hartwigmedical/hmftools/issues/217

I used the 3.2 beta version and had no issues after loading circos.

module load circos
java -jar /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/purple/purple_v3.2_beta.jar \
   -reference PAUTWB_N \
   -tumor PAUTWB_T \
   -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/ \
   -amber /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber \
   -cobalt /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt \
   -gc_profile /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp \
   -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa \
   -structural_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz \
   -sv_recovery_vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz \
   -ref_genome_version V38 \
   -threads 16 \
   -circos /usr/local/apps/circos/0.69-9/bin/circos
12:47:24 - [INFO ] - PURPLE version: 3.2
12:47:24 - [INFO ] - Reference Sample: PAUTWB_N, Tumor Sample: PAUTWB_T
12:47:24 - [INFO ] - Output Directory: /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/
12:47:24 - [INFO ] - Using ref genome: V38
12:47:25 - [INFO ] - Reading GC Profiles from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/hmftools/cobalt/GC_profile.1000bp.38.cnp
12:47:28 - [INFO ] - Processing sample(ref=PAUTWB_N tumor=PAUTWB_T)
12:47:28 - [INFO ] - Reading amber QC from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.qc
12:47:28 - [INFO ] - Reading amber bafs from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.baf.tsv
12:47:29 - [INFO ] - Reading amber pcfs from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_amber/PAUTWB_T.amber.baf.pcf
12:47:29 - [INFO ] - Average amber tumor depth is 138 reads implying an ambiguous BAF of 0.537
12:47:29 - [INFO ] - Reading cobalt ratios from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_T.cobalt.ratio.tsv
12:47:35 - [INFO ] - Reading cobalt reference segments from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_N.cobalt.ratio.pcf
12:47:35 - [INFO ] - Reading cobalt tumor segments from /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_cobalt/PAUTWB_T.cobalt.ratio.pcf
12:47:36 - [INFO ] - Loading structural variants from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.vcf.gz
12:47:38 - [INFO ] - Sample gender is male
12:47:38 - [INFO ] - Applying segmentation
12:47:38 - [INFO ] - Merging reference and tumor ratio break points
12:47:42 - [INFO ] - Fitting purity
12:47:48 - [INFO ] - Sample maxDiploidProportion(0.454) diploidCandidates(93) purityRange(0.470 - 0.700) hasTumor(true)
12:47:48 - [INFO ] - Calculating copy number
12:47:49 - [INFO ] - Loading recovery candidates from /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PAUTWB_T_N_gridss_paired/PAUTWB_T_N_paired_gripss.filtered.vcf.gz
12:47:49 - [INFO ] - Reapplying segmentation with 5 recovered structural variants
12:47:49 - [INFO ] - Merging reference and tumor ratio break points
12:47:51 - [INFO ] - Recalculating copy number
12:47:53 - [INFO ] - Generating QC Stats
12:47:53 - [INFO ] - Modelling somatic peaks
12:47:53 - [INFO ] - Enriching somatic variants
12:48:06 - [INFO ] - Writing purple data to directory: /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/
12:48:07 - [INFO ] - Generating charts
12:48:07 - [INFO ] - Generating PAUTWB_T.circos.png via command: /usr/local/apps/circos/0.69-9/bin/circos -nosvg -conf /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/circos/PAUTWB_T.circos.conf -outputdir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/plot -outputfile PAUTWB_T.circos.png
12:48:07 - [INFO ] - Generating PAUTWB_T.input.png via command: /usr/local/apps/circos/0.69-9/bin/circos -nosvg -conf /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/circos/PAUTWB_T.input.conf -outputdir /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/plot -outputfile PAUTWB_T.input.png
12:48:07 - [INFO ] - Executing R script via command: Rscript /tmp/script1682455396990319415.R PAUTWB_T /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple/ /data/CCRBioinfo/dalgleishjl/sv_mapping/PAUTWB_purple//plot
12:48:38 - [INFO ] - Complete
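
The working run passes an absolute path to `-circos`. As a small sketch, that path could be resolved from the environment after `module load circos` instead of being hard-coded. This assumes the module places a `circos` executable on `PATH`, which is an assumption about the local setup, and `resolve_tool` is a hypothetical helper.

```shell
# Print how the shell resolves a command name (an absolute path for an
# executable found on PATH), or fail with a message. Hypothetical helper.
resolve_tool() {
    command -v "$1" 2>/dev/null || {
        echo "$1 not found on PATH" >&2
        return 1
    }
}
```

With this, `CIRCOS=$(resolve_tool circos)` fails early if the module was not loaded, and `"$CIRCOS"` can be passed to `-circos` in place of the literal path.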