hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

PURPLE fails with java.lang.NullPointerException with some input files #362

Closed micoli98 closed 1 year ago

micoli98 commented 1 year ago

I'm running a somatic copy number variation pipeline using GRIDSS, GRIPSS, AMBER, COBALT and PURPLE for multiple patients. PURPLE fails for only some of the samples, and the failing samples belong to different patients.

The PURPLE command is:

/usr/lib/jvm/java-11-openjdk-amd64/bin/java -jar /purple_v3.7.2.jar \
    -reference XXX_ref \
    -tumor XXX_tum \
    -amber /work/051122/XXX_pat \
    -cobalt /work/051122/XXX_pat \
    -gc_profile /Common_data/GC_profile.1000bp.38.cnp \
    -ref_genome /reference_genomes/GRCh38d1vd1/GRCh38.d1.vd1.fa \
    -ensembl_data_dir /Common_data/hmf_pipeline_resources.38_v5.31/common/ensembl_data \
    -structural_vcf /work/051122/XXX_pat/XXX.gripss.filtered.vcf.gz \
    -sv_recovery_vcf /work/051122/XXX_pat/XXX_tum_tum.gripss.vcf.gz \
    -germline_vcf /WGS_germline_variants/patient_VCFs/XXX_pat.vcf.gz \
    -somatic_vcf /WGS/variants_allMatched/byPatient_VCFs/XXX_pat.vcf.gz \
    -run_drivers \
    -driver_gene_panel /Common_data/hmf_pipeline_resources.38_v5.31/common/DriverGenePanel.38.tsv \
    -somatic_hotspots /Common_data/hmf_pipeline_resources.38_v5.31/variants/KnownHotspots.somatic.38.vcf.gz \
    -germline_hotspots /Common_data/hmf_pipeline_resources.38_v5.31/variants/KnownHotspots.germline.38.vcf.gz \
    -ref_genome_version 38 \
    -output_dir /work/051122/XXX_pat \
    -no_charts

The error that I get is:

15:20:23 - [INFO ] - Purple version: 3.7.2
15:20:23 - [INFO ] - output directory: /work/051122/XXX_pat/
15:20:23 - [INFO ] - reference(XXX_ref) tumor(XXX_tum) 
15:20:24 - [INFO ] - using ref genome: V38
15:20:24 - [INFO ] - loaded 1 alternative transcripts from 1 genes
15:20:27 - [INFO ] - reading GC Profiles from /work/Common_data/GC_profile.1000bp.38.cnp
15:20:30 - [INFO ] - reading Amber QC from work/051122/XXX_pat/XXX_tum.amber.qc
15:20:30 - [INFO ] - reading Amber BAFs from /work/051122/XXX/XXX_tum.amber.baf.tsv
15:20:31 - [INFO ] - reading Amber PCFs from /work/051122/XXX_pat/XXX_tum.amber.baf.pcf
15:20:32 - [INFO ] - average Amber tumor depth is 34 reads implying an ambiguous BAF of 0.575
15:20:32 - [INFO ] - reading Cobalt tumor segments from /work/051122/XXX_pat/XXX_tum.cobalt.ratio.pcf
15:20:32 - [INFO ] - reading Cobalt ratios from /work/051122/XXX_pat/XXX_tum.cobalt.ratio.tsv
15:20:37 - [INFO ] - reading Cobalt reference segments from /work/051122/XXX_pat/XXX_tum.cobalt.ratio.pcf
15:20:38 - [INFO ] - loading structural variants from /work/051122/XXX_pat/XXX_tum.gripss.filtered.vcf.gz
15:20:40 - [INFO ] - loaded somatic variants(11948 fitting=0) from /WGS/variants_allMatched/variants_v4.5/byPatient_VCFs/XXX_pat.vcf.gz
15:20:40 - [INFO ] - sample gender is female
15:20:40 - [INFO ] - applying segmentation
15:20:40 - [INFO ] - merging reference and tumor ratio break points
15:20:49 - [INFO ] - purple output directory: /work/051122/XXX_pat/
15:20:49 - [INFO ] - fitting purity
15:21:07 - [INFO ] - maxDiploidProportion(0.000) diploidCandidates(93) purityRange(0.960 - 1.000) hasTumor(true)
15:21:07 - [INFO ] - calculating copy number
15:21:07 - [INFO ] - loading recovery candidates from /work/051122/XXX_pat/XXX_tum.gripss.vcf.gz
15:21:07 - [INFO ] - reapplying segmentation with 1 recovered structural variants
15:21:07 - [INFO ] - merging reference and tumor ratio break points
15:21:14 - [INFO ] - recalculating copy number
15:21:14 - [INFO ] - modelling somatic peaks
15:21:16 - [INFO ] - enriching somatic variants
15:21:16 - [ERROR] - failed processing sample(XXX_tum): java.lang.NullPointerException
java.lang.NullPointerException
        at htsjdk.variant.variantcontext.GenotypeBuilder.copy(GenotypeBuilder.java:158)
        at htsjdk.variant.variantcontext.GenotypeBuilder.<init>(GenotypeBuilder.java:149)
        at com.hartwig.hmftools.purple.somatic.SomaticGenotypeEnrichment.processVariant(SomaticGenotypeEnrichment.java:73)
        at com.hartwig.hmftools.purple.somatic.SomaticVariantEnrichment.enrich(SomaticVariantEnrichment.java:50)
        at com.hartwig.hmftools.purple.somatic.SomaticStream.processAndWrite(SomaticStream.java:158)
        at com.hartwig.hmftools.purple.PurpleApplication.performFit(PurpleApplication.java:387)
        at com.hartwig.hmftools.purple.PurpleApplication.processSample(PurpleApplication.java:245)
        at com.hartwig.hmftools.purple.PurpleApplication.run(PurpleApplication.java:183)
        at com.hartwig.hmftools.purple.PurpleApplication.main(PurpleApplication.java:682)

The VCF files from GRIPSS contain the samples, so I don't know what the problem could be. Any suggestions?

micoli98 commented 1 year ago

The same is now starting to happen in GRIPSS as well.

lichennan123 commented 1 year ago

Any thoughts about why this happened and what you did to solve the problem? Thanks.

micoli98 commented 1 year ago

It was completely my fault. Some samples were missing from the VCF files. I'm sorry for the inconvenience.
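
For anyone hitting the same trace: it is consistent with PURPLE looking up a genotype for a sample name that the input VCF does not contain, which matches the resolution above. A quick pre-flight check is to compare the sample columns in each VCF header with the names passed to `-reference` and `-tumor`. The following is a minimal sketch in Python, not part of hmftools; the helper names `vcf_sample_names` and `missing_samples` are hypothetical, and it assumes a standard VCF header where sample names follow the nine fixed columns of the `#CHROM` line.

```python
import gzip
from pathlib import Path


def vcf_sample_names(vcf_path):
    """Return the sample column names from the #CHROM header line of a VCF."""
    path = Path(vcf_path)
    # Plain-text and gzip/bgzip-compressed VCFs are both handled.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first 9 columns are fixed (CHROM..FORMAT); samples follow.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a #CHROM line
    return []


def missing_samples(vcf_path, expected):
    """Return the expected sample names that are absent from the VCF header."""
    present = set(vcf_sample_names(vcf_path))
    return [name for name in expected if name not in present]
```

Calling `missing_samples("XXX_pat.vcf.gz", ["XXX_ref", "XXX_tum"])` (placeholder path and names, as in the command above) returns the names that are absent from the header; an empty list means both samples are present. Running it on the somatic, germline and structural VCFs before starting PURPLE should surface this kind of mismatch early.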