hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

GRIPSS PON filter error #116

Closed: alhafidzhamdan closed this issue 4 years ago

alhafidzhamdan commented 4 years ago

Hi there,

I tried running GRIPSS with this:

java $JVM_OPTS $JVM_TMP_DIR -cp $GRIPSS_JAR com.hartwig.hmftools.gripss.GripssApplicationKt \
   -ref_genome $REFERENCE \
   -breakend_pon $GRIDSS_PON/${BATCH}/gridss_pon_single_breakend.bed \
   -breakpoint_pon $GRIDSS_PON/${BATCH}/gridss_pon_breakpoint.bedpe \
   -breakpoint_hotspot $GRIPSS_FUSION \
   -tumor ${PATIENT_ID}T \
   -reference ${PATIENT_ID}N \
   -input_vcf $GRIDSS_RAW_VIRAL_ANNOTATED \
   -output_vcf $GRIDSS_PON_FILTERED

But I got this error:


10:17:57 - Config GripssFilterConfig(hardMinTumorQual=100, hardMaxNormalAbsoluteSupport=3, hardMaxNormalRelativeSupport=0.06, softMaxNormalRelativeSupport=0.03, minNormalCoverage=8, minTumorAF=0.005, maxShortStrandBias=0.95, minQualBreakEnd=1000, minQualBreakPoint=400, minQualRescueMobileElementInsertion=1000, maxHomLengthShortInversion=6, maxInexactHomLengthShortDel=5, minLength=32, polyGCRegion=2:32916190-32916630 +   .)
10:17:57 - Using E18T as tumor sample
10:17:57 - Using E18N as reference sample
10:17:57 - Reading hotspot file: /exports/igmm/eddie/Glioblastoma-WGS/resources/HMFTools-Resources/GRIPSS/KnownFusionPairs.hg38.bedpe
10:17:57 - Reading VCF file: /exports/igmm/eddie/Glioblastoma-WGS/WGS/variants/sv/gridss/results/E18.gridss.raw.vcf.gz
10:18:10 - Finished in 13 seconds
Exception in thread "main" java.lang.NumberFormatException: For input string: "43N"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:652)
    at java.base/java.lang.Integer.parseInt(Integer.java:770)
    at com.hartwig.hmftools.bedpe.Breakend$Companion.fromBealn(Location.kt:39)
    at com.hartwig.hmftools.extensions.VariantContextExtensionsKt.potentialAlignmentLocations(VariantContextExtensions.kt:92)
    at com.hartwig.hmftools.extensions.VariantContextExtensionsKt.hasViralSequenceAlignment(VariantContextExtensions.kt:86)
    at com.hartwig.hmftools.gripss.StructuralVariantContext.normalSupportRelativeFilter(StructuralVariantContext.kt:398)
    at com.hartwig.hmftools.gripss.StructuralVariantContext.isHardFilter(StructuralVariantContext.kt:219)
    at com.hartwig.hmftools.gripss.GripssApplication.hardFilterAndRealign(GripssApplication.kt:164)
    at com.hartwig.hmftools.gripss.GripssApplication.run(GripssApplication.kt:88)
    at com.hartwig.hmftools.gripss.GripssApplicationKt.main(GripssApplication.kt:42)
Performing hard filter for PON-filtered VCF for E18
Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: file:///exports/igmm/eddie/Glioblastoma-WGS/WGS/variants/sv/gridss/results/E18.gridss.pon.filtered.vcf.gz
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
    at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:148)
    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:98)
    at com.hartwig.hmftools.gripss.GripssHardFilterApplication.<init>(GripssHardFilterApplication.kt:54)
    at com.hartwig.hmftools.gripss.GripssHardFilterApplicationKt.main(GripssHardFilterApplication.kt:37)
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)

I'm not sure why it reports For input string: "43N", since there is no such sample in our cohort. The error occurs in some samples but not in others. Any ideas?

jonbaber commented 4 years ago

Looks like an error I fixed today.

Please try again with the latest release.

DarioS commented 4 years ago

You probably have colons or asterisks in some of your contig names. I thought no one else was using a weird reference genome! "43N" is not related to your sample IDs; you probably have a contig named something like HLA-A*02:43N in your reference FASTA file.
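A minimal sketch of why such a contig name surfaces as For input string: "43N". It assumes each BEALN entry has the form chr:start|strand|cigar|mapq (the format described in the GRIDSS VCF header) and that the parser takes the second colon-separated field of the location as the start position; the real GRIPSS code in Location.kt may differ. The class, method names and the example entry are hypothetical. The "robust" methods show one possible workaround: split at the last colon, since the numeric start position never contains one.

```java
/**
 * Hypothetical illustration of parsing one BEALN entry, assumed to have the
 * form "chr:start|strand|cigar|mapq", e.g. "HLA-A*02:43N:1205|+|87M|60".
 */
final class BealnParsing {

    /** Returns the "chr:start" part, i.e. everything before the first '|'. */
    static String locationPart(String bealnEntry) {
        return bealnEntry.split("\\|")[0];
    }

    /**
     * Naive parse: assumes the contig name contains no ':' and treats the
     * second ':'-separated field as the start position.
     */
    static int naiveStart(String bealnEntry) {
        String[] fields = locationPart(bealnEntry).split(":");
        // For "HLA-A*02:43N:1205" the fields are [HLA-A*02, 43N, 1205], so this
        // throws NumberFormatException: For input string: "43N"
        return Integer.parseInt(fields[1]);
    }

    /** Contig name: everything before the LAST ':' of the location part. */
    static String robustContig(String bealnEntry) {
        String location = locationPart(bealnEntry);
        return location.substring(0, location.lastIndexOf(':'));
    }

    /** Start position: the text after the last ':', which never contains ':'. */
    static int robustStart(String bealnEntry) {
        String location = locationPart(bealnEntry);
        return Integer.parseInt(location.substring(location.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        String entry = "HLA-A*02:43N:1205|+|87M|60"; // hypothetical entry
        System.out.println(robustContig(entry) + " " + robustStart(entry));
        try {
            naiveStart(entry);
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints HLA-A*02:43N 1205 from the robust parse, then the same For input string: "43N" message seen in the log from the naive one.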

alhafidzhamdan commented 4 years ago

Ah yes, I have identified them in my FASTA file. Thanks @DarioS. Moving forward, what do you suggest we do? Is there any way @jonbaber could handle this in GRIPSS, or do I need to remove the contigs somehow, or use another reference genome (hg38)? The latter would be the least favourable option.
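One way to list the affected contigs before deciding what to do is to scan the FASTA headers for names containing ':' or '*'. This is a hypothetical helper, not part of hmftools or GRIDSS; it assumes an uncompressed FASTA and takes the contig name to be the header text after '>' up to the first whitespace. The same check could be applied to the first column of the FASTA's .fai index instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper: lists FASTA contig names containing ':' or '*',
 * the characters suspected of breaking BEALN parsing in this thread.
 */
final class ProblemContigs {

    /** Contig name = header text after '>' up to the first whitespace. */
    static List<String> find(Stream<String> lines) {
        return lines
                .filter(line -> line.startsWith(">"))
                .map(line -> line.substring(1).trim().split("\\s+")[0])
                .filter(name -> name.contains(":") || name.contains("*"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an uncompressed reference FASTA.
        // Files.lines streams the file, so a multi-gigabyte FASTA is not loaded into memory.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            find(lines).forEach(System.out::println);
        }
    }
}
```

Note that removing or renaming contigs changes the reference itself, so any BWA index and alignments built against it would need to be regenerated as well.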

jonbaber commented 4 years ago

Have you tried using the latest release as I suggested above?

alhafidzhamdan commented 4 years ago

Yes @jonbaber, still the same issue. I have also now realised that I have problems with the viral and RepeatMasker annotation stages of GRIDSS; perhaps that is causing the issue. I'm waiting for the GRIDSS maintainers to help me troubleshoot that stage.

jonbaber commented 4 years ago

OK. Once you have sorted out the GRIDSS annotations, let me know if it still doesn't work with the latest version of GRIPSS and I can take a look. The easiest way for me to fix the issue is if you can provide a minimal VCF that reproduces the error. Judging by the log above, it will involve an entry with "...43N..." in the BEALN INFO field.
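A minimal sketch of cutting such a reproduction VCF: keep every header line and only the records whose BEALN INFO value contains a marker string such as 43N. The class name and arguments are hypothetical; it assumes an uncompressed VCF (decompress the .vcf.gz first) and uses the standard VCF layout where INFO is the eighth tab-separated column. Because GRIDSS breakpoints are written as paired breakend records, the mate of each matching record may also need to be kept for GRIPSS to process it; this sketch does not do that.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper for extracting a minimal VCF around a BEALN value. */
final class MinimalVcf {

    /** Returns the BEALN value from a VCF INFO column, or null if absent. */
    static String bealn(String info) {
        for (String field : info.split(";")) {
            if (field.startsWith("BEALN=")) {
                return field.substring("BEALN=".length());
            }
        }
        return null;
    }

    /** Keeps all header lines, plus records whose BEALN value contains the marker. */
    static boolean keep(String line, String marker) {
        if (line.startsWith("#")) {
            return true; // "##" meta lines and the "#CHROM" line must be kept
        }
        String[] columns = line.split("\t");
        if (columns.length < 8) {
            return false; // not a complete VCF record
        }
        String value = bealn(columns[7]); // INFO is the 8th VCF column
        return value != null && value.contains(marker);
    }

    public static void main(String[] args) throws IOException {
        // args: input.vcf (uncompressed) output.vcf marker (e.g. 43N)
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (keep(line, args[2])) {
                    out.write(line);
                    out.newLine();
                }
            }
        }
    }
}
```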

DarioS commented 4 years ago

I had the same problem and it was fixed in GRIDSS version 2.9.4. Have you used the latest version? There's no need to wait for that advice.

alhafidzhamdan commented 4 years ago

I suspect it did not work for me because I used the optional --repeatmasker option as part of the initial GRIDSS call. I will try without that option and run the RepeatMasker and viral annotations separately.

alhafidzhamdan commented 4 years ago

Hi, GRIPSS has now worked, although it seems to be OK without the viral annotation. I used GRIDSS 2.10.