hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Stuck PAVE process #336

toddajohnson closed this issue 1 year ago

toddajohnson commented 1 year ago

I called germline and somatic variants using SAGE 3.2 and then annotated them using PAVE 2.4. In the last set of samples, 2 of 27 germline VCFs did not finish: the java process was still running in top, but the log only reached the point shown below and then sat there overnight, whereas the others (all similarly sized germline VCFs) finished immediately:

21:16:04 - [DEBUG] - pon file(/home/tjohnson/reference/HMF/38/dbs/PAVE/38/SageGermlinePon.JP.149x.38.tsv.gz) loaded 531548 entries for chromosome(chrY)
21:16:05 - [INFO ] - processed 100000 variants

After cutting the chrY VCF apart, PAVE seems to get stuck at the last variant entry. Both stuck VCFs had the same final variant, shown below, and running PAVE on both VCFs with chrY:56887837 removed finished normally.

chrY 56887837 . G A 108 PASS LPS=816238;LPS_RC=12;RC=GAAAT;RC_IDX=2;RC_LF=TAACTGGTGT;RC_NM=2;RC_REPC=3;RC_REPS=A;RC_RF=GATACCTCAT;REP_C=2;REP_S=GA;TIER=LOW_CONFIDENCE;TNC=AGA;AC=0;AN=0 GT:ABQ:AD:AF:DP:RABQ:RAD:RC_CNT:RC_IPC:RC_JIT:RC_QUAL:RDP:SB ./.:38:11,14:0.56:25:417,545:11,14:12,2,0,0,0,11,25:0:0,0,0:91,17,0,0,0,122,230:25:0.643 ./.:36:10,9:0.474:19:382,303:10,9:7,1,0,0,1,10,19:0:0,0,0:32,0,0,0,0,51,83:19:0.444
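
The workaround described above (re-running with the chrY:56887837 record removed) can be sketched in Python. This is only a minimal illustration, not part of PAVE or SAGE: the function name `drop_variant` is hypothetical, and it assumes a tab-delimited VCF whose lines are supplied as an iterable of strings.

```python
def drop_variant(vcf_lines, chrom, pos):
    """Yield VCF lines unchanged, skipping any record at chrom:pos.

    Hypothetical helper for illustration; assumes tab-delimited records.
    Header lines (starting with '#') always pass through.
    """
    for line in vcf_lines:
        if not line.startswith("#"):
            fields = line.split("\t", 2)  # only CHROM and POS are needed
            if len(fields) >= 2 and fields[0] == chrom and fields[1] == str(pos):
                continue  # drop the offending record
        yield line


# Small in-memory example with made-up records.
records = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chrY\t100\t.\tC\tT\n",
    "chrY\t56887837\t.\tG\tA\n",
]
kept = list(drop_variant(records, "chrY", 56887837))
```

For a real compressed VCF, the lines could be read with `gzip.open(path, "rt")` and the kept lines written back out with `gzip.open(out_path, "wt")`.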

Any ideas why this variant would cause PAVE to get stuck?

charlesshale commented 1 year ago

I added that variant to a trimmed COLO829 VCF and it ran without any issues.

PAVE processes each variant in turn and doesn't cache any data beyond that, so I can't see it being a memory issue. Perhaps it is in an infinite loop, but it's surprising we haven't triggered such a condition before.

Email me your full command and the full VCF if you can and I'll see if I can work it out.

COLO829v003T.sage.trim2.v38.pave.vcf.gz

toddajohnson commented 1 year ago

Hi Charles, did you include the -mappability_bed option? I prepared a small VCF to send to you, but first looked at some of the annotation resources and realized that this variant (and another after it in the new unfiltered VCF) lies after the last chrY entry in the hmf_pipeline_resources.38_v5.31/variants/mappability_150.38.bed.gz file. I tried removing different optional parameters: the run gets stuck only when -mappability_bed is left in, and it finishes when that option is removed. I just downloaded your test file, and it shows the same behavior.
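
The check described above, finding variants that lie after the last BED entry for their chromosome, can be sketched in Python. This is a hedged illustration rather than anything from PAVE itself: the function names `last_bed_end` and `variants_past_bed` are hypothetical, and the sketch assumes tab-delimited input with standard BED coordinates (0-based, half-open), so a 1-based VCF position is past the file when it exceeds the largest end coordinate.

```python
def last_bed_end(bed_lines):
    """Return {chrom: largest end coordinate} from tab-delimited BED lines."""
    ends = {}
    for line in bed_lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, _start, end = line.rstrip("\n").split("\t")[:3]
        ends[chrom] = max(ends.get(chrom, 0), int(end))
    return ends


def variants_past_bed(vcf_lines, ends):
    """Return (chrom, pos) for VCF records beyond the last BED interval.

    BED ends are exclusive and 0-based, so the last covered 1-based
    position equals the end value; anything greater is uncovered.
    Chromosomes absent from the BED file are not reported here.
    """
    past = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if chrom in ends and int(pos) > ends[chrom]:
            past.append((chrom, int(pos)))
    return past


# Small in-memory example with made-up coordinates.
bed = ["chrY\t0\t1000\t1.0\n", "chrY\t1000\t2000\t0.5\n"]
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chrY\t1500\t.\tG\tA\n",
    "chrY\t2001\t.\tG\tA\n",
]
flagged = variants_past_bed(vcf, last_bed_end(bed))
print(flagged)  # [('chrY', 2001)]
```

Running a check like this against the mappability BED before annotation would list any records, such as the one at chrY:56887837, that fall outside the file's coverage.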

charlesshale commented 1 year ago

Understood - that's very helpful and I am fairly sure I can sort that out then.

charlesshale commented 1 year ago

Reproduced the issue and fixed it in v1.4.1; the new JAR is attached to:

https://github.com/hartwigmedical/hmftools/releases/tag/pave-v1.4