Closed malachig closed 1 year ago
The Perl command in question, which could be used to pull the specific input files for testing:
/usr/bin/perl - /cromwell_root/griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/CATGTACCAC-TACCACGGCT_S5_L002_R1_001.fastq.gz /cromwell_root/griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/CATGTACCAC-TACCACGGCT_S5_L002_R2_001.fastq.gz > "tumor_dna_unaligned_metrics1.txt"
i.e. the input fastqs are here:
gs://griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/
With increased memory (8G) and increased disk (2x), it succeeded. Based on monitoring of the job, we suspect it was the extra memory that was needed.
This has been working for several data sets now.
In a recent test run I got four failures (first attempt and all three retries) on this: generateFdaMetrics -> call-unaligned_tumor_dna_fda_metrics -> generateFdaMetricsForBamOrFastqs -> call-unalignedSeqFdaStats
In all four failures the stdout and stderr files are empty. The log file shows that localizing the files worked and then Cromwell started to run /cromwell_root/script, but nothing after that.
I logged into an instance and it seemed to be running the Perl code, using most of the disk space, and memory usage was growing. Maybe it runs out of either memory or disk on this step for larger input files. We have had plenty of successes for this step on other inputs.
I think the Perl code in question is this stuff:
This WDL I think? https://github.com/wustl-oncology/analysis-wdls/blob/main/definitions/tools/unaligned_seq_fda_stats.wdl
I wonder if we might bump the memory a bit there? Conditionally? Or perhaps disk space a bit more?
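To make the "conditionally" idea concrete, here is a rough sketch of scaling the memory request with total input size. In the WDL itself this would be an expression in the runtime block (for example built on the size() function); the arithmetic is only illustrated in shell here. The function name suggest_memory_gb and both tuning parameters (a base amount in GB plus an increment per GiB of input) are hypothetical placeholders, not measured values for this task.

```bash
# Hypothetical helper: suggest a memory request (in GB) from total input size.
# Usage: suggest_memory_gb BASE_GB GB_PER_INPUT_GIB FILE...
# BASE_GB and GB_PER_INPUT_GIB are illustrative assumptions, not tuned values.
suggest_memory_gb() {
    local base_gb="$1"
    local per_gib="$2"
    shift 2
    local total_bytes=0
    local f bytes
    for f in "$@"; do
        # Fail early if an input file cannot be read.
        bytes=$(wc -c < "$f") || return 1
        total_bytes=$((total_bytes + bytes))
    done
    # Round the total input size up to whole GiB, then scale linearly.
    local gib=$(( (total_bytes + 1073741823) / 1073741824 ))
    echo $(( base_gb + per_gib * gib ))
}
```

For example, `suggest_memory_gb 4 1 R1.fastq.gz R2.fastq.gz` would print 4 GB plus 1 GB for every GiB of compressed input, rounded up. Whether memory use for this step actually grows linearly with input size is an open question that monitoring would need to confirm.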
I also wonder if it would be possible to add some kind of progress logging output so that we could see how this long running task is going and also see where it gets before failing in situations like this.
I wish I had the resource monitoring script turned on for this test. That might have told us what exactly is going on here.
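As a stopgap for both the progress logging and the missing resource monitoring, something like the following could be wrapped around the long-running command so that stderr is never empty. This is a minimal sketch, not the existing monitoring script: run_with_heartbeat and log_usage are hypothetical names, the memory reading assumes a Linux /proc/meminfo, and it reports machine-level usage rather than progress inside the Perl code.

```bash
# Hypothetical wrapper: print a periodic heartbeat with disk and memory usage
# to stderr while a long-running command executes in the foreground.

log_usage() {
    local disk mem
    # Percentage of disk used on the filesystem holding the working directory.
    disk=$(df -Pk . | awk 'NR==2 {print $5}')
    # Available memory in kB (Linux only; reported as "unknown" elsewhere).
    mem=$(awk '/MemAvailable/ {print $2 " kB"}' /proc/meminfo 2>/dev/null)
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] disk_used=${disk} mem_available=${mem:-unknown}" >&2
}

# Usage: run_with_heartbeat INTERVAL_SECONDS COMMAND [ARGS...]
run_with_heartbeat() {
    local interval="$1"
    shift
    # The monitor runs in the background so the wrapped command keeps stdin.
    ( while true; do log_usage; sleep "$interval"; done ) &
    local monitor_pid=$!
    "$@"
    local status=$?
    # Stop the monitor and hand back the wrapped command's exit status.
    kill "$monitor_pid" 2>/dev/null
    wait "$monitor_pid" 2>/dev/null
    return "$status"
}
```

For example, `run_with_heartbeat 60 some_long_command args...` would write one line per minute. The last heartbeat before a failure would then show whether disk or memory was close to exhausted.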