wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

Out of resources in FDA unaligned metrics step #96

Closed malachig closed 1 year ago

malachig commented 1 year ago

In a recent test run, this step failed four times (the first attempt and all three retries): generateFdaMetrics -> call-unaligned_tumor_dna_fda_metrics -> generateFdaMetricsForBamOrFastqs -> call-unalignedSeqFdaStats

In all four failures the stdout and stderr files are empty. The log file shows that localizing the files worked and that Cromwell then started to run /cromwell_root/script, but it shows nothing after that.

I logged into an instance, and it seemed to be running the Perl code: it was using most of the disk space, and memory usage was growing. Maybe this step runs out of either memory or disk for larger input files. We have had plenty of successes for this step on other inputs.

I think the Perl code in question is this stuff:

        #!/usr/bin/perl

        use strict;
        use warnings;
        use Carp;

        use FileHandle;
        use File::Basename;
        use File::Spec::Functions;
        use IO::Uncompress::AnyUncompress;

        # sets global variables with the default
        use vars qw/$filter $offset $samtools/;

        # defines the minimum base call quality score to filter
        # inclusive, for example, Bases with >= Q30
        $filter = 30;

        # defines the offset to calculate the base call quality score from the Phred +33 encoded by ASCII characters
        # For example, 33 = 63.06707094 - 30.06707094 (see below)
        $offset = 33;

        # specifies the program paths in docker
        $samtools = "/opt/samtools/bin/samtools";

        # main subroutine
        Main();

        # program exits here
        exit 0;

        sub Main {
            # for the paths of input files
            my @paths = @ARGV;
            croak "input file path required" unless @paths > 0;

...

This WDL I think? https://github.com/wustl-oncology/analysis-wdls/blob/main/definitions/tools/unaligned_seq_fda_stats.wdl

I wonder if we might bump the memory a bit there? Conditionally? Or perhaps disk space a bit more?

I also wonder whether we could add some kind of progress logging, so that we could see how this long-running task is going and how far it gets before failing in situations like this.
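
For illustration, here is a minimal Perl sketch of what periodic progress logging could look like. The script's actual read loop is elided above, so this is not its code: `count_records_with_progress`, `progress_message`, and the reporting interval are hypothetical names chosen for the example, and the sketch assumes plain four-line FASTQ records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reporting interval: one message per this many FASTQ records.
my $REPORT_EVERY = 1_000_000;

# Format one progress line for a given input label, record count and elapsed seconds.
sub progress_message {
    my ($label, $records, $elapsed) = @_;
    return sprintf("[progress] %s: %d records in %d s", $label, $records, $elapsed);
}

# Read four-line FASTQ records from an open handle and write a progress line
# to STDERR every $every records. Returns the number of complete records.
sub count_records_with_progress {
    my ($fh, $label, $every) = @_;
    my $start = time();
    my ($lines, $records) = (0, 0);
    while (defined(my $line = <$fh>)) {
        $lines++;
        next if $lines % 4;    # a record is complete on every 4th line
        $records++;
        if ($records % $every == 0) {
            # STDERR is unbuffered by default in Perl, so this shows up promptly.
            print STDERR progress_message($label, $records, time() - $start), "\n";
        }
    }
    return $records;
}

# Demo on a tiny in-memory FASTQ, reporting after every record.
my $fastq = join("\n", '@r1', 'ACGT', '+', 'IIII', '@r2', 'TTTT', '+', 'IIII') . "\n";
open(my $fh, '<', \$fastq) or die "cannot open in-memory FASTQ: $!";
my $n = count_records_with_progress($fh, 'demo', 1);
close($fh);
print "records: $n\n";
```

In a real run the interval would be something like `$REPORT_EVERY` rather than 1. Because the messages go to STDERR, they would land in the task's stderr file and leave the metrics written to stdout untouched. The same loop should also work on a handle from IO::Uncompress::AnyUncompress, which documents support for `<$z>`, though that is untested here.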

I wish I had the resource monitoring script turned on for this test. That might have told us what exactly is going on here.
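
As a lightweight complement (not a replacement for the resource monitoring script), the task itself could log its own memory use. The sketch below is an assumption-laden example: it is Linux-only, it reads the `VmRSS` and `VmHWM` fields from `/proc/self/status`, and the helper names `status_kb` and `read_self_status` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract a field such as VmRSS (current resident memory) or VmHWM (peak
# resident memory) from /proc/<pid>/status text. Returns the value in kB,
# or undef if the field is not present.
sub status_kb {
    my ($text, $field) = @_;
    return $1 if $text =~ /^\Q$field\E:\s+(\d+)\s+kB/m;
    return;
}

# Read this process's status file; returns an empty string where /proc is
# unavailable (for example on non-Linux systems).
sub read_self_status {
    open(my $fh, '<', '/proc/self/status') or return '';
    local $/;    # slurp the whole file at once
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : '';
}

# Log current and peak resident memory to STDERR.
my $status = read_self_status();
my $rss    = status_kb($status, 'VmRSS');
my $peak   = status_kb($status, 'VmHWM');
printf STDERR "[mem] rss=%s kB peak=%s kB\n",
    defined $rss  ? $rss  : 'n/a',
    defined $peak ? $peak : 'n/a';
```

Emitting a line like this alongside each progress message would show whether memory grows steadily with the number of reads processed. That is the kind of evidence that would separate a memory problem from a disk problem.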

malachig commented 1 year ago

The Perl command in question, which could be used to identify the specific input files for testing:

    /usr/bin/perl - /cromwell_root/griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/CATGTACCAC-TACCACGGCT_S5_L002_R1_001.fastq.gz /cromwell_root/griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/CATGTACCAC-TACCACGGCT_S5_L002_R2_001.fastq.gz > "tumor_dna_unaligned_metrics1.txt"

i.e. the input fastqs are here:

gs://griffith-lab-test-malachi/input_data/mgriffit/2023-03-14/gmsroot/instrument_data/imported/39f191eaf6e84442bc189b1a9ff1cdd6/

malachig commented 1 year ago

With increased memory (8G) and increased disk (2x), it succeeded. Based on monitoring of the job, we suspect it was the extra memory that was needed.

malachig commented 1 year ago

This has been working for several data sets now.