uclahs-cds / pipeline-align-DNA

Nextflow pipeline to align paired DNA FASTQs and sort, mark duplicates, and index resulting alignment
https://uclahs-cds.github.io/pipeline-align-DNA/
GNU General Public License v2.0

No error during align-DNA failure #252

Open alkaZeltser opened 1 year ago

alkaZeltser commented 1 year ago

Describe the issue

I am running some new samples through the metapipeline and testing various partitions (F72/F32/F16) to find the minimum requirement for my dataset. I found no issues when running on F72, but the other two partitions fail during the align-DNA process. The pipeline stops and errors out, but no descriptive error message from BWA-MEM2 is returned, so troubleshooting is difficult. The failure occurred about 5 hours into F32 alignment and 12 hours into F16 alignment. No completed BAMs were returned.

The test sample I'm using is from the recently registered /hot/data/PRAD/PRAD0000068. It is a single germline WGS sample (not a tumor-normal pair). More info here: https://github.com/uclahs-cds/dataset-register-file/pull/116

From successfully completed F72 test runs, I know that the aligned BAM of this sample is 110G, which is quite large. I suspect this is a resource issue, but it would be nice to get a definitive error message from the aligner on why it stops.

Error messages in logs:

executor >  local (2), slurm (1)
[c2/74efbe] process > create_input_csv_metapipeli... [100%] 1 of 1 ✔
[24/73ecf1] process > create_config_json             [100%] 1 of 1 ✔
[54/a283d8] process > call_metapipeline_DNA (1)      [100%] 1 of 1, failed: 1 ✔
[54/a283d8] NOTE: Process `call_metapipeline_DNA (1)` terminated with an error exit status (1) -- Error is ignored
Dec-05 19:39:38.224 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'align_DNA:call_align_DNA (1)'

Caused by:
  Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)

Command executed:

  nextflow run         /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/../../external/pipeline-align-DNA/main.nf         --sample_name EZPRLPUV000001-N001-B01-F         --aligner BWA-MEM2          --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa         --output_dir $(pwd)         --work_dir /scratch         --input_csv EZPRLPUV000001-N001-B01-F_align_DNA_input.csv         -c /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/default.config

Command exit status:
  1

The "Command output:" and "Command error:" sections of the log are empty.
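When the outer run reports nothing under "Command output:" and "Command error:", the task directories of the nested Nextflow run may still hold the details. Nextflow writes hidden files such as `.exitcode` and `.command.err` into each task's work directory. The following is a minimal sketch, not part of the pipeline: the helper names `failed_tasks` and `describe_exit` are hypothetical, and it assumes the work directory (`/scratch` in the command above) is still readable after the failure, which may not hold for node-local scratch space. Tasks that were killed abruptly may leave no `.exitcode` file at all, so an empty result does not prove that nothing failed.

```python
from pathlib import Path


def describe_exit(code):
    """Render an exit code; values above 128 conventionally mean 'killed by signal (code - 128)'."""
    if code > 128:
        return f"exit {code} (terminated by signal {code - 128})"
    return f"exit {code}"


def failed_tasks(work_dir, tail_lines=20):
    """Yield (task_dir, exit_code, stderr_tail) for each task with a non-zero exit code.

    Relies on the `.exitcode` and `.command.err` files that Nextflow
    writes into every task directory under the work directory.
    """
    for exit_file in sorted(Path(work_dir).rglob(".exitcode")):
        text = exit_file.read_text().strip()
        if not text.lstrip("-").isdigit():
            continue  # empty or malformed file: skip rather than guess
        code = int(text)
        if code == 0:
            continue
        err_file = exit_file.with_name(".command.err")
        tail = []
        if err_file.exists():
            tail = err_file.read_text(errors="replace").splitlines()[-tail_lines:]
        yield exit_file.parent, code, tail


def report(work_dir):
    """Print a short summary of every failed task found under work_dir."""
    for task_dir, code, tail in failed_tasks(work_dir):
        print(f"{task_dir}: {describe_exit(code)}")
        for line in tail:
            print("    " + line)
```

Calling `report("/scratch")` on the compute node, before the scratch space is cleaned, would list any failed task together with the last lines of its stderr. An exit code of 137 (signal 9, SIGKILL) is often, though not always, a sign that the process was killed for exceeding available memory.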

To Reproduce

  1. Run
    python3 /hot/user/nzeltser/tool-submit-nf/submit_nextflow_pipeline.py \
    --nextflow_script /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/main.nf \
    --nextflow_config /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config \
    --pipeline_run_name F32-TEST \
    --partition_type F2 \
    --nextflow_yaml /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_one_sample_test.yaml
  2. Wait 5 hours.

Expected behavior

I don't actually expect a sample of this size to complete on an F16 node, maaaybe an F32, but I do expect an error message telling me why it failed.

yashpatel6 commented 1 year ago

This issue should really be in the align-DNA repository; this isn't an issue with the metapipeline itself.

alkaZeltser commented 1 year ago

> This issue should really be in the align-DNA repository; this isn't an issue with the metapipeline itself.

True, should I copy it over and remove this one?

yashpatel6 commented 1 year ago

You should be able to use the "Transfer issue" option in the column on the right, so you won't have to copy-and-paste or delete anything.

jarbet commented 1 year ago

@alkaZeltser: can you try directly running align-DNA instead of meta-pipeline and see if it gives a more informative error?

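Also:

  • I see you are using align-DNA v8.1.0, can you try 9.0.0?
  • Can you send me the location of an align-DNA input.csv file for the fastq files you are testing?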

alkaZeltser commented 1 year ago

> @alkaZeltser: can you try directly running align-DNA instead of meta-pipeline and see if it gives a more informative error?

I could... but Paul told me to document the issue and move on :D But I support anyone else's attempts if they so choose.

> • I see you are using align-DNA v8.1.0, can you try 9.0.0?

I'm using what the metapipeline is pointing to, which is uclahs-cds/pipeline-align-DNA: 8.1.0

> • Can you send me the location of an align-DNA input.csv file for the fastq files you are testing?

Here is the csv file generated by the metapipeline for my test sample:

/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/00/922554427af447b234b14e51ebce1f/EZPRLPUV000001-N001-B01-F_metapipeline_DNA_input.csv

jarbet commented 1 year ago

I'm getting the following error when I test on F16 or F32 using v9.0.0 or the current branch:


Error executing process > 'align_DNA_BWA_MEM2_workflow:run_MarkDuplicatesSpark_GATK'

Caused by:
  Process `align_DNA_BWA_MEM2_workflow:run_MarkDuplicatesSpark_GATK` input file name collision -- There are multiple input files for each of the following file names: BWA-MEM2-2.2.1_0000068_test_382644260-L002-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L001-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L003-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L004-sorted.bam
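For context on the message above: Nextflow stages each input file into the task's work directory under its file name, so two inputs that share a name but live in different directories cannot both be staged, and Nextflow reports an "input file name collision". The sketch below shows one way to check a list of candidate inputs for that condition before they reach the process. The helper name `find_name_collisions` and the example paths are hypothetical; only the BAM file name is taken from the error message.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Return {file_name: [paths...]} for every file name that appears more than once."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: hits for name, hits in by_name.items() if len(hits) > 1}


# Hypothetical example: the same sorted BAM name emitted from two task directories.
example_inputs = [
    "/scratch/work/aa/111/BWA-MEM2-2.2.1_0000068_test_382644260-L001-sorted.bam",
    "/scratch/work/bb/222/BWA-MEM2-2.2.1_0000068_test_382644260-L001-sorted.bam",
    "/scratch/work/cc/333/BWA-MEM2-2.2.1_0000068_test_382644260-L002-sorted.bam",
]

for name, hits in find_name_collisions(example_inputs).items():
    print(f"{name} is provided {len(hits)} times:")
    for hit in hits:
        print("    " + hit)
```

If a check like this reports duplicates, the likely places to look are the channel that collects the per-lane BAMs and any input CSV rows that repeat a lane, since either could deliver the same file name twice. Whether that is the cause here is not established in this thread.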

When checking the Nextflow HTML report, there are no "failed" tasks. However, there is 1 "aborted" task for pipeval's remove_intermediate_files.

@yashpatel6: did something change in pipeval recently that could be causing this error?


nkwang24 commented 1 year ago

I believe this is related to #229. When it fails during Spark, it seems that Spark isn't able to pass the corresponding error message back to the main process, so no error message appears in the log.
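As a general illustration of the behaviour being asked for (a parent process that reports why its child failed), here is a minimal Python sketch. It is not how align-DNA, GATK, or Spark handle errors, and the function name `run_and_surface_errors` is hypothetical. It only shows the pattern of capturing a child's stderr and attaching its last lines to the error raised in the parent, so that a non-zero exit status never arrives without a message.

```python
import subprocess


def run_and_surface_errors(cmd, tail_lines=20):
    """Run cmd; on failure, raise an error that carries the tail of the child's stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        tail = "\n".join(result.stderr.splitlines()[-tail_lines:])
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n{tail}"
        )
    return result.stdout
```

This pattern only helps when the child writes a message before exiting. A process killed by the operating system (for example with SIGKILL under memory pressure) has no chance to do so, in which case the exit status itself is the main clue.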