nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.
https://nf-co.re/raredisease
MIT License

Help with figuring out if error is pipeline or HPC related #612

Open hrydbeck opened 2 months ago

hrydbeck commented 2 months ago

Description of the bug

I am trying to run the pipeline on the PDC Dardel HPC cluster and get error messages that I cannot make sense of. I am posting here in case you have any tips on how to figure out what is not working, or can tell me whether I should use another forum for this type of question, or whether I should simply accept that the pipeline cannot be run on Dardel due to shortcomings of the HPC system.

This should include the errors:

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
  INFO:    gocryptfs not found, will not be able to use gocryptfs
  WARNING: Skipping mount /cfs/klemming/pdc/software/dardel/23.12/eb/software/apptainer/1.3.0-cpeGNU-23.12/var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  Looking to launch executable "/usr/local/bin/bwa-mem2.avx2", simd = .avx2
  Launching executable "/usr/local/bin/bwa-mem2.avx2"
  [bwa_index] Pack FASTA... 14.43 sec
  * Entering FMI_search

Work dir:
  /cfs/klemming/projects/supr/naiss2024-23-348/nfc_rd/work/81/2877ad8ba08e92e6d91dd599c5d401

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Sep-13 13:54:30.754 [Actor Thread 241] WARN  nextflow.processor.TaskProcessor - Input tuple does not match input set cardinality declared by process `NFCORE_RAREDISEASE:RAREDISEASE:SMNCOPYNUMBERCALLER` -- offending value: [father:, mother:, probands:[hugelymodelbat], upd_children:[], id:justhusky]
Sep-13 13:54:30.757 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
Sep-13 13:54:30.759 [Actor Thread 220] WARN  nextflow.processor.TaskProcessor - Input tuple does not match input set cardinality declared by process `NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:CALL_SV_TIDDIT:SVDB_MERGE_TIDDIT` -- offending value: [father:, mother:, probands:[hugelymodelbat], upd_children:[], id:justhusky]
Sep-13 13:54:30.771 [Task monitor] ERROR nextflow.Nextflow - Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
Sep-13 13:54:30.772 [Actor Thread 241] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_RAREDISEASE:RAREDISEASE:SMNCOPYNUMBERCALLER (1); work-dir=null
  error [nextflow.exception.ProcessUnrecoverableException]: Path value cannot be empty
Sep-13 13:54:30.772 [Actor Thread 220] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:CALL_SV_TIDDIT:SVDB_MERGE_TIDDIT (1); work-dir=null
  error [nextflow.exception.ProcessUnrecoverableException]: Path value cannot be empty
Sep-13 13:54:30.727 [Actor Thread 230] ERROR nextflow.extension.OperatorImpl - @unknown

On Slack-nf-core-HPC: @pontus wrote:

Could be still running but momentarily disappeared from squeue. Or the job finishing but the file system being broken and it couldn't write results/output status. Or the node crashing. Or the job timing out and not succeeding to write any exit code. Or something else. Unless it times out and it manages to make the node stuck, rerunning with -resume should be helpful.

Command used and terminal output

nextflow run nf-core/raredisease -r dev -profile test_full,singularity,pdc_kth --outdir test_full_rev_patch_out --project naiss2024-22-481 --skip_snv_annotation --skip_vep_filter --skip_sv_annotation --skip_mt_annotation --skip_germlinecnvcaller --skip_me_calling --skip_me_annotation -resume

Relevant files

nextflow.log

System information

Nextflow version: 23.10.1; hardware: HPC; executor: slurm; container engine: Singularity; pipeline version: dev

pontus commented 2 months ago

The nf-core Slack is likely a much more suitable place for this kind of question. Here it seems bwa-mem2 starts (but fails? Or just doesn't exit? Not enough information here). You will quite possibly need to dive into the work directory to figure out what happens, possibly rerunning the task to see.
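A minimal sketch of what looking into the work directory could involve, assuming the usual hidden files Nextflow leaves in a task's work dir (`.command.sh`, `.command.err`, `.command.log`, `.exitcode`). The helper name `inspect_task` is made up for illustration and is not part of Nextflow or nf-core:

```shell
#!/bin/sh
# inspect_task: hypothetical helper, for illustration only.
# For a given task work directory, prints the last 20 lines of the task
# script and of the captured stderr/log, then the recorded exit code.
inspect_task() {
    workdir="$1"
    if [ ! -d "$workdir" ]; then
        echo "no such work dir: $workdir" >&2
        return 1
    fi
    for f in .command.sh .command.err .command.log; do
        if [ -f "$workdir/$f" ]; then
            echo "== $f =="
            tail -n 20 "$workdir/$f"
        else
            echo "== $f (missing) =="
        fi
    done
    if [ -f "$workdir/.exitcode" ]; then
        echo "exit code: $(cat "$workdir/.exitcode")"
    else
        # No .exitcode usually means the task never got to record one.
        echo "exit code: not written (task may have been killed or may still be running)"
    fi
}
```

With the work dir from the report, that would be `inspect_task /cfs/klemming/projects/supr/naiss2024-23-348/nfc_rd/work/81/2877ad8ba08e92e6d91dd599c5d401`. A missing `.exitcode` would fit the "job was killed or timed out before writing a status" explanations, but it does not prove any of them.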

Your later log lines are possibly unrelated, but there are some indications that make me wonder whether the pipeline gets the inputs it expects.
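A rough way to see how widespread those input-related messages are is to count them in the log. This is only a sketch: `summarize_log` is a hypothetical helper, the two patterns are copied from the log excerpt in the report, and the log is assumed to be the `.nextflow.log` in the launch directory:

```shell
#!/bin/sh
# summarize_log: hypothetical helper, for illustration only.
# Counts the lines in a Nextflow log that contain each of the two
# input-related messages quoted in this report.
summarize_log() {
    logfile="$1"
    if [ ! -f "$logfile" ]; then
        echo "no such log: $logfile" >&2
        return 1
    fi
    for pattern in \
        'Input tuple does not match input set cardinality' \
        'Path value cannot be empty'
    do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the helper usable under "set -e".
        count=$(grep -cF -- "$pattern" "$logfile" || true)
        echo "$count  $pattern"
    done
}
```

Running `summarize_log .nextflow.log` prints one count per message. Non-zero counts for "Path value cannot be empty" would suggest that some `path` input arrived empty, which points at the inputs rather than at the cluster, though it does not say which parameter or samplesheet field is responsible.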

(It's definitely possible to run pipelines on Dardel, including, I believe, raredisease. Whether it makes sense to run it for animals or ancient populations, I can't offer an opinion on.)