nf-cmgg / structural

A bioinformatics best-practice analysis pipeline for calling structural variants (SVs), copy number variants (CNVs) and repeat region expansions (RREs) from short DNA reads
https://nf-cmgg.github.io/structural/
MIT License

sex unknown (ratio in gray area) #95

mvheetve opened this issue 1 month ago


Description of the bug

I came across this error:

Jul-26 15:36:49.789 [TaskFinalizer-15] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCMGG_STRUCTURAL:STRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (D2004042)'

Caused by:
  Process `NFCMGG_STRUCTURAL:STRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER (D2004042)` terminated with an error exit status (2)

Command executed:

  ExpansionHunter \
      --sex unknown (ratio in gray area) \
      --reads D2004042.cram \
      --output-prefix D2004042 \
      --reference GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna \
      --variant-catalog variant_catalog.json

  bgzip --threads 12  D2004042.vcf
  bgzip --threads 12  D2004042.json

  cat <<-END_VERSIONS > versions.yml
  "NFCMGG_STRUCTURAL:STRUCTURAL:BAM_REPEAT_ESTIMATION_EXPANSIONHUNTER:EXPANSIONHUNTER":
      expansionhunter: $( echo $(ExpansionHunter --version 2>&1) | head -1 | sed 's/^.*ExpansionHunter v//')
      bgzip: $(echo $(bgzip -h 2>&1) | sed 's/^.*Version: //;s/Usage:.*//')
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  .command.sh: line 3: syntax error near unexpected token `('

Work dir:
  /kyukon/scratch/gent/vo/000/gvo00082/vsc43079/structural_WGS/NVQ_997/structural/work/72/01840b3cd2f3bfe1508aa72626f3d6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
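The immediate failure is a shell parse error, not an ExpansionHunter error. The sex string `unknown (ratio in gray area)` is written unquoted into `.command.sh`, so bash treats `(` as a metacharacter and stops with "syntax error near unexpected token `('". The minimal sketch below reproduces this with `bash -n`, which only parses and executes nothing; the variable name `sex` is illustrative. Quoting alone would probably just move the failure: according to the ExpansionHunter usage documentation, `--sex` accepts only `male` or `female`.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the parse failure from the log without running ExpansionHunter.
# `bash -n` only parses the given script; nothing is executed.
sex='unknown (ratio in gray area)'   # value as it appears in .command.sh

# Unquoted interpolation, as in the generated script: "(" is a shell metacharacter.
if bash -n -c "ExpansionHunter --sex ${sex} --reads D2004042.cram" 2>/dev/null; then
    echo "unquoted: parses"
else
    echo "unquoted: syntax error"
fi

# Quoted interpolation parses, but ExpansionHunter would still reject the value,
# since --sex only accepts "male" or "female".
if bash -n -c "ExpansionHunter --sex '${sex}' --reads D2004042.cram" 2>/dev/null; then
    echo "quoted: parses"
else
    echo "quoted: syntax error"
fi
```

Running it prints `unquoted: syntax error` followed by `quoted: parses`.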

This is WGS data with very low coverage. My guess is that the sex could not be inferred because of the low coverage.

It might be worth adding a check on whether the sex can be inferred at all. If the check fails, you could halt the pipeline and ask the user to specify the sex in the samplesheet. A more elegant option might be to issue a warning and then run both the male and the female calculation, leaving the user the chance to assess the data and discard whichever result doesn't fit. I see value in both approaches; what do you think?
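A minimal sketch of the "warn and run both" option, written as a shell dry run. The function name `plan_expansionhunter_runs` and the `.male`/`.female` output prefixes are hypothetical and not part of nf-cmgg/structural. It assumes the inferred sex reaches this step as a plain string and treats anything other than `male` or `female` as inconclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-cmgg/structural): decide which --sex values
# to run ExpansionHunter with. Prints one "<sex> <output-prefix>" line per run.
plan_expansionhunter_runs() {
    local prefix="$1" inferred="$2"
    case "$inferred" in
        male|female)
            # Inference succeeded: a single run with the inferred sex.
            printf '%s %s\n' "$inferred" "$prefix"
            ;;
        *)
            # Inference inconclusive: warn on stderr and plan one run per sex,
            # with distinct output prefixes so neither result overwrites the other.
            echo "WARNING: could not infer sex for ${prefix} ('${inferred}'); running as male and female" >&2
            printf '%s %s\n' male "${prefix}.male" female "${prefix}.female"
            ;;
    esac
}

# Dry run: print the commands that would be executed. ExpansionHunter is not called,
# and the remaining flags from the original command are omitted here.
while read -r sex out; do
    echo "ExpansionHunter --sex ${sex} --reads D2004042.cram --output-prefix ${out}"
done < <(plan_expansionhunter_runs D2004042 'unknown (ratio in gray area)')
```

For the "halt" option instead, the `*)` branch could print an error asking the user to set the sex in the samplesheet and `exit 1`. Either way, passing `"$sex"` quoted keeps unexpected values from breaking the shell script.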

Command used and terminal output

No response

Relevant files

No response

System information

No response