nf-core / oncoanalyser

A comprehensive cancer DNA/RNA analysis and reporting pipeline
https://nf-co.re/oncoanalyser
MIT License

Markdup doesn't accept unpaired reads aligned with minimap2 #55

Closed by charlenelawdes 2 months ago

charlenelawdes commented 3 months ago

Description of the bug

I have Nanopore reads aligned to the GRCh38_hmf reference with minimap2, the recommended aligner for Nanopore reads. The pipeline fails at the process NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS with the error shown in the log below.

Command used and terminal output

nextflow run nf-core/oncoanalyser \
  -resume \
  -r 0.4.6 \
  -profile singularity \
  --mode wgts \
  --genome GRCh38_hmf \
  --input $SMPSHEET \
  --outdir $OUT \
  -c $APPTAINER_CONFIG

Relevant files

Jun.-05 11:43:16.852 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 30075009; id: 4; name: NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS (PPT45_SIGN1048); status: COMPLETED; exit: 1; error: -; workDir: /PPT45/oncoanalyser_test/scripts/work/d0/6d687313e1fcffb2700d85157ebf1f started: 1717602196849; exited: 2024-06-05T15:42:45Z; ]
Jun.-05 11:43:16.874 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS (PPT45_SIGN1048); work-dir=/PPT45/oncoanalyser_test/scripts/work/d0/6d687313e1fcffb2700d85157ebf1f
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS (PPT45_SIGN1048)` terminated with an error exit status (1)
Jun.-05 11:43:16.905 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS (PPT45_SIGN1048)'

Caused by:
  Process `NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS (PPT45_SIGN1048)` terminated with an error exit status (1)

Command executed:

  markdups \
      -Xmx36721970381 \
      \
      -samtools $(which samtools) \
      -sambamba $(which sambamba) \
      \
      -sample SIGN1048 \
      -input_bam SIGN1048_aligned_sorted_RG.bam \
      \
      -form_consensus \
       \
      \
      -unmap_regions unmap_regions.38.tsv \
      -ref_genome GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
      -ref_genome_version 38 \
      \
      -write_stats \
      -threads 6 \
      \
      -output_bam SIGN1048.markdups.bam

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_ONCOANALYSER:WGTS:READ_PROCESSING:MARKDUPS":
      markdups: $(markdups -version | awk '{ print $NF }')
      sambamba: $(sambamba --version 2>&1 | egrep '^sambamba' | head -n 1 | awk '{ print $NF }')
      samtools: $(samtools --version 2>&1 | egrep '^samtools\s' | head -n 1 | sed 's/^.* //')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  /usr/local/bin/markdups: line 6: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
  15:42:44.608 [main] [INFO ] MarkDups version 1.1.5
  15:42:44.772 [main] [INFO ] output(./)
  15:42:45.038 [main] [INFO ] loaded 80309 unmapping regions from unmap_regions.38.tsv
  15:42:45.038 [main] [INFO ] duplicate logic: consensus
  15:42:45.039 [main] [INFO ] sample(SIGN1048) starting mark duplicates
  15:42:45.536 [Thread-0] [ERROR] read(id(c68f9a3b-05e1-438f-9daa-bf6ec949f9b0) coords(chr1:10001-41941) cigar(3302S24M1D83M1I28M2I10M1D30M1I41M1I19M1I9M2D6M1D29M1I12M1I5M3I5M1I3M1I18M5I62M1D6M1D4M1D25M1I6M1I31M1I44M2D116M7I1M1I71M26I104M2I5M3D8M4I4M4I6M8I34M105I106M3I5M3I39M1D7M1I43M1I78M1I1050M1D412M2D93M1D57M1I1M1I328M1D46M1I523M2D281M1I18M1D207M1D91M12D522M1I424M2D326M2I34M1I326M1I52M2I15M1D3M3D1M1D122M3D76M2I133M1I200M1D236M1D718M1D22M2I130M1D273M3I2M2I154M1D376M3D85M1I67M1I176M2D1511M2I2M1D466M1I7M2I282M1D5M1D3M1I120M1D220M1D11M2D374M2I117M1I118M1I239M1D2M1I3M1D1508M1I111M1I510M2D545M3D510M1I58M2D438M1D121M1I2M1D196M1I46M1I26M1I177M1I230M1I769M1D62M1D2M1I1015M3D180M3I204M1I506M1D172M2I5M1D241M5D3M3D543M1I161M1D250M2D1227M1I9M1D8M1I40M4D1362M1D9M1D575M2I2M1D1443M2D335M2D11M1D67M2I3M1I378M1I515M1I11M2D11M1D1204M1I10M1D141M2I44M1I44M3I228M1I622M1I1286M1I9M2D157M2I87M1D1214M9S) mate(*:0) flags(0)) exception: java.lang.IllegalStateException: Inappropriate call if not paired read
  java.lang.IllegalStateException: Inappropriate call if not paired read
    at htsjdk.samtools.SAMRecord.requireReadPaired(SAMRecord.java:892)
    at htsjdk.samtools.SAMRecord.getMateUnmappedFlag(SAMRecord.java:919)
    at com.hartwig.hmftools.markdups.ReadPositionsCache.processRead(ReadPositionsCache.java:105)
    at com.hartwig.hmftools.markdups.PartitionReader.processSamRecord(PartitionReader.java:207)
    at com.hartwig.hmftools.markdups.BamReader.sliceRegion(BamReader.java:58)
    at com.hartwig.hmftools.markdups.PartitionReader.processRegion(PartitionReader.java:123)
    at com.hartwig.hmftools.markdups.PartitionThread.run(PartitionThread.java:61)

Work dir:
  /PPT45/oncoanalyser_test/scripts/work/d0/6d687313e1fcffb2700d85157ebf1f

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Jun.-05 11:43:16.917 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit

System information

Nextflow version: 23.10.0
Hardware: HPC
Executor: slurm
Container engine: Apptainer
OS: CentOS
nf-core/oncoanalyser version: 0.4.6
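
The log above reports the failing read with flags(0) and mate(*:0), and the stack trace shows htsjdk's getMateUnmappedFlag rejecting it through requireReadPaired. One way to confirm that the input BAM really contains unpaired records is to tally the paired bit (0x1) of the SAM FLAG column. The following is only a minimal sketch: the function name count_pairing is hypothetical, and it assumes SAM text arrives on stdin (for example from samtools view).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count paired vs unpaired records in SAM text on stdin.
# Assumed usage: samtools view SIGN1048_aligned_sorted_RG.bam | count_pairing
count_pairing() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines, if present
    {
      flag = int($2)                   # column 2 is the SAM FLAG field
      if (flag % 2 == 1) paired++      # bit 0x1 set: read is flagged as paired
      else unpaired++                  # bit 0x1 unset: single-end / unpaired read
    }
    END { printf "paired=%d unpaired=%d\n", paired, unpaired }
  '
}
```

If every record is unpaired, as the flags(0) value in the error suggests for this read, the paired count would be zero. Running samtools view -c -f 1 on the BAM should report the same paired count directly.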

scwatts commented 3 months ago

Thanks for reporting this. There have been several bug fixes in the latest MarkDups release (v1.1.7). Would you be able to first see whether this data works with that release?

To do that, I'd recommend navigating to the MarkDups work directory from your oncoanalyser analysis, then downloading and running MarkDups 1.1.7. Something like the following should work:

cd /PPT45/oncoanalyser_test/scripts/work/d0/6d687313e1fcffb2700d85157ebf1f/

wget https://github.com/hartwigmedical/hmftools/releases/download/mark-dups-v1.1.7/mark-dups_v1.1.7.jar

java -Xmx36721970381 -jar mark-dups_v1.1.7.jar \
  -samtools $(which samtools) \
  -sambamba $(which sambamba) \
  -sample SIGN1048 \
  -input_bam SIGN1048_aligned_sorted_RG.bam \
  -form_consensus \
  -unmap_regions unmap_regions.38.tsv \
  -ref_genome GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  -ref_genome_version 38 \
  -write_stats \
  -threads 6 \
  -output_bam SIGN1048.markdups.bam

scwatts commented 2 months ago

I'll close this issue for now - if you'd like to continue discussing/debugging, please reopen!