nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

bedtools coverage fails when gff file and genome file are not sorted the same way #1037

Status: Closed (closed by IdoBar 4 months ago)

IdoBar commented 6 months ago

Description of the bug

Bedtools coverage fails when there's a mismatch between the sorting order of the genome and the gff files.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Use the following flags in an Eager run:

    --fasta "https://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa.gz" \
    --anno_file "https://ftp.ensembl.org/pub/grch37/current/gff3/homo_sapiens/Homo_sapiens.GRCh37.87.gff3.gz" \
    --run_bedtools_coverage
  2. See error:

Error executing process > 'bedtools (AB_libmerged)'

Caused by:
  Process `bedtools (AB_libmerged)` terminated with an error exit status (1)

Command executed:

  ## Create genome file from bam header
  samtools view -H AB_udghalf_libmerged_rmdup.bam | grep '@SQ' | sed 's#@SQ SN:\|LN:##g' > genome.txt

  ##  Run bedtools
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam | pigz -p 1 > "AB_udghalf_libmerged_rmdup".breadth.gz
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam -mean | pigz -p 1 > "AB_udghalf_libmerged_rmdup".depth.gz

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: Sorted input specified, but the file Homo_sapiens.GRCh37.87.gff3 has the following record with a different sort order than the genomeFile genome.txt
  GL000192.1    GRCh37  supercontig 1   547496  .   .   .   ID=supercontig:GL000192.1;Alias=NT_167207.1
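
A quick way to confirm this kind of mismatch before launching a run is to compare the contig order of the annotation against the genome file. The following is only a minimal sketch: `check_sort_order` is a hypothetical helper, not part of Eager or bedtools, and it assumes a tab-separated genome file with the contig name in column 1 (as in the `genome.txt` created above) and a GFF3 annotation with the start coordinate in column 4. It only approximates the check that bedtools performs.

```shell
# Hypothetical helper: print the first annotation record that breaks the
# contig order given by the genome file, and return a non-zero status.
# Assumes tab-separated input; GFF comment lines (starting with '#') are skipped.
check_sort_order() {
  genome="$1"
  anno="$2"
  awk -F'\t' '
    NR == FNR { rank[$1] = FNR; next }          # first file: remember contig order
    /^#/ || NF < 5 { next }                     # skip comments and short lines
    !($1 in rank) { print "not in genome file: " $0; bad = 1; exit }
    rank[$1] < prev || (rank[$1] == prev && ($4 + 0) < prev_start) {
      print "out of order: " $0; bad = 1; exit
    }
    { prev = rank[$1]; prev_start = $4 + 0 }
    END { exit (bad ? 1 : 0) }
  ' "$genome" "$anno"
}

# Example usage (file names taken from the log above):
#   check_sort_order genome.txt Homo_sapiens.GRCh37.87.gff3
```

A zero exit status means that no ordering problem was found under these assumptions; it does not guarantee that bedtools will accept the file.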

Expected behaviour

I expect bedtools coverage to complete successfully. I was able to work around this by removing the -sorted flag, so that bedtools falls back to its default algorithm, which does not require the inputs to share a sort order.
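
An alternative that keeps the -sorted flag would be to reorder the annotation so that it follows the contig order of the genome file. The sketch below is untested against the pipeline itself: `sort_like_genome` is a hypothetical helper that assumes tab-separated input, moves all GFF comment lines to the top of the output, and drops records on contigs that are absent from the genome file.

```shell
# Hypothetical helper: rewrite a GFF3 file so that its records follow the
# contig order of a genome file (contig name in column 1), then start position.
sort_like_genome() {
  genome="$1"
  anno="$2"
  tab=$(printf '\t')
  # Emit comment/header lines first, unchanged.
  awk '/^#/' "$anno"
  # Prefix each record with its contig rank, sort on rank and start, drop the prefix.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { rank[$1] = FNR; next }   # first file: remember contig order
    /^#/ || NF < 5 { next }              # comments were handled above
    ($1 in rank) { print rank[$1], $0 }  # records on unknown contigs are dropped
  ' "$genome" "$anno" |
    sort -t "$tab" -k1,1n -k5,5n |
    cut -f2-
}

# Example usage (file names taken from the log above):
#   sort_like_genome genome.txt Homo_sapiens.GRCh37.87.gff3 > sorted.gff3
```

Whether the extra sorting step is preferable to simply dropping -sorted depends on the size of the inputs; for a small annotation file the simpler workaround is probably sufficient.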

Additional context

Fixed the problem by removing the -sorted flag from the command; see #1036. A similar change should be made to the DSL2 version of the bedtools module, at line 24 of modules/nf-core/bedtools/main.nf, but I haven't tested it.

Thanks, Ido

TCLamnidis commented 4 months ago

Fixed by #1052