nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

DSL1: Input file name collision when merging two already merged BAMs in `additional_library_merge` step. #1017

Closed: TCLamnidis closed this 8 months ago

TCLamnidis commented 11 months ago


Description of the bug

In niche cases where multiple UDG treatments exist for a sample, and multiple libraries have each of these treatments, a file name collision kills the pipeline at the additional_library_merge step.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: `nextflow run ...` (any input that requires merging two already-merged library-level BAMs during the `additional_library_merge` step).
  2. See error:
Caused by:
  Process `additional_library_merge` input file name collision -- There are multiple input files for each of the following file names: MIS010_ss_libmerged.trimmed.bam, MIS010_ss_libmerged.trimmed.bam.bai
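The error arises because a process stages all of its input files into one work directory under their base names, so two different upstream files that share a base name cannot coexist. The sketch below is a simplified, hypothetical re-creation of that check (it is not Nextflow's actual implementation), using made-up work-directory paths for the two trimmed, library-merged BAMs named in the error.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_name_collisions(input_paths):
    """Return the sorted base names that occur more than once.

    Staging places every input into a single directory under its base
    name, so two different paths with the same base name clash.
    """
    names = Counter(PurePosixPath(p).name for p in input_paths)
    return sorted(name for name, count in names.items() if count > 1)


# Hypothetical work-directory paths for the two upstream BAMs and indices.
inputs = [
    "work/ab/12/MIS010_ss_libmerged.trimmed.bam",
    "work/ab/12/MIS010_ss_libmerged.trimmed.bam.bai",
    "work/cd/34/MIS010_ss_libmerged.trimmed.bam",
    "work/cd/34/MIS010_ss_libmerged.trimmed.bam.bai",
]
print(find_name_collisions(inputs))
# ['MIS010_ss_libmerged.trimmed.bam', 'MIS010_ss_libmerged.trimmed.bam.bai']
```

The directories differ, yet both pairs of files reduce to the same two names, which matches the two names listed in the error message.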

Expected behaviour

The BAMs produced by the initial library merge should keep unique names, to avoid such collisions.


jfy133 commented 11 months ago

> In niche cases where multiple UDG treatments exist for a sample, and multiple libraries have each of these treatments, a file name collision kills the pipeline at the additional_library_merge step.

That's what confuses me... shouldn't they have been merged at the first post-dedup merging step? :thinking:

TCLamnidis commented 11 months ago

They are; that's the problem, as they end up with the same name. It could be an issue with the naming of the initial library merge step, OR of the trimming step.

TCLamnidis commented 11 months ago
To give a better overview, say we have a sample with 4 libraries with the following attributes:

| Sample | Library | UDG_Treatment | Strandedness | Lane |
|--------|---------|---------------|--------------|------|
| ABC001 | A0101   | half          | double       | 1    |
| ABC001 | A0102   | half          | double       | 1    |
| ABC001 | B0101   | none          | double       | 1    |
| ABC001 | B0102   | none          | double       | 1    |
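As a rough illustration of which libraries end up together at the initial library merge, the sketch below groups the four libraries listed above by sample and UDG treatment. This grouping key is an assumption drawn from the naming described in this thread (the real pipeline may also take strandedness into account), and `initial_merge_groups` is a hypothetical helper, not part of nf-core/eager.

```python
from collections import defaultdict

# Rows from the example: (sample, library, udg_treatment, strandedness, lane)
libraries = [
    ("ABC001", "A0101", "half", "double", 1),
    ("ABC001", "A0102", "half", "double", 1),
    ("ABC001", "B0101", "none", "double", 1),
    ("ABC001", "B0102", "none", "double", 1),
]


def initial_merge_groups(rows):
    """Group library IDs by (sample, UDG treatment); assumed grouping key."""
    groups = defaultdict(list)
    for sample, library, udg, _strandedness, _lane in rows:
        groups[(sample, udg)].append(library)
    return dict(groups)


for (sample, udg), libs in initial_merge_groups(libraries).items():
    print(f"{sample}_udg{udg}_libmerged.bam <- {libs}")
# ABC001_udghalf_libmerged.bam <- ['A0101', 'A0102']
# ABC001_udgnone_libmerged.bam <- ['B0101', 'B0102']
```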

The BAMs of the first two libraries will be merged at the initial `lib_merge` and named `ABC001_udghalf_libmerged.bam`. Likewise, the BAMs of the last two libraries will be merged at the initial `lib_merge` and named `ABC001_udgnone_libmerged.bam`. However, once they undergo BAM trimming, the outputs lose their UDG attribute, and both become `ABC001_libmerged.bam`. When the two come together for the `additional_library_merge` step, the two input files share a name and the file name collision occurs.
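The naming sequence described above can be sketched as follows. The three naming functions are hypothetical stand-ins that mimic the file names quoted in this comment, not the pipeline's real code; `trimmed_name_fixed` shows one possible remedy (keeping the UDG tag after trimming), offered only as an assumption.

```python
def merged_name(sample, udg):
    # Name after the initial library merge, as described in the comment.
    return f"{sample}_udg{udg}_libmerged.bam"


def trimmed_name_buggy(sample, udg):
    # Reported behaviour: the UDG attribute is dropped after BAM trimming.
    return f"{sample}_libmerged.bam"


def trimmed_name_fixed(sample, udg):
    # Hypothetical fix: carry the UDG attribute through to the trimmed output.
    return f"{sample}_udg{udg}_libmerged.bam"


def has_collision(names):
    # True if any two files would be staged under the same name.
    return len(names) != len(set(names))


groups = [("ABC001", "half"), ("ABC001", "none")]
before = [merged_name(s, u) for s, u in groups]
buggy = [trimmed_name_buggy(s, u) for s, u in groups]
fixed = [trimmed_name_fixed(s, u) for s, u in groups]

print(has_collision(before), has_collision(buggy), has_collision(fixed))
# False True False
```

The names are unique after the initial merge, collapse to one name once the UDG tag is dropped, and stay unique if the tag is preserved.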