nf-core / atacseq

ATAC-seq peak-calling and QC analysis pipeline
https://nf-co.re/atacseq
MIT License

CONTROL description needs clarification #337

Closed: mdozmorov closed this issue 6 months ago

mdozmorov commented 1 year ago

Description of feature

The "CONTROL" option description in the "samplesheet input" section needs to be improved.

  1. What exactly does the control option do? From the description, it appears that "CONTROL" samples may be used as input, but "input" is different from control. This is especially confusing in the provided examples, where "TREATMENT" samples are designated as "CONTROL".
  2. The "Example sheets without controls and with controls" links are broken.
  3. It appears that the "control" and "control_replicate" columns are only recognized when the "--with_control true" parameter is set, which is not clear from the documentation.
  4. The pipeline fails when the samplesheet is created following the example. I created the samplesheet as:
     sample,fastq_1,fastq_2,replicate,control,control_replicate
     HSATACtr,00_raw/HSATACtr1_S67_R1_001.fastq.gz,00_raw/HSATACtr1_S67_R2_001.fastq.gz,1,CONTROL,1
     HSATACtr,00_raw/HSATACtr2_S68_R1_001.fastq.gz,00_raw/HSATACtr2_S68_R2_001.fastq.gz,2,CONTROL,2
     HSATACun,00_raw/HSATACun1_S63_R1_001.fastq.gz,00_raw/HSATACun1_S63_R2_001.fastq.gz,1,,
     HSATACun,00_raw/HSATACun2_S64_R1_001.fastq.gz,00_raw/HSATACun2_S64_R2_001.fastq.gz,2,,

The pipeline fails with "ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!"

Correcting the "control" column of the samplesheet as follows

    sample,fastq_1,fastq_2,replicate,control,control_replicate
    HSATACtr,00_raw/HSATACtr1_S67_R1_001.fastq.gz,00_raw/HSATACtr1_S67_R2_001.fastq.gz,1,HSATACtr,1
    HSATACtr,00_raw/HSATACtr2_S68_R1_001.fastq.gz,00_raw/HSATACtr2_S68_R2_001.fastq.gz,2,HSATACtr,2
    HSATACun,00_raw/HSATACun1_S63_R1_001.fastq.gz,00_raw/HSATACun1_S63_R2_001.fastq.gz,1,,
    HSATACun,00_raw/HSATACun2_S64_R1_001.fastq.gz,00_raw/HSATACun2_S64_R2_001.fastq.gz,2,,

works, but this contradicts the documentation.
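The error message implies a simple rule: every non-empty (control, control_replicate) pair must equal some (sample, replicate) pair defined in the same samplesheet. The following is a minimal Python sketch of that rule, assuming a CSV samplesheet with the column names used above. `find_unmatched_controls` is a hypothetical helper written for illustration; it is not the nf-core/atacseq validation code. Under this rule the first sheet is rejected because no sample is named CONTROL, while the second is accepted.

```python
import csv
import io

# Hypothetical re-implementation of the rule stated in the pipeline's error
# message; this is NOT the actual nf-core/atacseq samplesheet check.
def find_unmatched_controls(samplesheet_csv: str) -> list[tuple[str, str]]:
    """Return (control, control_replicate) pairs that do not match any
    (sample, replicate) pair defined in the samplesheet."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_csv)))
    # Every (sample, replicate) combination the sheet actually defines.
    known = {(row["sample"].strip(), row["replicate"].strip()) for row in rows}
    unmatched = []
    for row in rows:
        # Missing trailing fields come back as None, empty ones as "".
        control = (row.get("control") or "").strip()
        control_rep = (row.get("control_replicate") or "").strip()
        if not control:
            continue  # rows without a control are not checked
        if (control, control_rep) not in known:
            unmatched.append((control, control_rep))
    return unmatched


# The samplesheet as first written in this issue ("CONTROL" as the control).
SHEET_AS_DOCUMENTED = """
sample,fastq_1,fastq_2,replicate,control,control_replicate
HSATACtr,00_raw/HSATACtr1_S67_R1_001.fastq.gz,00_raw/HSATACtr1_S67_R2_001.fastq.gz,1,CONTROL,1
HSATACtr,00_raw/HSATACtr2_S68_R1_001.fastq.gz,00_raw/HSATACtr2_S68_R2_001.fastq.gz,2,CONTROL,2
HSATACun,00_raw/HSATACun1_S63_R1_001.fastq.gz,00_raw/HSATACun1_S63_R2_001.fastq.gz,1,,
HSATACun,00_raw/HSATACun2_S64_R1_001.fastq.gz,00_raw/HSATACun2_S64_R2_001.fastq.gz,2,,
""".strip()

# The corrected samplesheet (control column names an existing sample).
SHEET_CORRECTED = SHEET_AS_DOCUMENTED.replace(",CONTROL,", ",HSATACtr,")

if __name__ == "__main__":
    print(find_unmatched_controls(SHEET_AS_DOCUMENTED))  # [('CONTROL', '1'), ('CONTROL', '2')]
    print(find_unmatched_controls(SHEET_CORRECTED))  # []
```

Because the rule only compares identifiers, it also accepts the corrected sheet, in which each HSATACtr replicate names itself as its own control.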

For now, it seems safer to run the pipeline without "controls", because the consequences of specifying them are unclear.

rb56 commented 11 months ago

Hello, I'm trying to run this and have the same problem. I'm currently running it without the controls parameter. Could you find a fix or a workaround for this?

bjlang commented 6 months ago

#358 should clarify this.