The "CONTROL" option description in the "samplesheet input" section needs to be improved.
What exactly does the "CONTROL" option do? From the description, it appears that "CONTROL" samples may be used as input, but "input" is not the same thing as a control. This is especially confusing in the provided examples, where "TREATMENT" samples are designated as "CONTROL".
It also appears that the "control" and "control_replicate" columns are only recognized when the "--with_control true" parameter is set, which is not clear from the documentation.
The pipeline fails if the samplesheet is created following the example. I created the samplesheet as:
sample,fastq_1,fastq_2,replicate,control,control_replicate
HSATACtr,00_raw/HSATACtr1_S67_R1_001.fastq.gz,00_raw/HSATACtr1_S67_R2_001.fastq.gz,1,CONTROL,1
HSATACtr,00_raw/HSATACtr2_S68_R1_001.fastq.gz,00_raw/HSATACtr2_S68_R2_001.fastq.gz,2,CONTROL,2
HSATACun,00_raw/HSATACun1_S63_R1_001.fastq.gz,00_raw/HSATACun1_S63_R2_001.fastq.gz,1,,
HSATACun,00_raw/HSATACun2_S64_R1_001.fastq.gz,00_raw/HSATACun2_S64_R2_001.fastq.gz,2,,
The pipeline errors with "ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!"
Changing the "control" column of the samplesheet to the sample name, as in
sample,fastq_1,fastq_2,replicate,control,control_replicate
HSATACtr,00_raw/HSATACtr1_S67_R1_001.fastq.gz,00_raw/HSATACtr1_S67_R2_001.fastq.gz,1,HSATACtr,1
HSATACtr,00_raw/HSATACtr2_S68_R1_001.fastq.gz,00_raw/HSATACtr2_S68_R2_001.fastq.gz,2,HSATACtr,2
HSATACun,00_raw/HSATACun1_S63_R1_001.fastq.gz,00_raw/HSATACun1_S63_R2_001.fastq.gz,1,,
HSATACun,00_raw/HSATACun2_S64_R1_001.fastq.gz,00_raw/HSATACun2_S64_R2_001.fastq.gz,2,,
works, but this contradicts the documentation.
As of now, it feels safer to run the pipeline without "controls", because it is unclear what the consequences of using them are.
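The behaviour I see is consistent with a check like the one below. This is only my guess at what the error message implies, written as a minimal Python sketch; it is not the pipeline's actual code, the function name `find_unmatched_controls` is made up, and the FASTQ file names are shortened placeholders. It treats each (control, control_replicate) pair as a reference to another row's (sample, replicate) pair:

```python
import csv
import io


def find_unmatched_controls(samplesheet_text):
    """Return (sample, replicate, control, control_replicate) for every row whose
    control pair does not match any (sample, replicate) pair in the sheet."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    known = {(row["sample"], row["replicate"]) for row in rows}
    unmatched = []
    for row in rows:
        control = (row.get("control") or "").strip()
        control_rep = (row.get("control_replicate") or "").strip()
        if not control:
            continue  # row has no control assigned, nothing to check
        if (control, control_rep) not in known:
            unmatched.append((row["sample"], row["replicate"], control, control_rep))
    return unmatched


HEADER = "sample,fastq_1,fastq_2,replicate,control,control_replicate\n"

# First samplesheet from this report: the literal string CONTROL in the control column.
failing = HEADER + (
    "HSATACtr,tr1_R1.fastq.gz,tr1_R2.fastq.gz,1,CONTROL,1\n"
    "HSATACtr,tr2_R1.fastq.gz,tr2_R2.fastq.gz,2,CONTROL,2\n"
    "HSATACun,un1_R1.fastq.gz,un1_R2.fastq.gz,1,,\n"
    "HSATACun,un2_R1.fastq.gz,un2_R2.fastq.gz,2,,\n"
)

# Second samplesheet: the control column holds an existing sample name.
working = failing.replace(",CONTROL,", ",HSATACtr,")

print(find_unmatched_controls(failing))  # two rows reported, no sample is named CONTROL
print(find_unmatched_controls(working))  # []
```

If the real validation works this way, "CONTROL" in the documentation is a placeholder for a sample name rather than a keyword, and the documentation should say so explicitly.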