Add the ability to pass in TSV samplesheets that can use multiple read files, and search through directories to get the reads to be analyzed.
Added Features
Additional parameters
--samplesheet
type: String
default: null
The path to a TSV document describing sample layout
More Info
Context
This will allow for using the raw output of ONT runs, concatenating multiple sequence runs together, using more meaningful sample names, and much more.
Possible implementation
Sample sheets should have four mandatory columns:
1. Sample name
2. Path
   This can be an absolute path (starting with /) or a path relative to the --input directory
3. Type
   Can be one of
   directory
      Combine all read files found in the Path directory (column 2) and its subdirectories
   directory-shallow
      Combine all read files found directly in the Path directory (column 2), ignoring subdirectories
   file
      Path (column 2) is a single file of reads
4. Pairity
   Can be one of
   none
      These reads are not paired-end
   paired
      Only applies when Type (column 3) is either directory or directory-shallow
      Assumes that the files matching glob *_{R1,1}.* are forward reads and files matching glob *_{R2,2}.* are reverse reads
   forward
      Only applies when Type (column 3) is file
      Path (column 2) is a file of forward reads
   reverse
      Only applies when Type (column 3) is file
      Path (column 2) is a file of reverse reads
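The Path and Type rules above could be sketched in Python roughly as follows. This is only an illustration, not the proposed Nextflow implementation: the helper names (resolve_path, collect_reads, split_pairs) and the list of read-file suffixes are assumptions, since the issue does not say which extensions count as reads. The brace globs are expanded by hand because Python's fnmatch does not support {a,b} alternatives.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Assumption: which suffixes count as read files is not stated in the issue.
READ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")

# Manual expansion of the globs *_{R1,1}.* and *_{R2,2}.*
FORWARD_GLOBS = ("*_R1.*", "*_1.*")
REVERSE_GLOBS = ("*_R2.*", "*_2.*")


def resolve_path(path, input_dir):
    """Use absolute paths as-is; treat others as relative to --input."""
    p = Path(path)
    return p if p.is_absolute() else Path(input_dir) / p


def collect_reads(path, kind):
    """Return the read files a row refers to, based on its Type column."""
    if kind == "file":
        return [path]
    if kind == "directory":
        candidates = path.rglob("*")   # the directory and all subdirectories
    elif kind == "directory-shallow":
        candidates = path.glob("*")    # the directory itself only
    else:
        raise ValueError(f"unknown Type: {kind!r}")
    return sorted(
        f for f in candidates
        if f.is_file() and f.name.endswith(READ_SUFFIXES)
    )


def split_pairs(files):
    """Split files into (forward, reverse) lists using the paired globs."""
    forward = [f for f in files
               if any(fnmatchcase(f.name, g) for g in FORWARD_GLOBS)]
    reverse = [f for f in files
               if any(fnmatchcase(f.name, g) for g in REVERSE_GLOBS)]
    return forward, reverse
```

Note that these globs require the R1/R2 (or 1/2) token to sit immediately before the first dot after it, so a name such as PIG_S001_R1_L001.fastq.gz would not be picked up by the paired setting and would need to be listed as individual file rows.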
For now, paired-end and single-end reads should not be analyzed together in one run. Individual files and directories, however, should be able to be mixed within the same sample sheet.
The header line is optional. If present, it must be the first line and must begin with a pound sign (#).
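A minimal sketch of how such a sheet could be parsed and validated, written in Python for illustration only. The function name parse_samplesheet and the dictionary keys are hypothetical, and the sketch assumes that any line beginning with # can be skipped as a header or comment, which is slightly more lenient than the rule above.

```python
import csv
import io

# Pairity values permitted for each Type, following the column rules above.
ALLOWED = {
    "directory": {"none", "paired"},
    "directory-shallow": {"none", "paired"},
    "file": {"none", "forward", "reverse"},
}


def parse_samplesheet(text):
    """Parse TSV text into a list of row dicts, raising ValueError on bad rows."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for lineno, fields in enumerate(reader, start=1):
        # Skip blank lines and the optional '#'-marked header.
        if not any(f.strip() for f in fields) or fields[0].startswith("#"):
            continue
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        sample, path, kind, pairity = (f.strip() for f in fields)
        if kind not in ALLOWED:
            raise ValueError(f"line {lineno}: unknown Type {kind!r}")
        if pairity not in ALLOWED[kind]:
            raise ValueError(
                f"line {lineno}: Pairity {pairity!r} is not valid for Type {kind!r}"
            )
        rows.append(
            {"sample": sample, "path": path, "type": kind, "pairity": pairity}
        )

    # Paired-end and single-end rows must not be mixed in one sheet.
    modes = {"single" if r["pairity"] == "none" else "paired" for r in rows}
    if len(modes) > 1:
        raise ValueError("sheet mixes paired-end and single-end reads")
    return rows
```

Rejecting mixed sheets at parse time keeps the failure close to the user's input instead of surfacing later in the pipeline.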
Example
#Sample     Path                                     Type               Pairity
pig-serum   /data/vdl/pig/21                         directory          paired
pig-feces   /data/field/PIG_S001_R1_L001.fastq.gz    file               forward
pig-feces   /data/field/PIG_S001_R2_L001.fastq.gz    file               reverse
pig-feces   /data/field2/pig                         directory-shallow  paired
This can be achieved using a mix of Nextflow channel operators and shell scripts.
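To illustrate the grouping step, here is a rough Python sketch of how per-file records could be collected by sample name before concatenation, similar in spirit to what Nextflow's groupTuple operator does on a channel. The function name group_by_sample and the (sample, path, orientation) record shape are assumptions made for this example, not part of the proposal.

```python
from collections import defaultdict

ORIENTATIONS = {"forward", "reverse", "none"}


def group_by_sample(records):
    """Group (sample, path, orientation) records by sample name.

    orientation is 'forward', 'reverse', or 'none' (single-end).
    Returns {sample: {'forward': [...], 'reverse': [...], 'single': [...]}}.
    """
    grouped = defaultdict(lambda: {"forward": [], "reverse": [], "single": []})
    for sample, path, orientation in records:
        if orientation not in ORIENTATIONS:
            raise ValueError(f"unknown orientation: {orientation!r}")
        key = "single" if orientation == "none" else orientation
        grouped[sample][key].append(path)
    return dict(grouped)
```

Each sample's forward and reverse lists could then be concatenated (for example with cat in a shell step) so that repeated sample names across rows end up as a single set of reads.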