ksumngs / yavsap

Yet Another Viral Subspecies Analysis Pipeline
https://ksumngs.github.io/yavsap
MIT License

[Feature]: Allow using samplesheets #21

Closed MillironX closed 2 years ago

MillironX commented 2 years ago

Summary

Add the ability to pass in TSV samplesheets that can list multiple read files per sample and can search through directories to find the reads to be analyzed.

Added Features

Additional parameters

More Info

Context

This will allow for using the raw output of ONT runs, concatenating multiple sequence runs together, using more meaningful sample names, and much more.

Possible implementation

Sample sheets should have four mandatory columns:

  1. Sample name
  2. Path
    • This can be an absolute path (starting with /) or a path relative to the --input directory
  3. Type
    • Can be one of
      • directory
        • Combine all read files found in the directory given in column 2 (Path) and in its subdirectories
      • directory-shallow
        • Combine all read files found in the directory given in column 2 (Path), ignoring subdirectories
      • file
        • Column 2 (Path) is a single file of reads
  4. Pairity
    • Can be one of
      • none
        • These reads are not paired-end
      • paired
        • Only applies when column 3 (Type) is either directory or directory-shallow
        • Assumes that files matching the glob *_{R1,1}.* are forward reads and files matching the glob *_{R2,2}.* are reverse reads
      • forward
        • Only applies when column 3 (Type) is file
        • Column 2 (Path) is a file of forward reads
      • reverse
        • Only applies when column 3 (Type) is file
        • Column 2 (Path) is a file of reverse reads
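
The column rules above could be sketched in Python roughly as follows. This is only an illustration, not the pipeline's implementation: the function names (resolve_path, find_files, collect_reads) are hypothetical. Python's glob has no brace expansion, so *_{R1,1}.* is written out as two patterns. Treating every regular file as a reads file when Pairity is none is also an assumption, since the issue does not say which extensions count as reads.

```python
from pathlib import Path

# Manual expansion of the brace globs from the issue (Python globs lack {a,b}).
FORWARD_GLOBS = ("*_R1.*", "*_1.*")  # *_{R1,1}.*
REVERSE_GLOBS = ("*_R2.*", "*_2.*")  # *_{R2,2}.*


def resolve_path(path, input_dir):
    """Use absolute paths as-is; treat others as relative to the --input directory."""
    p = Path(path)
    return p if p.is_absolute() else Path(input_dir) / p


def find_files(directory, patterns, recursive):
    """Collect regular files matching any pattern, optionally including subdirectories."""
    found = set()
    for pattern in patterns:
        matches = directory.rglob(pattern) if recursive else directory.glob(pattern)
        found.update(m for m in matches if m.is_file())
    return sorted(found)


def collect_reads(path, type_, pairity, input_dir="."):
    """Return the forward, reverse, and single-end read files for one samplesheet row."""
    target = resolve_path(path, input_dir)
    reads = {"forward": [], "reverse": [], "single": []}

    if type_ == "file":
        if pairity not in ("none", "forward", "reverse"):
            raise ValueError(f"Pairity {pairity!r} is not valid for Type 'file'")
        key = "single" if pairity == "none" else pairity
        reads[key].append(target)
        return reads

    if type_ not in ("directory", "directory-shallow"):
        raise ValueError(f"Unknown Type {type_!r}")

    recursive = type_ == "directory"  # directory-shallow skips subdirectories
    if pairity == "paired":
        reads["forward"] = find_files(target, FORWARD_GLOBS, recursive)
        reads["reverse"] = find_files(target, REVERSE_GLOBS, recursive)
    elif pairity == "none":
        # Assumption: every regular file in the directory is a reads file.
        reads["single"] = find_files(target, ("*",), recursive)
    else:
        raise ValueError(f"Pairity {pairity!r} is not valid for Type {type_!r}")
    return reads
```

For example, collect_reads("pig", "directory-shallow", "paired", input_dir="/data/field2") would look only at files directly inside /data/field2/pig.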

At this point, it is unreasonable to allow paired-end reads and single-end reads to be analyzed together. However, it should be possible to mix individual files and directories in the same samplesheet.

The header line is optional. If present, it should be the first line and should start with a pound sign (#).
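
As a rough sketch of how these rules might be checked, the Python below parses samplesheet text and rejects invalid rows. The function name parse_samplesheet is hypothetical. Skipping every line that starts with # (not only the first) and skipping blank lines are assumptions that go slightly beyond what is specified here.

```python
# Pairity values allowed for each Type, following the column rules in this issue.
ALLOWED_PAIRITY = {
    "directory": {"none", "paired"},
    "directory-shallow": {"none", "paired"},
    "file": {"none", "forward", "reverse"},
}


def parse_samplesheet(text):
    """Parse tab-separated samplesheet text into a list of row dictionaries."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: any '#' line is a header/comment, and blank lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        sample, path, type_, pairity = fields
        if type_ not in ALLOWED_PAIRITY:
            raise ValueError(f"line {lineno}: unknown Type {type_!r}")
        if pairity not in ALLOWED_PAIRITY[type_]:
            raise ValueError(
                f"line {lineno}: Pairity {pairity!r} is not valid for Type {type_!r}"
            )
        rows.append(
            {"sample": sample, "path": path, "type": type_, "pairity": pairity}
        )

    # Paired-end and single-end reads may not be analyzed together.
    has_single = any(row["pairity"] == "none" for row in rows)
    has_paired = any(row["pairity"] != "none" for row in rows)
    if has_single and has_paired:
        raise ValueError("samplesheet mixes paired-end and single-end reads")
    return rows
```

Validating the Type and Pairity combination per row keeps error messages tied to a line number, while the paired versus single-end check has to wait until all rows have been read.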

Example

#Sample Path Type Pairity
pig-serum /data/vdl/pig/21 directory paired
pig-feces /data/field/PIG_S001_R1_L001.fastq.gz file forward
pig-feces /data/field/PIG_S001_R2_L001.fastq.gz file reverse
pig-feces /data/field2/pig directory-shallow paired

Samplesheet handling like this can be achieved using a mix of Nextflow channel operators and shell scripts.
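
For illustration only, the grouping step implied by the example (several rows sharing one sample name) might look like this in Python. In Nextflow the same idea would presumably be expressed with channel operators such as groupTuple. The function name group_by_sample is hypothetical.

```python
from collections import defaultdict


def group_by_sample(rows):
    """Group (sample, path, type, pairity) rows by sample name, keeping row order."""
    grouped = defaultdict(list)
    for sample, path, type_, pairity in rows:
        grouped[sample].append((path, type_, pairity))
    return dict(grouped)


# The rows from the example samplesheet in this issue.
example_rows = [
    ("pig-serum", "/data/vdl/pig/21", "directory", "paired"),
    ("pig-feces", "/data/field/PIG_S001_R1_L001.fastq.gz", "file", "forward"),
    ("pig-feces", "/data/field/PIG_S001_R2_L001.fastq.gz", "file", "reverse"),
    ("pig-feces", "/data/field2/pig", "directory-shallow", "paired"),
]

samples = group_by_sample(example_rows)
for name, entries in samples.items():
    print(name, len(entries))
```

Each sample then carries every path that contributes reads to it, so the reads from its files and directories can be concatenated before analysis.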