nf-core / seqinspector

QC pipeline to inspect your sequences
https://nf-co.re/seqinspector
MIT License

Have a samplesheet with an extra level of grouping #16

Open maxulysse opened 4 months ago

maxulysse commented 4 months ago

Description of feature

I would love to be able to add an extra level for grouping:

That way I would be able to have multiple samples from the same patient and differentiate them from samples from another patient within the same group/project/cohort.

Aratz commented 4 months ago

That's an interesting idea. Maybe the most flexible way to implement this would be some kind of tagging system?

More specifically, this could be an extra column where one could define any number of tags that can be used to group samples together in a specific report. It could look something like this:

sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"

This would generate 7 reports:

This could replace the lane, group and rundir columns, since these are optional and are not defined for all applications.

I can imagine this pipeline being used for a wide range of applications, and it is probably unrealistic to hard-code every possible way of grouping samples. I think we are already seeing the limits of that approach, for instance with group, which is very important for sequencing platforms like ours but perhaps less so for research teams, or with lane, which is specific to some sequencing instruments.

By using this tagging system, we could handle basically any way of grouping samples.
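A minimal sketch of how the proposed tags column could be turned into per-report sample groups, assuming one group per distinct tag. The function name group_samples_by_tag is hypothetical and this is not seqinspector's actual implementation; it only uses Python's standard csv module, which handles the quoted, comma-separated tags field.

```python
import csv
import io
from collections import defaultdict

# Example samplesheet using the proposed (hypothetical) "tags" column.
SAMPLESHEET = """sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"
"""


def group_samples_by_tag(handle):
    """Hypothetical helper: map each tag to the samples that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        # csv unquotes the field; split the tag list and drop empty entries.
        raw_tags = row.get("tags") or ""
        tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
        for tag in tags:
            groups[tag].append(row["sample"])
    return dict(groups)


if __name__ == "__main__":
    groups = group_samples_by_tag(io.StringIO(SAMPLESHEET))
    # One candidate report per distinct tag.
    for tag, samples in sorted(groups.items()):
        print(tag, samples)
```

With the example samplesheet this yields one group per distinct tag (for example patient_1 maps to SAMPLE1 and SAMPLE3), so adding a new way of grouping only means adding a new tag, not a new column.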

Aratz commented 4 months ago

Although maybe we need to keep rundir :thinking:, since it's the path we use to fetch information from files produced by the sequencer (e.g. InterOp files).