Have a samplesheet with an extra level of grouping

maxulysse commented 4 months ago

Description of feature

I would love to be able to add an extra level for grouping:

group/project/cohort
- individual/patient
  - sample (tumor/normal)

That way I would be able to have multiple samples from the same patient and differentiate them from samples from another patient within the same group/project/cohort.
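To make the requested hierarchy concrete, here is a minimal Python sketch of the three levels as a nested mapping. The row layout, the cohort and patient names, and the `build_hierarchy` helper are all hypothetical and only illustrate the idea; nothing here reflects how the pipeline currently parses its samplesheet.

```python
from collections import defaultdict

# Hypothetical rows: (group, patient, sample, status).
# These names are made up for illustration only.
ROWS = [
    ("cohort_A", "patient_1", "SAMPLE1", "tumor"),
    ("cohort_A", "patient_1", "SAMPLE2", "normal"),
    ("cohort_A", "patient_2", "SAMPLE3", "tumor"),
    ("cohort_B", "patient_1", "SAMPLE4", "normal"),
]


def build_hierarchy(rows):
    """Nest samples as group -> patient -> [(sample, status), ...]."""
    tree = defaultdict(lambda: defaultdict(list))
    for group, patient, sample, status in rows:
        tree[group][patient].append((sample, status))
    # Convert to plain dicts so that missing keys raise instead of being created.
    return {group: dict(patients) for group, patients in tree.items()}


hierarchy = build_hierarchy(ROWS)
```

Keying patients under their group means that `patient_1` in `cohort_A` and `patient_1` in `cohort_B` stay distinct, which is the differentiation asked for above.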

Aratz commented 4 months ago

That's an interesting idea. Maybe the most flexible way to implement this would be some kind of tagging system?

More specifically, this could be an extra column where one could define any number of tags that can be used to group samples together in a specific report. It could look something like this:

sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"

This would generate 7 reports:

one global report with all samples
two lane reports (one with SAMPLE1 and SAMPLE2, the other with SAMPLE3, SAMPLE4, and SAMPLE5)
two group reports (one with SAMPLE1, SAMPLE2, SAMPLE3, SAMPLE4, and one with SAMPLE5)
two patient reports (one with SAMPLE1 and SAMPLE3, the other with SAMPLE2 and SAMPLE4)
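The mapping from tags to reports can be sketched in a few lines of Python. This is only an illustration of the proposal, assuming that every distinct tag yields one report and that a reserved name `global` collects all samples; the `reports_from_tags` function is hypothetical and not part of the pipeline.

```python
import csv
import io
from collections import defaultdict

# The example samplesheet from this comment, with a quoted "tags" column.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"
"""


def reports_from_tags(text):
    """Map each report name to the list of samples it would contain."""
    reports = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        sample = row["sample"]
        # Assumption: "global" is reserved for the report with all samples.
        reports["global"].append(sample)
        for tag in (row.get("tags") or "").split(","):
            tag = tag.strip()
            if tag:  # ignore empty entries such as a trailing comma
                reports[tag].append(sample)
    return dict(reports)


reports = reports_from_tags(SAMPLESHEET)
for name, samples in reports.items():
    print(name, samples)
```

With this approach the report names come entirely from the samplesheet, so no grouping column needs to be hard-coded. One open design question is how to handle a user tag that collides with the reserved `global` name.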

This could replace columns lane, group and rundir since these are optional and are not defined for all applications.

I can imagine this pipeline being used for a wide range of applications, and it is probably unrealistic to hard-code all possible ways to group samples together. I think we are already seeing the limits of that approach, for instance with group, which is very important for us as sequencing platforms but maybe less so for research teams, or with lane, which is specific to some sequencing instruments.

By using this tagging system we could handle basically any way to group samples.

Aratz commented 4 months ago

Although maybe we need to keep rundir :thinking: because that's the path we use to fetch information from files that come from the sequencer (e.g. InterOp files).

nf-core / seqinspector

Have a samplesheet with an extra level of grouping #16
