Open maxulysse opened 4 months ago
That's an interesting idea. Maybe the most flexible way to implement this would be to have some kind of tagging system?
More specifically, this could be an extra column where one could define any number of tags that can be used to group samples together in a specific report. It could look something like this:
sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"
This would generate 7 reports:
This could replace the columns `lane`, `group` and `rundir`, since these are optional and are not defined for all applications.
I can imagine this pipeline being used for a wide range of applications, and it is probably unrealistic to hard-code all possible ways to group samples together. I think we are already seeing the limits of that approach, for instance with `group`, which is very important for us sequencing platforms but maybe less so for research teams, or with `lane`, which is specific to some sequencing instruments.
By using this tagging system we could handle basically any way to group samples.
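To make the idea concrete, here is a minimal sketch of how such a `tags` column could be turned into report groups. It is only an illustration: the helper name `group_samples_by_tag` is hypothetical, and nothing here reflects how the pipeline currently parses its samplesheet. With the example sheet above it yields six tag groups, so the count of 7 reports presumably also includes one overall report covering all samples (that is an assumption on my side).

```python
import csv
import io
from collections import defaultdict

# Example samplesheet from this comment; the `tags` column is the proposed addition.
SAMPLESHEET = '''\
sample,fastq_1,fastq_2,tags
SAMPLE1,/path/to/fastq/files/AEG588A4_S1_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_1"
SAMPLE2,/path/to/fastq/files/AEG588A4_S2_L003_R1_001.fastq.gz,,"lane_1,group_A,patient_2"
SAMPLE3,/path/to/fastq/files/AEG588A4_S3_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_1"
SAMPLE4,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,,"lane_2,group_A,patient_2"
SAMPLE5,/path/to/fastq/files/AEG588A4_S5_L003_R1_001.fastq.gz,,"lane_2,group_B"
'''


def group_samples_by_tag(samplesheet_text):
    """Map each tag to the list of sample names carrying it (hypothetical helper)."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    for row in reader:
        # Tags are comma-separated inside a single quoted CSV field.
        # A missing or empty `tags` value simply yields no groups for that sample.
        raw_tags = row.get("tags") or ""
        tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
        for tag in tags:
            groups[tag].append(row["sample"])
    return dict(groups)


groups = group_samples_by_tag(SAMPLESHEET)
for tag in sorted(groups):
    # One report per tag, listing the samples that would be grouped in it.
    print(tag, groups[tag])
```

Since a sample can carry any number of tags, it can land in several reports at once (e.g. SAMPLE5 would appear in both the `lane_2` and the `group_B` report), and a sample without tags would only show up in an overall report, if there is one.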
Although maybe we need to keep `rundir` :thinking: because that's the path we need in order to fetch information from files that come from the sequencer (e.g. InterOp files).
Description of feature
I would love to be able to add an extra level for grouping:
That way I would be able to have multiple samples from the same patients and differentiate them from samples from another patient within the same group/project/cohort.