nf-core / differentialabundance

Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq
https://nf-co.re/differentialabundance
MIT License

technical replicates #220

Open animesh opened 9 months ago

animesh commented 9 months ago

Description of the bug

Not sure how to handle technical replicates in the contrasts file. I have a samples.csv like the one presented at https://nf-co.re/differentialabundance/1.4.0/docs/usage, but with one more column (tech) representing the technical replicate:

sample|fastq_1|fastq_2|condition|replicate|tech|batch
CONTROL_REP1T1|AEG588A1_S1_L002_R1_001.fastq.gz|AEG588A1_S1_L002_R2_001.fastq.gz|control|1|1|A
CONTROL_REP1T2|AEG588A1_S1_L003_R1_001.fastq.gz|AEG588A1_S1_L003_R2_001.fastq.gz|control|1|2|B
CONTROL_REP1T3|AEG588A1_S1_L004_R1_001.fastq.gz|AEG588A1_S1_L004_R2_001.fastq.gz|control|1|3|A
...
TROL_REPNT1|AEG588AN_S1_L002_R1_001.fastq.gz|AEG588A1_S1_L002_R2_001.fastq.gz|trol|1|1|A
TROL_REPNT2|AEG588AN_S1_L003_R1_001.fastq.gz|AEG588A1_S1_L003_R2_001.fastq.gz|trol|1|2|B
TROL_REPNT3|AEG588AN_S1_L004_R1_001.fastq.gz|AEG588A1_S1_L004_R2_001.fastq.gz|trol|1|3|A

But I am not sure what the best way is to present this information in contrasts.csv.

Command used and terminal output

No response

Relevant files

No response

System information

No response

pinin4fjords commented 8 months ago

Assuming this is RNA-seq, we would normally expect technical replicates (assumed to mean multiple sequencing runs of the same biological sample) to have been collapsed in the upstream analysis; see https://nf-co.re/rnaseq/3.13.2#usage.

Do you have reason to handle individual tech reps separately?

The DESeq2 module in nf-core doesn't currently have the capability to collapse tech reps, so that would need addressing before we could handle them as part of this workflow.
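
As a rough illustration of collapsing upstream: as far as the linked nf-core/rnaseq usage page describes, rows in the samplesheet that share the same `sample` value are treated as re-sequencing runs of one sample and merged before quantification. A minimal sketch, assuming the `T<n>` suffix in the example sheet is what marks a technical replicate (the helper name `merge_tech_rep_ids` and that naming convention are hypothetical, not part of either pipeline):

```python
import re

# Assumption: a trailing "T<n>" in the sample name marks the technical replicate,
# as in CONTROL_REP1T1 from the example sheet in this issue.
TECH_SUFFIX = re.compile(r"T\d+$")


def merge_tech_rep_ids(rows):
    """Return copies of the rows with the technical-replicate suffix removed from 'sample'."""
    merged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["sample"] = TECH_SUFFIX.sub("", row["sample"])
        merged.append(new_row)
    return merged


# The three control rows from the issue, reduced to the columns used here.
rows = [
    {"sample": "CONTROL_REP1T1",
     "fastq_1": "AEG588A1_S1_L002_R1_001.fastq.gz",
     "fastq_2": "AEG588A1_S1_L002_R2_001.fastq.gz"},
    {"sample": "CONTROL_REP1T2",
     "fastq_1": "AEG588A1_S1_L003_R1_001.fastq.gz",
     "fastq_2": "AEG588A1_S1_L003_R2_001.fastq.gz"},
    {"sample": "CONTROL_REP1T3",
     "fastq_1": "AEG588A1_S1_L004_R1_001.fastq.gz",
     "fastq_2": "AEG588A1_S1_L004_R2_001.fastq.gz"},
]

collapsed = merge_tech_rep_ids(rows)
for row in collapsed:
    # Every row now carries the same sample identifier, one line per sequencing run.
    print(",".join(row[key] for key in ("sample", "fastq_1", "fastq_2")))
```

With all three runs sharing the identifier `CONTROL_REP1`, the differential abundance step downstream would only ever see one column per biological replicate.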

animesh commented 8 months ago

Thanks @pinin4fjords for the response. I just wanted to see how well the tech reps cluster, and they clustered quite well 👍🏽 So I merged the counts by summing and presented the sum of the tech reps as the bio rep in the contrasts for the RNA-seq pipeline, which seems to have worked 🤞 Could this step perhaps be part of the pipeline itself?
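
The summing step described above might look roughly like the following. This is only a sketch under stated assumptions: the counts are made-up numbers, the `{gene: {sample: count}}` layout and the helper name `sum_tech_reps` are hypothetical, and it is not the code actually used in this thread. Summing raw counts per biological sample is also the approach taken by DESeq2's `collapseReplicates` function in R.

```python
from collections import defaultdict


def sum_tech_reps(counts, sample_to_biorep):
    """Sum raw counts of technical replicates into one value per biological replicate.

    counts:           {gene: {sample: raw_count}}
    sample_to_biorep: {sample: biological replicate name}
    """
    summed = {}
    for gene, per_sample in counts.items():
        totals = defaultdict(int)
        for sample, count in per_sample.items():
            totals[sample_to_biorep[sample]] += count
        summed[gene] = dict(totals)
    return summed


# Mapping from technical replicate to biological replicate (sample names from the issue).
sample_to_biorep = {
    "CONTROL_REP1T1": "CONTROL_REP1",
    "CONTROL_REP1T2": "CONTROL_REP1",
    "CONTROL_REP1T3": "CONTROL_REP1",
}

# Made-up raw counts, purely for illustration.
counts = {
    "geneA": {"CONTROL_REP1T1": 10, "CONTROL_REP1T2": 12, "CONTROL_REP1T3": 8},
    "geneB": {"CONTROL_REP1T1": 0, "CONTROL_REP1T2": 3, "CONTROL_REP1T3": 1},
}

merged = sum_tech_reps(counts, sample_to_biorep)
print(merged)
```

The merged matrix then has one column per biological replicate, so the sample sheet passed alongside it needs one row per biological replicate as well, and contrasts.csv can refer to `condition` without any mention of the `tech` column.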

pinin4fjords commented 8 months ago

Thanks, we'll consider taking matrices with unmerged tech reps in future development.