nf-core / cutandrun

Analysis pipeline for CUT&RUN and CUT&TAG experiments that includes QC, support for spike-ins, IgG controls, peak calling and downstream analysis.
https://nf-co.re/cutandrun
MIT License

Question: How are multiple replicates of control condition used? #246

Closed ChristophH closed 3 months ago

ChristophH commented 4 months ago

From the example samplesheet and the description shown at docs/usage, it seems that each control (IgG) replicate is assigned to the non-control replicate with the same number (if there are the same number of replicates), or only the first one is used (if the numbers of replicates differ). At least that's my interpretation of the documentation (check the link above).

What is the rationale for assigning control_rep1 to non-control_rep1, control_rep2 to non-control_rep2, etc.? Generally, I don't think of the replicates as being paired. Also, if I have three non-control replicates but only two control replicates, the second control will not be used at all. Is that correct? In that case, should I just "pretend" that both control replicates are the same (by assigning replicate number 1 to both of them), so that at least their reads will be aggregated?

I appreciate the work all contributors are putting into this pipeline! Thanks!

code-flowbio commented 4 months ago

Hi Christoph, thanks for your question. In my experience with CUT&RUN, biological replicates are often paired with their controls, so that is the default. You are correct that if the number of replicates does not match, only the first control replicate is used. If you want a single control from multiple control samples, then you do need to set them up as technical replicates: give them the same sample id and replicate number and they will be merged automatically.
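
For illustration only, the rule described in this reply can be sketched in a few lines of Python. This is not the pipeline's actual code: the function names `merge_technical_replicates` and `assign_controls`, the tuple layout, and the file names are all hypothetical, and the sketch assumes replicate numbers are small integers starting at 1.

```python
from collections import defaultdict


def merge_technical_replicates(rows):
    """Pool files that share the same (sample id, replicate number).

    `rows` is a list of (sample_id, replicate, fastq) tuples; this layout
    is an assumption made for the sketch, not the samplesheet format.
    """
    merged = defaultdict(list)
    for sample_id, replicate, fastq in rows:
        merged[(sample_id, replicate)].append(fastq)
    return dict(merged)


def assign_controls(target_reps, control_reps):
    """Map each target replicate number to a control replicate number.

    Equal replicate counts: pair replicates in sorted order (rep1 with
    rep1, rep2 with rep2, ...). Unequal counts: every target replicate
    falls back to the first control replicate.
    """
    target_reps = sorted(target_reps)
    control_reps = sorted(control_reps)
    if not control_reps:
        raise ValueError("no control replicates available")
    if len(target_reps) == len(control_reps):
        return dict(zip(target_reps, control_reps))
    first_control = control_reps[0]
    return {rep: first_control for rep in target_reps}


# Three target replicates, two controls: only control replicate 1 is used.
mismatched = assign_controls([1, 2, 3], [1, 2])

# Two and two: replicates are paired by number.
paired = assign_controls([1, 2], [1, 2])

# Two control files given the same id and replicate number are pooled.
pooled = merge_technical_replicates([
    ("igg", 1, "igg_a.fastq.gz"),  # hypothetical file names
    ("igg", 1, "igg_b.fastq.gz"),
])
```

Under these assumptions, labelling both control samples as replicate 1 leaves a single pooled control, and every target replicate then maps to it.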

ChristophH commented 3 months ago

@code-flowbio Thank you for your reply. Our experimental setup did not allow for paired control and non-control samples. I ended up merging the control samples as suggested above.