reskejak / ATAC-seq

Basic workflow for ATAC-seq analysis

Use for multiple sample differential analysis #14

Open amootta opened 1 year ago

amootta commented 1 year ago

Hi, thanks for posting these workflows. Very helpful! May I ask how you would modify the R pipeline for differential analysis of multiple treated samples vs. multiple controls? Your pipeline mentions 'biological replicates', which I guess is a different case. Would it be a good idea to instead look for consensus peaks using e.g. MSPC and then follow the rest of the pipeline? Any advice would be very much appreciated! Thanks

reskejak commented 10 months ago

Hi - my apologies for the delay in response.

I am not sure if I correctly understand your experimental design. In any design, you should expect to have biological and/or technical replicates in each experimental condition.

If you are interested in comparing multiple treated conditions vs. multiple control conditions in a grand experiment, such as Drugs A/B/C vs. vehicle 1/2/3 or similar, then I would suggest finding the total set of peaks that are reproducibly found in at least 1 condition, and then using that as your universe peak set for differential analysis. In this case, you should still have more than 1 observation (assay replicate) per experimental condition, because variance estimation, and therefore statistical testing, requires replication. As an example, if you have 3 treated conditions and 3 control conditions, at n=4 per condition, then you would have 6 x 4 = 24 total assay samples.
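To make the "universe peak set" idea concrete, here is a minimal sketch in plain Python. It assumes each condition already has its own list of reproducible peaks as `(chrom, start, end)` tuples with half-open coordinates, and it builds the universe by merging every peak that overlaps or touches another across conditions. The function names, condition labels, and coordinates are all hypothetical and for illustration only; this is not the repo's R code, and in practice a tool such as bedtools or GenomicRanges would do the same job.

```python
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps (or touches) the open interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def universe_peak_set(peaks_by_condition):
    """Union of per-condition reproducible peaks, merged into one set."""
    all_peaks = [p for peaks in peaks_by_condition.values() for p in peaks]
    return merge_peaks(all_peaks)


# Hypothetical reproducible peaks per condition (toy coordinates).
peaks_by_condition = {
    "drug_A": [("chr1", 100, 200), ("chr1", 500, 650)],
    "drug_B": [("chr1", 150, 260), ("chr2", 40, 90)],
    "vehicle_1": [("chr1", 600, 700)],
}

universe = universe_peak_set(peaks_by_condition)
print(universe)
# [('chr1', 100, 260), ('chr1', 500, 700), ('chr2', 40, 90)]

# Sample count for the example design: 3 treated + 3 control conditions, n=4 each.
n_conditions = 3 + 3
n_samples = n_conditions * 4
print(n_samples)  # 24
```

Reads from every sample would then be counted over the same `universe` intervals, so that all conditions are tested against one shared set of regions.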

If you are interested in comparing multiple treated samples vs. multiple control samples, my interpretation is that this could mean one treated group vs. one control group. This is how I have described our chief mouse ATAC-seq data set in the paper: two treatment groups with n=2 biological replicates per group, so 4 samples in total.

In any case, I think it is general practice to take all samples in your core experimental design and identify a set of candidate peaks that are found in at least 1 of the experimental conditions. This could be through the naive overlap peak set strategy outlined in the repo, or through other related strategies (e.g. intersecting replicate peak coordinates).
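As a rough illustration of the "intersecting replicate peak coordinates" option, the sketch below keeps a peak from replicate 1 only if it overlaps at least one peak from replicate 2 on the same chromosome. This is a simplified stand-in, not the naive overlap procedure from the repo: it uses no minimum overlap fraction and no pooled peak set. The function name and the toy coordinates are hypothetical, and intervals are assumed to be half-open.

```python
def overlapping_peaks(rep1, rep2):
    """Return rep1 peaks that overlap any rep2 peak on the same chromosome."""
    kept = []
    for chrom, start, end in rep1:
        for chrom2, start2, end2 in rep2:
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == chrom2 and start < end2 and start2 < end:
                kept.append((chrom, start, end))
                break
    return kept


# Hypothetical peak calls from two replicates of one condition.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
rep2 = [("chr1", 180, 250), ("chr2", 60, 90)]

reproducible = overlapping_peaks(rep1, rep2)
print(reproducible)
# [('chr1', 100, 200)]
```

The nested loop is quadratic in the number of peaks, which is fine for a toy example; for real peak files an interval-aware tool (bedtools, GenomicRanges) would be the usual choice.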

Hope this helps!