Open drosop opened 8 months ago
Hi @drosop ,
Unfortunately no, there is no way to add batch effect correction to TOBIAS.
If your samples are separated like this:
batch1: condition1-rep1, condition2-rep1
batch2: condition1-rep2, condition2-rep2
the effect might cancel out if you merge the .bam-files of each condition.
Another option might be to run every sample individually with TOBIAS, and then try to correct the footprint-scores manually afterwards. But all in all, there is no direct way to do this in TOBIAS, sorry!
Hi @msbentsen,
thanks for creating this fantastic tool! I have a follow-up question on this topic. When you mention:
If your samples are separated like this:
batch1: condition1-rep1, condition2-rep1
batch2: condition1-rep2, condition2-rep2
the effect might cancel out if you merge the .bam-files of each condition.
does this imply that overcoming batch effects might be feasible if the dataset is paired for all conditions? My ATAC-seq samples are from tumor-infiltrating (T), normal-tissue (N) and peripheral blood (PB) immune cells from distinct patients and they are clearly separated by subject. For only one patient I lack the PB sample, but currently I've included all samples in TOBIAS, merging the bam files by tissue condition (i.e. 5 T, 5 N and 4 PB). Do you think that I should remove the patient without PB to achieve a more balanced dataset (i.e. paired samples: 4 T, 4 N, 4 PB) in order to better mitigate the batch effect? Or, since my main comparison is actually T vs N, would it be better to run an analysis with only the paired T and N samples (i.e. 5 T and 5 N)?
Regarding the second option proposed from the previous answer, if I run the analysis individually for each patient, what method do you recommend for manually correcting footprint scores afterward?
Thanks again!
Carolina
Hi @DossenaCarolina ,
Sorry for the late reply. Regarding the paired samples: in theory, the batch effects should cancel out if each condition contains paired data. So if you have something like this (millions of reads per sample):

| Patient | T | N | PB |
|---|---|---|---|
| P1 | 3**6 | 4**6 | - |
| P2 | 3**6 | 4**6 | 4**6 |
| P3 | 3**6 | 4**6 | 4**6 |
| etc. | ... | ... | ... |

The percent influence of each patient when comparing T/N should be equal when using the same patients. So I would agree to run it more paired like 4T-4N-4PB or 5T-5N rather than 5T-5N-4PB.
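To make the balance argument concrete, here is a minimal sketch of how much each patient contributes to a merged .bam per condition. The read counts are hypothetical placeholders (in millions), and `patient_shares` is an illustrative helper, not part of TOBIAS.

```python
# Hypothetical read counts (millions) per patient and condition.
# None marks a missing sample (here: P1 has no PB sample).
reads = {
    "P1": {"T": 3, "N": 4, "PB": None},
    "P2": {"T": 3, "N": 4, "PB": 4},
    "P3": {"T": 3, "N": 4, "PB": 4},
}


def patient_shares(reads, condition, patients):
    """Fraction of the merged pool for `condition` contributed by each patient."""
    counts = {
        p: reads[p][condition]
        for p in patients
        if reads[p][condition] is not None
    }
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


all_patients = list(reads)
# Patients that have a sample in every condition.
paired = [p for p in all_patients if all(v is not None for v in reads[p].values())]

for cond in ("T", "N", "PB"):
    print(cond, "all patients:", patient_shares(reads, cond, all_patients))
    print(cond, "paired only: ", patient_shares(reads, cond, paired))
```

With all patients included, P1 makes up a third of the T and N pools but none of the PB pool, so any T-vs-PB or N-vs-PB comparison mixes in a patient-specific signal. Restricting to the paired patients, or comparing only T and N, gives each patient the same share on both sides of the comparison.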
For manually correcting footprint scores, you might look at something like limma or ComBat, or even just quantile normalization if the effect is only in the strength of the signal. This is not something I have done, however, so I cannot speak to how well it works.
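As a rough illustration of the quantile normalization option only, the sketch below forces every sample to share the same score distribution. It assumes the per-sample footprint scores have already been collected into a matrix with one row per site and one column per sample; that layout and the function name `quantile_normalize` are assumptions for this example, not TOBIAS output formats or functions.

```python
import numpy as np


def quantile_normalize(scores):
    """Quantile-normalize the columns (samples) of a 2D array (rows = sites).

    Ties are not treated specially: tied values receive neighbouring
    reference values according to their sort order.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, axis=0)                   # per-sample sort order
    sorted_scores = np.take_along_axis(scores, order, axis=0)
    reference = sorted_scores.mean(axis=1)               # mean value at each rank
    normalized = np.empty_like(scores)
    # Write the reference value for each rank back to the original positions.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


# Hypothetical scores: the second sample has the same ranking but twice the signal.
scores = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
])
normalized = quantile_normalize(scores)
print(normalized)
```

This only removes differences in overall signal strength between samples. Batch effects that change the ranking of sites would need a model-based approach such as the limma or ComBat options mentioned above.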
I hope that helps you out!
Hi,
I'm working on bulk ATAC-seq data. The experiment was run in two batches, and in the PCA plot the samples separate by batch, indicating a strong batch effect.
Can I use these samples with the batch effect for running TOBIAS?
Is there any way to remove the batch effect prior to TOBIAS? I used limma to correct for the batch effect in the differential accessibility analysis.
Thank you,