Closed: kat-in-the-hat closed this issue 3 years ago.
Hi,
The short answer is yes, it is possible :-) The longer answer is that there are a few things to take into account:
TOBIAS BINDetect assumes that the general accessibility of the two samples within the given peaks is roughly the same, and it tries to normalize the samples against each other. So if you only give peaks that are more accessible in one genotype, that will skew the distribution and possibly mess up the normalization. But if you give all differentially accessible peaks (both up- and downregulated), that should not be a problem. It is a good sign that you saw similar results using all vs. differential peaks; it shows that the all-peaks result is mostly driven by the differential peaks (which is also the whole idea of TOBIAS).
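As a rough sanity check before handing a differential peak set to BINDetect, you could count how many of the selected peaks gain versus lose accessibility. This is only a minimal sketch: it assumes a hypothetical list of `(peak_id, log2_fold_change)` pairs exported from whatever differential accessibility tool you used, and the function name `balance_report` is illustrative, not part of TOBIAS.

```python
from typing import Iterable, Tuple


def balance_report(diff_peaks: Iterable[Tuple[str, float]]) -> dict:
    """Count peaks gaining vs. losing accessibility in a differential peak set.

    Hypothetical input: (peak_id, log2_fold_change) pairs, where a positive
    fold change is assumed to mean "more open in genotype A".
    Peaks with a fold change of exactly 0 are not counted in either direction.
    """
    up = down = 0
    for _peak_id, lfc in diff_peaks:
        if lfc > 0:
            up += 1
        elif lfc < 0:
            down += 1
    total = up + down
    # Fraction of peaks gaining accessibility; 0.5 means perfectly balanced.
    frac_up = up / total if total else float("nan")
    return {"up": up, "down": down, "frac_up": frac_up}


if __name__ == "__main__":
    peaks = [("peak1", 1.8), ("peak2", -2.1), ("peak3", 0.9), ("peak4", 3.2)]
    print(balance_report(peaks))
```

A `frac_up` far from 0.5 would indicate the kind of one-sided peak set that can distort the normalization described above.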
You can also use the flag "--output-peaks" within TOBIAS BINDetect to set a different peak set for the output analysis. The explanation of this flag is: "Gives the possibility to set the output peak set differently than the input --peaks. This will limit all analysis to the regions in --output-peaks. NOTE: --peaks must still be set to the full peak set!"
This will normalize the samples based on all peaks, but calculate differential scores based only on your chosen output peaks. Again, this has some limitations if the output peaks are strongly skewed in one direction, but you might try it out as well.
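To make the split between `--peaks` (full set, used for normalization) and `--output-peaks` (subset, used for the reported analysis) concrete, here is a sketch that only builds the argument list. All file names are placeholders, `bindetect_command` is a hypothetical helper, and apart from `--peaks` and `--output-peaks` the flag names (`--motifs`, `--signals`, `--genome`, `--cond_names`, `--outdir`) are taken from the TOBIAS documentation as I understand it; please check them against `TOBIAS BINDetect --help` for your installed version.

```python
import shlex
from typing import List, Sequence


def bindetect_command(
    signals: Sequence[str],
    cond_names: Sequence[str],
    motifs: str,
    genome: str,
    all_peaks: str,
    output_peaks: str,
    outdir: str,
) -> List[str]:
    """Build a TOBIAS BINDetect argument list (sketch; paths are placeholders).

    --peaks receives the FULL peak set, which is used for normalization;
    --output-peaks restricts the reported analysis to the chosen subset.
    """
    return [
        "TOBIAS", "BINDetect",
        "--motifs", motifs,
        "--signals", *signals,
        "--genome", genome,
        "--peaks", all_peaks,
        "--output-peaks", output_peaks,
        "--cond_names", *cond_names,
        "--outdir", outdir,
    ]


if __name__ == "__main__":
    cmd = bindetect_command(
        signals=["genoA_footprints.bw", "genoB_footprints.bw"],
        cond_names=["genoA", "genoB"],
        motifs="motifs.jaspar",
        genome="genome.fa",
        all_peaks="all_peaks.bed",
        output_peaks="differential_peaks.bed",
        outdir="bindetect_output",
    )
    # Print a shell-quoted command line; running it would need TOBIAS installed,
    # e.g. subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```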
Hope this was helpful!
Thank you very much, Mette!
We had just realised that our initial run was done with the previous version of TOBIAS. With that version, we saw similar results when running it on "all peaks" or on "differentially accessible peaks" (even though we had more open than closed chromatin), so we did not see any skewing.
However, when we wanted to follow your advice, we noticed that this option is only available in the new version of TOBIAS. We ran the analysis again with the new version, and this time got drastically different results between "all peaks" and "differentially accessible peaks", with the "differentially accessible peaks" run now exhibiting the skew you mentioned may happen.
Could you please let us know what the major differences between the two versions of TOBIAS are that could account for this discrepancy?
Sorry for the long message and thanks again for all your help! :)
There were some changes to the normalization around version 0.12.0/0.12.1, as the previous versions were quite sensitive to outliers. I can't say exactly how your data behaves with the two versions, but I would of course always recommend using the latest version ;-)
Even if the results are skewed, you can still trust the left-most and right-most TFs (i.e. those most changed in either direction) and use these for further analysis. Whichever side the overall distribution is shifted towards, these are still the most changed.
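Picking out the left-most and right-most TFs amounts to taking both tails of the per-TF change scores. A minimal sketch, assuming you have loaded a mapping from TF name to differential binding score (for example from the change column of the BINDetect results table; the exact column name depends on your condition names and is assumed here). `extreme_tfs` and the TF names are hypothetical.

```python
from typing import Dict, List, Tuple


def extreme_tfs(changes: Dict[str, float], k: int = 3) -> Tuple[List[str], List[str]]:
    """Return the k TFs with the most negative and the k with the most positive change.

    Ranking uses the raw scores, so even if the whole distribution is shifted
    to one side, the two tails still pick out the most-changed TFs.
    If 2*k exceeds the number of TFs, the two lists will overlap.
    """
    ranked = sorted(changes, key=changes.get)  # TF names, ascending by change
    left = ranked[:k]         # most negative change (left side of the volcano plot)
    right = ranked[::-1][:k]  # most positive change (right side)
    return left, right


if __name__ == "__main__":
    scores = {"TF1": -0.4, "TF2": 0.05, "TF3": 0.3, "TF4": -0.1, "TF5": 0.25}
    print(extreme_tfs(scores, k=2))
```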
Hello,
Thank you very much for developing this tool!
I just wanted to ask: would it be appropriate to run TOBIAS only on differentially accessible peaks between 2 genotypes?
This is because we had hypothesised that a certain TF would be differentially bound in these regions in one genotype over the other, and wanted to use TOBIAS to test this hypothesis. By running it only on the differentially accessible sites, we hoped to get a more focused analysis. We ran TOBIAS on both "all peaks" and only on "differentially accessible peaks" and got similar results.
We would appreciate your opinion on whether this is okay to do, and whether there are any limitations or concerns we should be aware of.
Thanks in advance for your help!