loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

A few questions on PlotAggregate #187

Closed: cardinLL closed this issue 1 year ago

cardinLL commented 1 year ago

Hi msbentsen,

I ran into some problems when using the PlotAggregate command. I obtained the binding pattern of Stat2 from three separate BINDetect results and wanted to visualize the difference in its bound footprint across conditions. These are the three plots that I got:

[Images: Stat2_aggregate, Stat2_aggregate2, Stat2_aggregate]

Compared to your example picture, two of them look quite different:

[Image: example picture]

I wonder whether this is problematic or normal. If it is problematic, what might have caused it?

Additionally, I am just a beginner in bioinformatics, so I would like to ask a very fundamental question: what does the y value in an aggregate plot mean, and how should an aggregate plot be interpreted? You can use the third picture that I attached as an example if you wish, since it looks normal. I would greatly appreciate any help!

Regards,

Lihao Yang

msbentsen commented 1 year ago

Hi Lihao,

Which bigwig files are you using to make the plot with PlotAggregate? It looks like it might be the footprint score files (which represent the footprint scores at each position), but the plots in the example are calculated from the _corrected.bw files, which represent the Tn5 insertions.

The y-axis is the mean insertion rate across all the sites in the plot (the upper right corner of each plot shows how many sites there are). So you can imagine a large matrix of centered signals, where we take the column mean for each base. Hope that clears up the question.
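To make the "matrix of centered signals" picture concrete, here is a minimal sketch in Python with NumPy. It is not the TOBIAS implementation: the function name `aggregate_profile` and the tiny matrix are made up for illustration, and in practice each row would hold the signal read from a bigwig around one site.

```python
import numpy as np

def aggregate_profile(signal_matrix):
    """Column-wise mean: one value per base position, averaged over all sites.

    Hypothetical helper for illustration only, not part of TOBIAS.
    """
    signal_matrix = np.asarray(signal_matrix, dtype=float)
    return signal_matrix.mean(axis=0)

# Synthetic data: 3 sites (rows) x 5 positions (columns),
# each row centered on the motif (middle column).
sites = np.array([
    [2.0, 1.0, 0.0, 1.0, 2.0],
    [4.0, 2.0, 0.0, 2.0, 4.0],
    [3.0, 3.0, 0.0, 3.0, 3.0],
])

profile = aggregate_profile(sites)
print(profile)  # [3. 2. 0. 2. 3.]
```

Each value in `profile` corresponds to one point on the y-axis of the aggregate plot, and the number of rows corresponds to the site count shown in the corner of the plot.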

cardinLL commented 1 year ago

Hi msbentsen,

Thank you for your response! I figured out that I used the uncorrected bigwig file, which might have caused the problem. I also need a little nudge on interpreting an aggregate plot. For the third picture that I attached, is it appropriate to interpret it as follows: Stat2 shows more binding sites in the ONC1D condition but has a deeper footprint in the Naive and ONC3D conditions?

Regards,

Lihao Yang

msbentsen commented 1 year ago

Hi Lihao,

To compare between conditions, I would plot the footprint across all binding sites (not only the bound ones), as shown here: https://github.com/loosolab/TOBIAS/blob/main/figures/BATF_footprint_comparison_all.png

That way, the plot does not depend on the number of bound sites. In the plot above, the naive condition has a deeper footprint, but also fewer bound sites, so it is difficult to compare with the other conditions.
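As a rough sketch of why a shared site set helps, the Python snippet below aggregates two conditions over the same four synthetic sites and compares a simple depth measure. Everything here is an assumption for illustration: `footprint_depth` (mean flank signal minus mean center signal) is a made-up metric, not something TOBIAS computes, and the numbers are invented.

```python
import numpy as np

def footprint_depth(profile, motif_halfwidth):
    """Mean flank signal minus mean signal over the motif center.

    Hypothetical metric for illustration only, not a TOBIAS score.
    """
    profile = np.asarray(profile, dtype=float)
    center = len(profile) // 2
    lo, hi = center - motif_halfwidth, center + motif_halfwidth + 1
    inside = profile[lo:hi]
    flanks = np.concatenate([profile[:lo], profile[hi:]])
    return flanks.mean() - inside.mean()

# Synthetic signal over the SAME 4 sites in two conditions (rows = sites).
# Condition A has a footprint at 2 of the sites, condition B at only 1.
cond_a = np.array([[4, 4, 1, 4, 4],
                   [4, 4, 1, 4, 4],
                   [2, 2, 2, 2, 2],
                   [2, 2, 2, 2, 2]], dtype=float)
cond_b = np.array([[4, 4, 1, 4, 4],
                   [2, 2, 2, 2, 2],
                   [2, 2, 2, 2, 2],
                   [2, 2, 2, 2, 2]], dtype=float)

# Aggregate over all sites, so both profiles average the same rows.
depth_a = footprint_depth(cond_a.mean(axis=0), motif_halfwidth=0)
depth_b = footprint_depth(cond_b.mean(axis=0), motif_halfwidth=0)
print(depth_a, depth_b)  # 1.5 0.75
```

Because both profiles are averaged over the same rows, the difference in depth reflects the signal itself rather than how many sites each condition happened to contribute.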

cardinLL commented 1 year ago

Hi msbentsen,

Thank you for your answer!! I think my confusion has been resolved!