loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

The output of BINDetect for ATAC data from multiple time points #290

Closed lay-lei closed 1 week ago

lay-lei commented 1 month ago

Hi,

I am trying to use TOBIAS BINDetect to predict the activated TFs along a continuous time course. In bindetect_results.xlsx, I get the mean_score and the number of bound sites for each time point. However, I am confused about how to use these scores to filter for the activated TFs at each time point. How is the mean_score calculated?


name | motif_id | cluster | total_tfbs | BT_mean_score | BT_bound | DAY1_mean_score | DAY1_bound | DAY5_mean_score | DAY5_bound
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
MA0601.2.Arid3b | MA0601.2 | C_MA0601.1.Arid3b | 3598 | 0.19213 | 20 | 0.19998 | 21 | 0.20374 | 18
MA0704.1.Lhx4 | MA0704.1 | C_MA0075.1.Prrx2 | 6309 | 0.20454 | 44 | 0.21004 | 41 | 0.23967 | 39
MA0853.1.Alx4 | MA0853.1 | C_MA0720.1.Shox2 | 5551 | 0.23647 | 79 | 0.2414 | 73 | 0.27021 | 74

The mean_score values are close at every time point in my data. Can these subtle differences indicate that there are more TF-motif combinations?

I also chose one of the significant TF motifs for PlotAggregate, but something seems to be wrong. Is it due to the data quality? I also wonder whether I can use the mean_score or the bound numbers to plot a heatmap to find the activated TFs at each time point, as presented in your article.

[image]

Thank you so much.

mohobein commented 1 month ago

Hello @lay-lei,

The mean score represents how clear the footprints at the TFBS of this TF were on average. It is calculated by taking the mean of the <condition>_score column per condition for each TF from the <TF_name>_overview.txt files. How the score per TFBS is calculated can be seen here. However, this metric is not the best for comparison between conditions, as it does not show which TFs were bound the most, only how strong their footprints were. A better metric for comparison between conditions is the differential binding score found in the <contrast>_change column. If you're interested in how this score is calculated, you can check page 11 of the supplemental material of the TOBIAS paper.
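As a rough illustration of the two metrics described above, here is a minimal Python sketch. It assumes the <TF_name>_overview.txt files are tab-separated with a header row, and that the results table has already been loaded as a list of dicts with a `<contrast>_change` column. The function names, the `BT_DAY1` contrast name, and the toy values are hypothetical and not part of TOBIAS.

```python
import csv
import io
from statistics import mean


def mean_scores(overview_file, conditions):
    """Average the '<condition>_score' column over all TFBS rows.

    overview_file: an open text file (or file-like object) holding one
    TF's overview table; assumed tab-separated with a header row.
    conditions: condition prefixes, e.g. ["BT", "DAY1"].
    """
    reader = csv.DictReader(overview_file, delimiter="\t")
    scores = {cond: [] for cond in conditions}
    for row in reader:
        for cond in conditions:
            scores[cond].append(float(row[f"{cond}_score"]))
    return {cond: mean(values) for cond, values in scores.items()}


def rank_by_change(results_rows, contrast):
    """Sort TF rows by absolute differential binding score, largest first.

    results_rows: list of dicts, one per TF, e.g. read from the results
    table. The '<contrast>_change' column name is assumed here.
    """
    key = f"{contrast}_change"
    return sorted(results_rows, key=lambda r: abs(float(r[key])), reverse=True)


# Toy overview table with made-up values (not real TOBIAS output).
toy_overview = io.StringIO(
    "TFBS_chr\tTFBS_start\tTFBS_end\tBT_score\tDAY1_score\n"
    "chr1\t100\t110\t0.10\t0.20\n"
    "chr1\t500\t510\t0.30\t0.40\n"
)
print(mean_scores(toy_overview, ["BT", "DAY1"]))

# Toy results rows with made-up change values.
toy_results = [
    {"name": "TF_A", "BT_DAY1_change": "0.05"},
    {"name": "TF_B", "BT_DAY1_change": "-0.40"},
]
print([row["name"] for row in rank_by_change(toy_results, "BT_DAY1")])
```

Ranking by the absolute change value puts TFs with the largest shift in either direction first. The sign then tells you which condition of the contrast had the stronger binding.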

Looking at the example data you posted, the mean_score values do seem oddly low. The plots below also support that there seems to be something wrong as there is no real footprint and the values are also basically not changing. May I ask what commands you used to generate your BINDetect results and the PlotAggregate figure? The problem might indeed be the data quality, but I cannot say for certain. Have you taken a look at the uncorrected cutsites, the corrected cutsites and the footprint score tracks in a genome viewer like IGV? If the first few look fine, it might be a problem with TOBIAS, otherwise it is probably your input data.

I hope this answers your question. If I can be of further assistance, let me know!

Kind regards, Moritz

github-actions[bot] commented 2 weeks ago

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.