loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

About clustering of transcription factor (TF) activities throughout development #23

Closed whui2bioinformatics closed 4 years ago

whui2bioinformatics commented 4 years ago

Hi Mette, I'm interested in this figure shown in your paper and would like to ask some questions about it. Thank you very much. [image]

  1. Which values did you use to make the heatmap shown in Fig. a: the mean footprint score of all predicted sites of a given TF, or the mean score of only the bound sites of that TF?
  2. Are the footprint scores in these tables (//beds/all.bed, //beds/_bound.bed, /bindetect_results.{txt,xlsx}) normalized between conditions? Can we use the scores to compare footprints between conditions directly?
  3. I didn't find any differential footprint between conditions, as in this plot: [image] Could you give me some advice about this?

Thanks again.

msbentsen commented 4 years ago

Hi, Thank you for your interest in the tool/paper! I will try my best to answer your questions here:

  1. We used the <condition>_mean_score columns from the bindetect_results.txt-file, which is the mean score across all sites. The scores are also z-score normalized per row, as some TFs naturally have higher mean scores than others (and this would influence the clustering).
  2. Yes, the footprint scores in the tables are normalized! This is also why there might be a slight difference between the score from the bigwig and the score in the _all.bed/_bound.bed/etc. files. If the conditions were run together in BINDetect, all conditions are normalized to each other, so the scores from the files can be compared directly.
  3. Changes in footprints are not always visible between conditions, even if the scores indicate changes. This can be because the scores are driven by accessibility (as also seen in your plot; some of the conditions have "taller" signal/signatures). It can also be because the differences lie not so much in the number of sites bound as in the positions of the sites bound, or because most motif sites are bound in all conditions, masking smaller effects between conditions. If you want to go a bit deeper into this, you could try to define some target peaks/genes or otherwise subset the "_all.bed" TFBS files to a subset of TFBS that might be more indicative of the changes. You could also choose the subset of TFBS with the highest differential scores across all conditions. Then you can apply PlotAggregate again to compare the footprints for that specific subset.
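
As a rough illustration of the row-wise z-scoring from point 1 and the "highest differential scores" subsetting from point 3, here is a minimal sketch using pandas. It is not the code used for the paper: the function names, the TF and condition names, and the example values are all hypothetical, and "differential" is taken here simply as the range (max minus min) of a site's or TF's scores across conditions, which is only one possible choice.

```python
import pandas as pd


def zscore_rows(df, suffix="_mean_score"):
    """Z-score each row across the per-condition columns ending in `suffix`.

    Rows whose scores are identical in all conditions have zero standard
    deviation and come out as NaN.
    """
    cols = [c for c in df.columns if c.endswith(suffix)]
    scores = df[cols]
    mean = scores.mean(axis=1)
    std = scores.std(axis=1, ddof=0)  # population std; an assumption
    return scores.sub(mean, axis=0).div(std, axis=0)


def top_differential(df, score_cols, n):
    """Return the n rows with the largest score range across conditions."""
    spread = df[score_cols].max(axis=1) - df[score_cols].min(axis=1)
    return df.loc[spread.nlargest(n).index]


# Hypothetical stand-in for a table with <condition>_mean_score columns.
results = pd.DataFrame(
    {
        "condA_mean_score": [0.2, 1.0, 0.9],
        "condB_mean_score": [0.4, 1.5, 0.5],
        "condC_mean_score": [0.6, 2.0, 0.1],
    },
    index=["TF1", "TF2", "TF3"],  # hypothetical TF names
)

z = zscore_rows(results)  # input for a clustered heatmap
top = top_differential(results, list(results.columns), n=2)
print(z.round(2))
print(list(top.index))
```

After z-scoring, TF1 and TF2 end up with the same row profile even though TF2 has much higher raw scores, which is the reason for normalizing per row before clustering. The same `top_differential` idea could be applied to per-site score columns from a TFBS table before re-running PlotAggregate on the selected sites, assuming the table has one score column per condition.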

I hope this answers your questions :-) Best, Mette

whui2bioinformatics commented 4 years ago

OK, thank you for your answer and advice. o( ̄︶ ̄)o