loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

TFs below the footprint threshold are in the bound BED file, and TFs that shouldn't be in the unbound file are there? #161

Closed: nessj216 closed this issue 2 years ago

nessj216 commented 2 years ago

Hello, I understand that motifs found with a p-value smaller than the threshold p-value make it into the TF list, where they are assigned a footprint score. (TF motifs with too high a p-value never get a footprint score, right?) Then TF motifs with a high enough footprint score make it into the bound.bed file, while TF motifs with a footprint score below the threshold go into the unbound.bed file.

BUT if my understanding above is correct, I have found a potential issue. I am getting footprint scores in my bound.bed file that are too low (they fall below the established threshold). Conversely, I am getting motifs with high enough footprint scores that remain in the unbound.bed file. For example, here you can see the 'bound', 'unbound' and 'all' BED files in IGV, where the footprint threshold from TOBIAS (for this particular ATAC replicate) is calculated to be ~3.5. BUT all the motifs, even though their scores are above that threshold, are still in the unbound list.

[Screenshot, 2022-07-30: IGV view of the 'bound', 'unbound' and 'all' BED tracks]

(P.S. I am NOT doing differential analysis of TF footprints between two conditions; I only have one condition. The inputs to BINDetect are 1) the footprint file made by ScoreBigwig, 2) the SAME peak BED file I used in all the other steps, 3) my PWM file, and 4) the genome file (Drosophila in this case).)
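The two-stage logic described in this question (filter motif matches by p-value, then split the survivors by footprint score) can be sketched as below. This is a minimal illustration, not TOBIAS code: the `MotifSite` class, its field names, the p-value cutoff and the toy sites are all hypothetical, and treating a score exactly at the threshold as unbound is an assumption. The second toy site shows that a high motif score says nothing about the bound/unbound call.

```python
from dataclasses import dataclass


@dataclass
class MotifSite:
    """One candidate binding site; field names are hypothetical, not TOBIAS's."""
    chrom: str
    start: int
    end: int
    motif_pvalue: float     # p-value of the motif match against the sequence
    motif_score: float      # motif match score (not used for bound/unbound)
    footprint_score: float  # footprint score taken from the ScoreBigwig track


def split_sites(sites, pvalue_cutoff, footprint_threshold):
    """Drop weak motif matches, then split the rest on the footprint score.

    Treating a score exactly at the threshold as unbound is an assumption.
    """
    kept = [s for s in sites if s.motif_pvalue < pvalue_cutoff]
    bound = [s for s in kept if s.footprint_score > footprint_threshold]
    unbound = [s for s in kept if s.footprint_score <= footprint_threshold]
    return bound, unbound


# Toy sites (made up for illustration) with a footprint threshold of ~3.5.
sites = [
    MotifSite("chr2L", 100, 110, 1e-5, 9.1, 4.2),   # footprint above threshold
    MotifSite("chr2L", 500, 510, 5e-5, 12.0, 1.0),  # high motif score, low footprint
    MotifSite("chr2L", 900, 910, 1e-3, 7.0, 6.0),   # fails the motif p-value cutoff
]
bound, unbound = split_sites(sites, pvalue_cutoff=1e-4, footprint_threshold=3.5)
print([s.start for s in bound], [s.start for s in unbound])  # [100] [500]
```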

Any help would be greatly appreciated! Thank you.

nessj216 commented 2 years ago

Here is the converse: TFs with footprint scores that are too low are in the bound file.

[Screenshot, 2022-08-02: IGV view of low-scoring motifs in the bound BED track]

msbentsen commented 2 years ago

Hi @nessj216 ,

The score shown in IGV is the motif score (5th column of the .bed), which measures how well the TF motif matches the genome sequence; it is independent of footprinting. The footprint score is in the column named <condition>_score in the bindetect_output txt file. There is also an overview of the output columns here. I hope this clears up your question!

BR Mette
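To illustrate the distinction in the answer above, here is a minimal sketch that reads the 5th BED column (what IGV displays) separately from a `<condition>_score` column in a tab-separated table. The table contents are made up, and apart from the `<condition>_score` naming pattern mentioned in the answer, the column names (`chr`, `start`, `motif_score`, ...) and the helper functions are hypothetical placeholders rather than the exact BINDetect output layout. The ~3.5 threshold is the value from this thread.

```python
import csv
import io

# Made-up excerpt of a BINDetect-style overview table for a condition "cond1".
# Only the "<condition>_score" naming comes from the thread; the other column
# names are placeholders.
OVERVIEW_TSV = (
    "chr\tstart\tend\tname\tmotif_score\tstrand\tcond1_score\n"
    "chr2L\t100\t110\tMOTIF\t9.1\t+\t4.2\n"
    "chr2L\t500\t510\tMOTIF\t12.0\t-\t1.0\n"
)


def bed_motif_score(bed_line):
    """Return the 5th BED column, i.e. the score IGV displays (the motif score)."""
    return float(bed_line.rstrip("\n").split("\t")[4])


def read_scores(tsv_text, condition):
    """Return (chrom, start, motif score, footprint score) for each site."""
    footprint_col = f"{condition}_score"
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["chr"], int(row["start"]),
         float(row["motif_score"]), float(row[footprint_col]))
        for row in reader
    ]


print(bed_motif_score("chr2L\t500\t510\tMOTIF\t12.0\t-"))  # 12.0
for chrom, start, motif, footprint in read_scores(OVERVIEW_TSV, "cond1"):
    # Compare against the footprint threshold (~3.5 here), not the motif score.
    label = "bound" if footprint > 3.5 else "unbound"
    print(chrom, start, motif, footprint, label)
```

The second site has the highest motif score but a low footprint score, which is exactly the pattern that looks "wrong" when only the BED score is viewed in IGV.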

nessj216 commented 2 years ago

Thank you a million! Sorry about that, this is very helpful.