loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

Interpreting BINDetect _bound.bed files + Running CreateNetwork with JASPAR dimers #228

Closed beginnertobioinformatics closed 1 year ago

beginnertobioinformatics commented 1 year ago

Hello,

This is more of a conceptual clarification on BINDetect and the output "(condition)_bound.bed" files. I have two conditions (Control and Treatment) in my data and ran BINDetect to determine differential transcription factor binding between the two conditions. When looking at the treatment_bound.bed files, I expected every footprint to fall within a peak that was called either for my Treatment sample alone or for both my Control and Treatment samples. However, a fair number of footprints in the treatment_bound.bed files fall within peaks that were called only for my Control sample. The opposite is also true for my control_bound.bed files: a fair number of the footprints there fall within peaks that were called only for my Treatment sample.

Is this normal? Or does this indicate that something is wrong with either my files/my analysis?

I also have a question regarding CreateNetwork. When I ran BINDetect, I used a file containing the JASPAR motifs for Mus musculus as the --motifs input. As a result, some of the motifs are dimers (e.g. Fos::Jun) that don't map to a single Ensembl ID, so these dimers seem to be ignored when running CreateNetwork. Is there a recommended way to work around this?

Thank you so much for your insight! This pipeline has been so incredibly helpful and informative for me as a beginner to ATAC-seq!

sufyazi commented 1 year ago

Hello,

I am not Mette, who is the dev of TOBIAS, but I can try pointing you in the right direction for your second question.

See here, a question I asked before. Basically, CreateNetwork does not currently handle these heterodimeric motifs.
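Until that is supported, one possible workaround is to handle the dimers outside TOBIAS, for example by splitting each heterodimer name on `::` and mapping the subunits to gene IDs yourself. The following is only a minimal sketch of that idea, not something CreateNetwork does: the function names and the `symbol_to_gene` lookup are hypothetical, and the IDs are placeholders rather than real Ensembl IDs. Whether treating each subunit as its own node is appropriate depends on your analysis.

```python
def split_motif_name(name):
    """Split a JASPAR-style motif name such as 'Fos::Jun' into its subunits."""
    return [part.strip() for part in name.split("::") if part.strip()]


def map_motif_to_genes(name, symbol_to_gene):
    """Return the gene IDs of every subunit found in the lookup table.

    Subunits missing from the table are skipped rather than guessed.
    """
    genes = []
    for subunit in split_motif_name(name):
        gene_id = symbol_to_gene.get(subunit.upper())
        if gene_id is not None:
            genes.append(gene_id)
    return genes


# Placeholder lookup table (hypothetical IDs, for illustration only).
symbol_to_gene = {"FOS": "GENE_ID_FOS", "JUN": "GENE_ID_JUN"}

print(map_motif_to_genes("Fos::Jun", symbol_to_gene))  # one ID per subunit
print(map_motif_to_genes("Ctcf", symbol_to_gene))      # not in the table
```

A dimer then contributes one entry per subunit, so a motif like Fos::Jun is no longer dropped just because it lacks a single ID.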

beginnertobioinformatics commented 1 year ago

Thank you so much! I had thought this might have already been posted under "Issues," but missed that post when I was looking through. Thank you!

msbentsen commented 1 year ago

I would have answered exactly the same as @sufyazi, thank you! I will close this issue, but feel free to reopen in case it's not resolved.

beginnertobioinformatics commented 1 year ago

Yes, thank you. With regards to my first question about how footprints are being detected where peaks supposedly weren't called for said condition, I am guessing this has to do with the step before this pipeline, where all .bed files are merged to create a union set of peaks?

msbentsen commented 1 year ago

"With regards to my first question about how footprints are being detected where peaks supposedly weren't called for said condition"

Sorry, I missed the original question. Yes, it can happen that a peak is called for condition A but not for condition B even though both have similar signal; that depends on the peak-calling algorithm and how it decides what is a peak. The missing peak might be a false negative. Since the input to TOBIAS is all peaks (merged from A and B), you can still find bound TFBS in condition B although the peak was not called there. I would not worry too much about the origin of the peak and would just treat these regions as a "search field of open chromatin", which may or may not be open in that specific condition. Hope that makes sense.
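To see how this plays out, here is a small sketch of the union-peak idea in plain Python. It is an illustration under assumptions, not part of TOBIAS: the coordinates are made-up toy data, and the helper names (`merge_intervals`, `peak_origin`) are hypothetical. It merges the per-condition peaks into one search field and then reports which condition's original peak calls a bound site overlaps.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals into a union set."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(site, peaks):
    """True if the half-open site interval overlaps any peak on the same chromosome."""
    chrom, start, end = site
    return any(c == chrom and start < e and s < end for c, s, e in peaks)


def peak_origin(site, control_peaks, treatment_peaks):
    """Label a site by which condition's original peak calls it overlaps."""
    in_control = overlaps_any(site, control_peaks)
    in_treatment = overlaps_any(site, treatment_peaks)
    if in_control and in_treatment:
        return "both"
    if in_control:
        return "control_only"
    if in_treatment:
        return "treatment_only"
    return "neither"


# Toy peak calls (made-up coordinates, for illustration only).
control_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200)]
treatment_peaks = [("chr1", 250, 500), ("chr2", 50, 150)]

# The union set is the region searched for footprints in both conditions.
union_peaks = merge_intervals(control_peaks + treatment_peaks)
print(union_peaks)

# A site reported as bound in Treatment can sit in a Control-only peak.
treatment_bound_site = ("chr1", 1050, 1070)
print(peak_origin(treatment_bound_site, control_peaks, treatment_peaks))
```

Because footprints are searched across the whole union set, a site in a Treatment bound file can carry the label `control_only` without anything being wrong with the analysis.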

beginnertobioinformatics commented 1 year ago

Thank you for the clarification!