loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

BINdetect, or? #128

Closed araposo-cantab closed 1 year ago

araposo-cantab commented 2 years ago

Dear TOBIAS, How can I use your tool without a motif database? E.g., BINDetect (or another, more appropriate module) with no --motifs parameter.

Why? I'm interested in identifying, mapping and scoring footprints, and in calculating differential binding between two conditions, without the limitations of current annotation.

Cheers, Alexandre

msbentsen commented 2 years ago

Hi,

This is a good question! The answer is unfortunately no, not yet. This is something we are working on for another project, but it is not included in TOBIAS yet.

If you already have some regions of interest, you might use TOBIAS ScoreBed to obtain the footprint scores per region (e.g. for footprints of condition A and B). This does not include the differential binding, but it might be helpful if you calculate a fold change and sort regions by the change between A and B. I hope this was helpful even if it didn't really answer your original question.
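As a minimal sketch of the fold-change-and-sort idea, assuming the per-region footprint scores for conditions A and B have already been loaded into two equal-length lists: the function name `rank_by_fold_change`, the region labels, the score values and the pseudocount of 1.0 are all hypothetical and only serve as illustration.

```python
import numpy as np

def rank_by_fold_change(regions, scores_a, scores_b, pseudocount=1.0):
    """Return (region, log2FC) pairs, largest absolute change first.

    pseudocount is an assumption; pick a value suited to the score scale.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    # log2 fold change of A over B, with the pseudocount added to both
    log2fc = np.log2((a + pseudocount) / (b + pseudocount))
    # sort by magnitude of change, descending
    order = np.argsort(-np.abs(log2fc), kind="stable")
    return [(regions[i], float(log2fc[i])) for i in order]

# Hypothetical example values (not real TOBIAS output)
regions = ["chr1:100-120", "chr1:500-520", "chr2:40-60"]
scores_a = [3.0, 0.0, 1.0]
scores_b = [1.0, 3.0, 1.0]

ranked = rank_by_fold_change(regions, scores_a, scores_b)
for region, fc in ranked:
    print(region, round(fc, 3))
```

Sorting by absolute value puts the regions with the strongest change in either direction at the top; sort on the signed value instead if only gains (or only losses) in condition A matter.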

Best Mette

araposo-cantab commented 2 years ago

Hi Mette,

Thank you for your answer, what you suggest could work for me... But bearing in mind footprint scores have no replicates, how would you calculate this fold-change? I mean, what model/stats would you go for? Thanks again, any insight would be much appreciated. Cheers, Alexandre

P.S.: Since you're working on it, do you think that differential binding in the future could go beyond pairwise comparisons and allow some sort of experimental design modelling, like we do with expression data? E.g., design: model.matrix(~ factor1 + factor2 + factor1:factor2)

msbentsen commented 2 years ago

Hi Alexandre

In the BINDetect model, it is just a simple log2((condition1 + pseudocount) / (condition2 + pseudocount)), where pseudocount is an appropriate pseudocount (similar to +1 for expression data). This is then compared, per TF, to the distribution of background log2FCs (the same calculation for a random selection of regions), to control for global differences in footprint strength between condition1 and condition2.
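A rough sketch of that calculation in Python, under stated assumptions: the score arrays are made-up values, the pseudocount of 1.0 is arbitrary, and the summary statistic (shift of the TF mean log2FC from the background mean, in units of background standard deviation) is only one plausible way to do the comparison, not necessarily what BINDetect computes internally. The helper names `log2fc` and `background_adjusted_change` are hypothetical.

```python
import numpy as np

def log2fc(cond1, cond2, pseudocount):
    """log2((condition1 + pseudocount) / (condition2 + pseudocount)) per site."""
    cond1 = np.asarray(cond1, dtype=float)
    cond2 = np.asarray(cond2, dtype=float)
    return np.log2((cond1 + pseudocount) / (cond2 + pseudocount))

def background_adjusted_change(tf_fc, bg_fc):
    """Mean TF log2FC minus mean background log2FC, scaled by background std.

    This particular statistic is an assumption made for illustration.
    """
    bg_mean = np.mean(bg_fc)
    bg_std = np.std(bg_fc, ddof=1)  # sample standard deviation
    return float((np.mean(tf_fc) - bg_mean) / bg_std)

pseudocount = 1.0  # assumption; choose to suit the score scale

# Hypothetical footprint scores at randomly selected background regions
bg_fc = log2fc([1.0, 3.0, 1.0, 3.0], [1.0, 1.0, 3.0, 3.0], pseudocount)

# Hypothetical footprint scores at the binding sites of one TF
tf_fc = log2fc([7.0, 3.0], [1.0, 1.0], pseudocount)

change = background_adjusted_change(tf_fc, bg_fc)
print(bg_fc, tf_fc, change)
```

Centering on the background distribution is what removes a global shift between the two conditions: if every region scores higher in condition1, the background mean moves with it and only TF-specific changes remain.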

I have had requests for the experimental design matrix before, and it would be amazing to include! But the implementation would unfortunately be quite complex in the current setup, and we have no plans to include it at the moment. Again, you might access the raw footprint scores and set up the design yourself, although this unfortunately comes with a bit of coding, sorry!
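To give an idea of what "a bit of coding" could look like, here is a minimal sketch that fits the 2x2 design from the question (factor1 + factor2 + factor1:factor2) per region with ordinary least squares in NumPy. It assumes one log-scale footprint score per region and condition, in a fixed column order; the function name `fit_factorial` and the input values are hypothetical. With a single score per condition the model is saturated, so it yields effect estimates only and no p-values.

```python
import numpy as np

def fit_factorial(log_scores):
    """Fit intercept, factor1, factor2 and interaction effects per region.

    log_scores: shape (n_regions, 4), columns ordered as
    (f1=0, f2=0), (f1=1, f2=0), (f1=0, f2=1), (f1=1, f2=1).
    Returns an array of shape (n_regions, 4) with the coefficients.
    """
    f1 = np.array([0.0, 1.0, 0.0, 1.0])
    f2 = np.array([0.0, 0.0, 1.0, 1.0])
    # Columns: intercept, factor1, factor2, factor1:factor2
    design = np.column_stack([np.ones(4), f1, f2, f1 * f2])
    y = np.asarray(log_scores, dtype=float).T  # shape (4, n_regions)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef.T

# Hypothetical log-scale footprint scores for two regions
log_scores = [
    [0.0, 1.0, 1.0, 3.0],  # combined effect exceeds the sum of single effects
    [0.0, 1.0, 2.0, 3.0],  # purely additive
]
effects = fit_factorial(log_scores)
print(effects)
```

The last column is the interaction estimate; regions could then be ranked by it in the same way as by a pairwise fold change.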