kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

provenance of tn5 motif meme data? #177

Closed mkarikom closed 10 months ago

mkarikom commented 10 months ago

Hi, can you provide a reference for the PWM matrices in chrombpnet/data/motifs.meme.txt? I could not find this info in Nair 2021, but attribution-based diagnosis of the background model, including the use of deeplift/modisco/tomtom for positive identification of Tn5, seems like a critical step. Thanks!

panushri25 commented 10 months ago

The ChromBPNet preprint is not out yet, so you can cite this repo. The PWM matrices for Tn5 motifs are a combination of all the Tn5 variations in our background model.

panushri25 commented 10 months ago

May I ask what this is in reference to?

mkarikom commented 10 months ago

Hi, thanks for the quick reply!

I'm using the deeplift/tfmodisco/tomtom pipeline to check attributions on my own background model (as suggested in your very nice FAQ). In particular, I wanted to make sure that tomtom was able to see [positive] Tn5 homology among the various modisco clusters on the background model.
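As a rough illustration of that check, here is a minimal sketch that filters a tomtom tab-separated result for modisco patterns matching a Tn5 target. The column names (`Query_ID`, `Target_ID`, `q-value`) are assumed from the tomtom text output and should be verified against your MEME Suite version; the motif names `tn5_1`, `tn5_2` and the pattern names are hypothetical.

```python
import csv
import io


def tn5_hits(tsv_text, target_pattern="tn5", max_q=0.05):
    """Map each query pattern to the Tn5-like targets it matches.

    Assumes a tab-separated table with a header row containing
    'Query_ID', 'Target_ID' and 'q-value' columns (an assumption
    about the tomtom output layout, not confirmed by the thread).
    """
    # Skip blank lines and '#' comment lines before handing rows to csv.
    lines = (
        ln for ln in io.StringIO(tsv_text)
        if ln.strip() and not ln.startswith("#")
    )
    hits = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        is_tn5 = target_pattern.lower() in row["Target_ID"].lower()
        if is_tn5 and float(row["q-value"]) <= max_q:
            hits.setdefault(row["Query_ID"], []).append(row["Target_ID"])
    return hits


# Hypothetical example table, reduced to the three columns used above.
EXAMPLE_TSV = (
    "Query_ID\tTarget_ID\tq-value\n"
    "pattern_0\ttn5_1\t0.001\n"
    "pattern_0\tCTCF\t0.2\n"
    "pattern_1\ttn5_2\t0.4\n"
)
```

Calling `tn5_hits(EXAMPLE_TSV)` keeps only `pattern_0`, since its match to `tn5_1` is the only Tn5 hit under the q-value cutoff. The 0.05 cutoff is an arbitrary choice for the sketch.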

But I haven't been able to find a primary reference for transposase binding motifs in any online meme database. Initially, I mined all [~200k] keys in the meme suite for something like Tn5, but came up dry (genomics novice). Only after that I noticed that you had already uploaded Tn5 motifs to the repo...
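For reference, the kind of search described here can be sketched as a scan over the `MOTIF` lines of a MEME-format file. This is only a sketch: the example text and the motif names in it (`tn5_1`, `hypothetical_Tn5_variant`) are made up for illustration and are not the contents of motifs.meme.txt.

```python
import re


def motif_names(meme_text):
    """Return (identifier, alternate_name) pairs from MEME-format text.

    In the MEME motif format each motif starts with a line of the form
    'MOTIF <identifier> [<alternate name>]'.
    """
    names = []
    for line in meme_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "MOTIF":
            alt = parts[2] if len(parts) > 2 else ""
            names.append((parts[1], alt))
    return names


def find_motifs(meme_text, pattern):
    """Case-insensitive search of motif identifiers and alternate names."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (ident, alt)
        for ident, alt in motif_names(meme_text)
        if rx.search(ident) or rx.search(alt)
    ]


# Hypothetical MEME-format snippet used only to exercise the functions.
EXAMPLE_MEME = """MEME version 4

ALPHABET= ACGT

MOTIF tn5_1 hypothetical_Tn5_variant
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
0.25 0.25 0.25 0.25
0.10 0.40 0.40 0.10

MOTIF MA0139.1 CTCF
letter-probability matrix: alength= 4 w= 1 nsites= 20 E= 0
0.10 0.40 0.40 0.10
"""
```

`find_motifs(EXAMPLE_MEME, "tn5")` returns only the first entry. Running the same search over a database file read from disk would show whether any motif there is labeled as Tn5.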

Since I'm scripting this for reproducibility, I need to know precisely how to get these. If you did not retrieve them from a primary source, I would want to re-generate them myself...

panushri25 commented 10 months ago

Yeah, online databases don't have a motif representation for Tn5.

We look at background models in different cell types and pick the representative motifs of Tn5 variants while making sure there are no TF motifs. You can tell them apart by eye (look for the palindromic nature of Tn5); we included them in our report to assist the user in annotation.
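The palindromic check above is done by eye, but a crude numeric version can be sketched by comparing a PWM to its own reverse complement. This is an illustrative heuristic, not how the motifs in the repo were selected; the scoring function and the toy matrices are assumptions made for the example.

```python
def reverse_complement(pwm):
    """Reverse-complement a PWM given as rows of [A, C, G, T] probabilities.

    Reversing the row order reverses the positions; reversing each row
    swaps A with T and C with G.
    """
    return [row[::-1] for row in reversed(pwm)]


def palindrome_score(pwm):
    """Score in [0, 1]: 1.0 when the PWM equals its reverse complement.

    Uses the summed absolute difference between the PWM and its reverse
    complement. Each row is assumed to sum to 1, so the largest possible
    difference per position is 2.
    """
    rc = reverse_complement(pwm)
    total = sum(
        abs(a - b)
        for row, rc_row in zip(pwm, rc)
        for a, b in zip(row, rc_row)
    )
    return 1.0 - total / (2.0 * len(pwm))


# Toy one-hot PWMs: "ACGT" is its own reverse complement, "AAAA" is not.
ACGT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
AAAA = [[1, 0, 0, 0]] * 4
```

Here `palindrome_score(ACGT)` is 1.0 and `palindrome_score(AAAA)` is 0.0. Real Tn5 logos are only approximately palindromic and the center of symmetry may be offset within a modisco pattern, so a score like this would at most flag candidates for visual inspection.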

mkarikom commented 10 months ago

Ahh, so in other words, diagnosis of any new background model is based on empirical summarization of many previously trained background models?

panushri25 commented 10 months ago

Yeah, it is empirical.

panushri25 commented 10 months ago

These are the conventional Tn5 logos (https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-019-1642-2/MediaObjects/13059_2019_1642_MOESM1_ESM.pdf, page 5). You will notice variants of these in your background model.