kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

question regarding bias #93

Closed PelFritz closed 1 year ago

PelFritz commented 1 year ago

Hi @panushri25,

I am also trying to re-implement the BPNet model, but for an assay similar to ATAC-seq (MOA-seq). It also captures "all" binding activity on the chromosomes (so it is not target-specific like ChIP-seq). My question is: how large is the effect of training your ATAC implementation of the BPNet model without bias tracks? Do you have any suggestions for model training without the bias?

panushri25 commented 1 year ago

Thank you for your question @PelFritz

ATAC-seq has a very strong enzyme (Tn5) sequence bias. When we trained BPNet models directly on ATAC-seq without the bias tracks, we noticed that they capture a lot of Tn5 motifs.
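One rough way to look at this kind of sequence bias before training any model is to compare k-mer frequencies at cut sites against the background. The sketch below is only an illustration under simplifying assumptions: it uses a single strand, ignores read-shift conventions, and the function names (`kmer_counts`, `cut_site_enrichment`) are hypothetical. It is not how ChromBPNet models bias.

```python
from collections import Counter
import math

def kmer_counts(seq, positions, k=6):
    """Count k-mers in windows that start k//2 bases upstream of each position."""
    half = k // 2
    counts = Counter()
    for p in positions:
        start = p - half
        if start < 0 or start + k > len(seq):
            continue  # skip windows that run off the sequence
        counts[seq[start:start + k]] += 1
    return counts

def cut_site_enrichment(seq, cut_sites, k=6, pseudocount=1.0):
    """log2 ratio of k-mer frequency at cut sites vs. all positions in seq."""
    fg = kmer_counts(seq, cut_sites, k)          # k-mers around observed cuts
    bg = kmer_counts(seq, range(len(seq)), k)    # k-mers around every position
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = len(bg)
    scores = {}
    for kmer, bg_n in bg.items():
        fg_freq = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg_n + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = math.log2(fg_freq / bg_freq)
    return scores

# Toy example: every cut falls on the G of a "CG" dinucleotide.
toy_seq = "AAAACGAAAACGAAAA"
toy_scores = cut_site_enrichment(toy_seq, cut_sites=[5, 11], k=2)
top_kmer = max(toy_scores, key=toy_scores.get)
```

A few k-mers with large positive scores would suggest a strong enzyme preference. The same check could be run on MOA-seq cut sites to get a first impression of how strong its bias is.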

For example, refer to slide 11 of the presentation here. The ChromBPNet without bias correction is simply a bias model trained on ATAC-seq. You will see that the sequence features learnt by it are very noisy (due to confounding from Tn5).

If we summarize the motifs using TF-MoDISco, this is what the profiles look like with and without bias correction. Notice that some of the TF motifs are distorted because of Tn5 when bias correction is not applied.

This noisy confounding from bias is stronger in ATAC-seq than in DNase-seq, because Tn5 has a stronger sequence bias than DNase-I. So it really depends on how strong the enzyme bias is in MOA-seq. Training a BPNet model (without a bias track) is, I think, a good starting point for understanding the type and strength of the biases. You can then use ChromBPNet to correct for them.
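To make the idea of "correcting for" the bias more concrete, here is a minimal sketch of a bias-factorized prediction. It assumes that a bias model and a sequence (TF) model each output per-base profile logits plus a log total count, that the profile logits are added, and that the counts are combined with a log-sum-exp. The function names and toy numbers are hypothetical, and this is not the repository's actual code.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution over positions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def combine_with_bias(bias_logits, bias_log_count, seq_logits, seq_log_count):
    """Combine a bias model's outputs with a sequence model's outputs.

    Assumption: profile logits add, and total counts add in linear space
    (hence log-sum-exp in log space).
    """
    combined_logits = [b + s for b, s in zip(bias_logits, seq_logits)]
    combined_log_count = logsumexp2(bias_log_count, seq_log_count)
    return softmax(combined_logits), combined_log_count

# Toy example: a flat bias profile and a sequence model peaking at position 2.
profile, log_count = combine_with_bias(
    bias_logits=[0.0, 0.0, 0.0, 0.0],
    bias_log_count=math.log(10.0),
    seq_logits=[0.0, 0.0, 2.0, 0.0],
    seq_log_count=math.log(30.0),
)
# exp(log_count) is 10 + 30 = 40, up to floating-point error.
```

In a setup like this the bias model would be held fixed while the sequence model is trained, so that the sequence model only has to explain the signal the bias cannot account for.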

Hope this answers your question. The slides are attached to the main page of the repo for future browsing and reference.

panushri25 commented 1 year ago

Hello @PelFritz, I am closing this due to inactivity. Feel free to reopen it if you have any more questions.