loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

Threshold for bound/unbound TFs? #262

Open matteozoia4 opened 2 months ago

matteozoia4 commented 2 months ago

Dear TOBIAS providers,

I have noticed that, after TOBIAS analysis, we lose binding sites of TFs that we know (from ChIP-seq analysis) actually bind within our input regions. I am working with snATAC-seq .bam files and JASPAR2024 TF binding motif PWMs.

  1. What can I adjust (for example, a threshold at a specific step of the pipeline) so that the footprinting results better reflect the biology of the tissue?

Kind regards,

MZ

hschult commented 2 months ago

Hi @matteozoia4 and thank you for using TOBIAS.

I need more information to give a definitive answer; for example, the TOBIAS call you used would be a good start. However, as you are using snATAC-seq data, sparsity might be the issue. TOBIAS relies on the Tn5 cutsites to predict footprints, where a footprint is defined as a small area with fewer cutsites flanked by areas containing more cutsites (see our wiki). As such, a certain number of Tn5 cutsites, and therefore a certain read coverage, is needed to predict footprints reliably. For bulk ATAC, which TOBIAS was designed for, this is usually not an issue. Analysing at the cell level, however, often does not provide enough cutsites, so some footprints may be missed. A possible workaround is to create "pseudobulks" by combining your .bam files into one .bam per group (cell type, condition, etc.) and running TOBIAS on these files. Keep the number of cells per group in mind, though: small groups could still suffer from data sparsity.
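As a rough illustration of the pseudobulk idea, here is a minimal Python sketch that groups input .bam files by label and prepares one `samtools merge` command per group. It only builds the commands and does not run them. The file names, the cell counts, the output naming scheme, and the `min_cells` cutoff of 500 are all made-up assumptions for the example, not TOBIAS recommendations; it also assumes you already have one .bam per sample and group, and that samtools is what you use for merging.

```python
from collections import defaultdict


def plan_pseudobulks(samples, min_cells=500):
    """Group BAM files by label and build one merge command per group.

    samples: iterable of (bam_path, group, n_cells) tuples.
    min_cells: hypothetical cutoff below which a group is flagged as sparse.
    """
    bams = defaultdict(list)
    cells = defaultdict(int)
    for bam_path, group, n_cells in samples:
        bams[group].append(bam_path)
        cells[group] += n_cells

    plan = {}
    for group in sorted(bams):
        plan[group] = {
            # Classic form: samtools merge <out.bam> <in1.bam> <in2.bam> ...
            "command": ["samtools", "merge", f"{group}.pseudobulk.bam",
                        *sorted(bams[group])],
            "n_cells": cells[group],
            "sparse": cells[group] < min_cells,
        }
    return plan


# Hypothetical inputs: (file, group label, number of cells in the file).
samples = [
    ("s1_neuron.bam", "neuron", 1200),
    ("s2_neuron.bam", "neuron", 900),
    ("s1_microglia.bam", "microglia", 150),
]

plan = plan_pseudobulks(samples)
for group, entry in plan.items():
    note = " (few cells, footprints may be unreliable)" if entry["sparse"] else ""
    print(" ".join(entry["command"]) + note)
```

Each command list can be passed to `subprocess.run` once you have checked it. Groups flagged as sparse are worth inspecting before footprinting, for example by merging related cell types or conditions.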

I hope this answers your questions.

Best wishes, Hendrik