slowkow / CENTIPEDE.tutorial

:bug: How to use CENTIPEDE to determine if a transcription factor is bound.
https://slowkow.github.io/CENTIPEDE.tutorial

Meaning of "score" column in the output #16

Closed rdbcasillas closed 4 years ago

rdbcasillas commented 4 years ago

Hello slowkow,

Apologies if this isn't the right place to ask this. Have the authors of this tool documented the meaning of the "score" column in the CENTIPEDE output? You cover posterior probability in your tutorial, but not score. Do you happen to know what it means and how relevant it is in deciding which motif sites are more important than others?

Thanks a lot!

slowkow commented 4 years ago

The "score" column that you are referring to is in the FIMO output, not in the CENTIPEDE output.

Please see the FIMO documentation for more information about the "score". It says:

FIMO converts each input motif into a log-odds PSSM and uses each PSSM to independently scan each input sequence. It reports all positions in each sequence that match a motif with a statistically significant log-odds score. You can control the match p-value that is considered significant, and whether or not FIMO reports matches on both strands when the sequence alphabet is complementable (e.g., DNA or RNA).
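To make the idea of a log-odds score concrete, here is a minimal sketch in base R. It is a simplification and not FIMO's actual implementation (FIMO also handles pseudocounts, scaling, and p-values). The frequency matrix, the uniform background, and the helper name score_site are all made up for illustration.

```r
# Toy position frequency matrix: rows are bases, columns are motif positions.
# The numbers are invented; the consensus of this toy motif is "AGC".
pfm <- matrix(
  c(0.7, 0.1, 0.1, 0.1,   # position 1: mostly A
    0.1, 0.1, 0.7, 0.1,   # position 2: mostly G
    0.1, 0.7, 0.1, 0.1),  # position 3: mostly C
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Assumed uniform background base frequencies.
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)

# Log-odds PSSM: log2(motif frequency / background frequency).
# The background vector is recycled down the rows, so each base is
# divided by its own background frequency.
pssm <- log2(pfm / background)

# Hypothetical helper: score one candidate site by summing the PSSM
# entry for the observed base at each motif position.
score_site <- function(site, pssm) {
  bases <- strsplit(site, "")[[1]]
  stopifnot(length(bases) == ncol(pssm))
  rows <- match(bases, rownames(pssm))
  stopifnot(!anyNA(rows))
  sum(pssm[cbind(rows, seq_along(bases))])
}

score_consensus <- score_site("AGC", pssm)  # matches the motif at every position
score_mismatch  <- score_site("TTT", pssm)  # matches the motif nowhere
```

A site that looks like the motif gets a positive score and a site that looks like background gets a negative one, so a higher score means a better sequence match. It says nothing about whether the site is actually bound.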

There is no "score" column in the CENTIPEDE output.

In retrospect, I probably should have called the cen object something else, like dat, to make it clear that it is not the output from CENTIPEDE.

The only part of cen that goes to CENTIPEDE is the count matrix:

fit <- fitCentipede(
  Xlist = list(DNase = cen$mat),
  Y = as.matrix(data.frame(
    Intercept = rep(1, nrow(cen$mat))
  ))
)
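To illustrate the split, here is a rough sketch with a toy stand-in for that object, named dat as suggested above. The regions element, its column names, and all of the values are assumptions made up for this example. It does not call CENTIPEDE; it only builds the two arguments that would be passed to fitCentipede.

```r
# Toy stand-in for the tutorial's object (hypothetical structure and values).
dat <- list(
  # Motif matches as reported by FIMO; this is where "score" lives.
  regions = data.frame(
    site  = c("site1", "site2", "site3"),
    score = c(12.4, 9.8, 15.1),  # made-up FIMO log-odds scores
    stringsAsFactors = FALSE
  ),
  # Read counts around each motif match: one row per site.
  mat = matrix(
    c(0, 2, 1, 3,
      1, 0, 0, 1,
      4, 5, 2, 6),
    nrow = 3, byrow = TRUE
  )
)

# The FIMO score is a column of the regions table...
score_in_regions <- "score" %in% colnames(dat$regions)

# ...but only the count matrix and an intercept column go into the model.
Xlist <- list(DNase = dat$mat)
Y <- as.matrix(data.frame(Intercept = rep(1, nrow(dat$mat))))
```

Nothing in Xlist or Y carries the FIMO score, so the fitted model cannot return it. The tutorial reads the posterior probabilities from the fit instead (fit$PostPr).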