jianhong / motifStack

Plot stacked logos for single or multiple DNA, RNA and amino acid sequence
https://jianhong.github.io/motifStack/articles/motifStack_HTML.html

Would be nice if matalign checked for compatible matrices #10

Closed bpolacco closed 3 years ago

bpolacco commented 3 years ago

It appears that matalign assumes that all pfm/pcm matrices have the same row structure (the same number of rows, with labels in the same order). Depending on how a pfm/pcm is constructed, this may not always be true. I ran into this with AA motifs, using code similar to the following (with oversimplified motifs) that relies on Biostrings::consensusMatrix to construct the pfm matrices:

library(motifStack)  # provides the "pfm" class and motifStack()

motifs <- list(c("SLQRSDSSQPMLL"), c("QPMLLNTPAPVPP"), c("AGTPIDSSQPMLL", "SLQRSDSSQPMLL"))

# consensusMatrix() is applied to each motif separately, so the row sets can differ
pfms <- lapply(motifs, function(m) new("pfm", mat = Biostrings::consensusMatrix(m, as.prob = TRUE), name = m[[1]]))
motifStack::motifStack(pfms, layout = "tree")
# alignment quietly fails, apart from numerous but mysterious warnings


I can work around this (now that I recognize the problem) as shown below, but it would be nice if matalign checked its assumption that the matrices have equivalent rows, and perhaps the same row order when row names are present.

fullAAConsensusMatrix <- function(seqs) {
  # the 20 standard amino acids, in the row order every matrix should share
  aaLetters <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  cm <- Biostrings::consensusMatrix(seqs, as.prob = TRUE)
  # add all-zero rows for amino acids absent from these sequences
  missingAA <- setdiff(aaLetters, rownames(cm))
  if (length(missingAA) > 0) {
    missingRows <- matrix(0, nrow = length(missingAA), ncol = ncol(cm), dimnames = list(missingAA, NULL))
    cm <- rbind(cm, missingRows)
  }
  # enforce consistent order (drop = FALSE keeps a matrix even with one column):
  cm[aaLetters, , drop = FALSE]
}

pfms <- lapply(motifs, function(m) new("pfm", mat = fullAAConsensusMatrix(m), name = m[[1]]))
motifStack::motifStack(pfms, layout = "tree")
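
For illustration, the kind of check requested here could look roughly like the sketch below. It is not motifStack code: `checkCompatibleRows` and its `sameOrder` argument are hypothetical names, and it only assumes plain base R matrices with row names.

```r
# Hypothetical helper (not part of motifStack): verifies that every matrix in
# a list has row names and that they match the row names of the first matrix.
checkCompatibleRows <- function(mats, sameOrder = TRUE) {
  ref <- rownames(mats[[1]])
  if (is.null(ref)) stop("matrix 1 has no row names")
  for (i in seq_along(mats)) {
    rn <- rownames(mats[[i]])
    if (is.null(rn)) stop("matrix ", i, " has no row names")
    # same labels, regardless of order
    if (length(rn) != length(ref) || !setequal(rn, ref)) {
      stop("matrix ", i, " has a different set of rows than matrix 1")
    }
    # same labels in the same positions
    if (sameOrder && !identical(rn, ref)) {
      stop("matrix ", i, " has the same rows as matrix 1 but in a different order")
    }
  }
  invisible(TRUE)
}

# Toy matrices standing in for pfm matrices
a <- matrix(0.5, nrow = 2, ncol = 3, dimnames = list(c("A", "C"), NULL))
b <- matrix(0.5, nrow = 2, ncol = 3, dimnames = list(c("C", "A"), NULL))
d <- matrix(1, nrow = 1, ncol = 3, dimnames = list("A", NULL))

checkCompatibleRows(list(a, a))                     # passes silently
checkCompatibleRows(list(a, b), sameOrder = FALSE)  # passes: order is ignored
# checkCompatibleRows(list(a, b))  # would stop: same rows, different order
# checkCompatibleRows(list(a, d))  # would stop: different set of rows
```

With pfm objects, the matrices would first have to be extracted, for example with `lapply(pfms, function(p) p@mat)`, assuming the matrix is stored in the `mat` slot.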


Thanks again!

jianhong commented 3 years ago

Thank you for reporting this. It was fixed in version 1.37.4.