ococrook / hdxstats

hdxstats: An R-package for statistical analysis of hydrogen deuterium exchange mass-spectrometry data.
Apache License 2.0
5 stars 2 forks source link

hdxdifference function - a suggestion #44

Closed nlgittens closed 1 year ago

nlgittens commented 1 year ago

Hi Olly,

I am back with another issue with the difference plots; after looking at some more data I was still not quite happy that things were working as they should, but I think I have got to the bottom of the issue now. So in the HOIP dataset (taking newqDF object as an example), when you compute the diffMap, only the first row is populated with values. This is because of line 874:-

j <- which.min(rowSums(peptideMap[, seq.int(start[i], end[i])] > 0)) diffMap[j, seq.int(start[i], end[i])] <- diff[i]

  [,1] [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]    [,11]    [,12]    [,13]    [,14]

[1,] 0 0 0.156536 0.156536 0.156536 0.156536 0.156536 0.757045 0.329764 0.590997 0.590997 0.590997 0.473997 0.473997 [2,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [3,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [4,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [5,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [6,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [7,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [8,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [9,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [10,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [11,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [12,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [13,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [14,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [15,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [16,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [17,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [18,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [19,] 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [,15] [,16] [,17] [,18] [,19] [,20] [1,] 0.473997 0.473997 0.473997 0.473997 0.473997 0.473997 [2,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [3,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [4,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [5,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [6,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [9,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [10,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [11,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [12,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [13,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [14,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [15,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [16,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [17,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [18,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [19,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

At different seq_along, where difference < 0, the peptide map is not being populated. So this should be:

j <- which.min(rowSums(diffMap[, seq.int(start[i], end[i])] != 0)) diffMap[j, seq.int(start[i], end[i])] <- diff[i]

diffMap[,1:20] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] 0 0 0.1033723 0.1033723 0.1033723 0.1033723 0.1033723 -0.07071067 -0.07071067 -0.07071067 -0.07071067 -0.07071067 [2,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.24243033 0.24243033 0.24243033 0.24243033 0.24243033 [3,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.51804633 0.51804633 0.51804633 0.51804633 0.51804633 [4,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 -0.04394533 -0.04394533 -0.04394533 -0.04394533 [5,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.34385000 0.34385000 0.34385000 0.34385000 [6,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.27504533 0.27504533 0.27504533 0.27504533 [7,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.17231433 0.17231433 0.17231433 [8,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.26388600 0.26388600 0.26388600 [9,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.29493833 0.29493833 0.29493833 [10,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.43586333 0.43586333 0.43586333 [11,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.43709100 0.43709100 0.43709100 [12,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [13,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [14,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [15,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [16,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [17,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [18,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [19,] 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [1,] -0.07071067 -0.07071067 -0.07071067 -0.07071067 -0.07071067 -0.07071067 0.00000000 0.00000000 [2,] 0.24243033 0.24243033 0.24243033 0.24243033 0.24243033 0.24243033 0.24243033 0.24243033 [3,] 0.51804633 0.51804633 0.51804633 0.51804633 0.51804633 0.51804633 0.51804633 0.51804633 [4,] -0.04394533 -0.04394533 -0.04394533 -0.04394533 -0.04394533 -0.04394533 0.00000000 0.00000000 [5,] 0.34385000 0.34385000 0.34385000 0.34385000 0.34385000 0.34385000 0.34385000 0.34385000 [6,] 0.27504533 0.27504533 0.27504533 0.27504533 0.27504533 0.27504533 0.27504533 0.27504533 [7,] 0.17231433 0.17231433 0.17231433 0.17231433 0.17231433 0.17231433 0.00000000 0.00000000 [8,] 0.26388600 0.26388600 0.26388600 0.26388600 0.26388600 0.26388600 0.26388600 0.26388600 [9,] 0.29493833 0.29493833 0.29493833 0.29493833 0.29493833 0.29493833 0.29493833 0.29493833 [10,] 0.43586333 0.43586333 0.43586333 0.43586333 0.43586333 0.43586333 0.43586333 0.43586333 [11,] 0.43709100 0.43709100 0.43709100 0.43709100 0.43709100 0.43709100 0.43709100 0.43709100 [12,] 0.05807733 0.05807733 0.05807733 0.05807733 0.05807733 0.05807733 0.00000000 0.00000000 [13,] 0.05791267 0.05791267 0.05791267 0.05791267 0.05791267 0.05791267 0.05791267 0.05791267 [14,] 0.17091467 0.17091467 0.17091467 0.17091467 0.17091467 0.17091467 0.17091467 0.17091467 [15,] 0.49726233 0.49726233 0.49726233 0.49726233 0.49726233 0.49726233 0.49726233 0.49726233 [16,] 0.38707000 0.38707000 0.38707000 0.38707000 0.38707000 0.38707000 0.38707000 0.38707000 [17,] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [18,] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 [19,] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000

And now the diffMap mirrors peptideMap object, and there are no missing peptides. The impact on the HOIP dataset is minimal.

Now that I'm satisfied that the function is doing what it's supposed to, I still don't like the output. Where the two uptake curves have converged, the sign is determined by noise, even if there is a significant differential effect at a different timepoint. And you're never going to be in an optimal place whichever timepoint you use to make the comparison.

I had thought that perhaps what might be better is to take the mean uptake across all timepoints for reference and comparator states (defined by new options ref_cols and comp_cols), and then take the difference on these instead. Maybe we could integrate the area under the uptake curve too as an alternative, but just taking a mean seemed to work well enough. Using this function, I'm a lot happier with how my differential heatmap looks compared to the hdxdifference function in the package.

##' @rdname functional-plots
hdxdifference2 <- function(object, 
                          AAString, 
                          peptideSeqs,
                          numlines = 5,
                          maxmismatch = 0,
                          by = 5,
                          ref_cols = c(1:3),
                          comp_cols = c(4:6),
                          scores = NULL,
                          name = "-log10 p value"){

    # Test
    #stopifnot("AAString must be an object of class AAString"= class(AAString) == "AAString")
    #stopifnot("peptideSeqs must be a character vector"= is.character(peptideSeqs) == "TRUE")
    #stopifnot("cols must be length 2"=length(cols) == 2)

    # Storage and global variables
    plot.list <- list()
    coverage <- matrix(0, ncol = length(AAString), nrow = 1)
    n <- ceiling(length(coverage)/numlines)
    colnames(coverage) <- strsplit(as.character(AAString), "")[[1]]

    # Compute AA stringset and match to dictionary
    peptideset <- AAStringSet(x = peptideSeqs)
    allPatterns <- matchPDict(pdict = peptideset,
                              subject = AAString,
                              max.mismatch = maxmismatch)  

    # Compute coverage numbers
    for (i in seq_along(allPatterns)) {

        begin <- allPatterns[[i]]@start[1]
        end <- allPatterns[[i]]@start[1] - 1 + allPatterns[[i]]@width[1]
        coverage[, seq.int(begin, end)] <- coverage[, seq.int(begin, end)] + 1
    }

    ncov <- max(coverage)

    start <- sapply(allPatterns, function(x) x@start) + 2
    end <- start + sapply(allPatterns, function(x) x@width) - 3

    peptideMap <- matrix(0, ncol = length(AAString), nrow = ncov + 3)
    diffMap <- matrix(0, ncol = length(AAString), nrow = ncov + 3)
    colnames(peptideMap) <- strsplit(as.character(AAString), "")[[1]]
    rownames(peptideMap) <- seq.int(nrow(peptideMap))

    ## compute differences
    diff <- rowSums(assay(object)[,comp_cols], na.rm = TRUE)/length(comp_cols) - rowSums(assay(object)[,ref_cols], na.rm = TRUE)/length(comp_cols)

    # compute diff map
    for (i in seq_along(start)){

            if (i == 1) {
                diffMap[1, seq.int(start[i], end[i])] <- diff[i]
            } else {
                j <- which.min(rowSums(diffMap[, seq.int(start[i], end[i])] != 0))
                diffMap[j, seq.int(start[i], end[i])] <- diff[i]
            }
    }

    # compute p-value map
    if (is.null(scores) == TRUE){
        for (i in seq_along(start)){

            if (i == 1) {
                peptideMap[1, seq.int(start[i], end[i])] <- 1
            } else {
                j <- which.min(rowSums(peptideMap[, seq.int(start[i], end[i])] > 0))
                peptideMap[j, seq.int(start[i], end[i])] <- 1
            }
        }
    } else {
        for (i in seq_along(start)){

            if (i == 1) {
                peptideMap[1, seq.int(start[i], end[i])] <- scores[i]
            } else {
                j <- which.min(rowSums(peptideMap[, seq.int(start[i], end[i])] > 0))
                peptideMap[j, seq.int(start[i], end[i])] <- scores[i]
            }
        }
    }
    peptideMap[peptideMap == 0] <- NA
    averageMap <- apply(peptideMap, 2, function(x) 1/mean(1/x, na.rm = TRUE))
    averageMap[is.nan(averageMap)] <- 1
    averageMap <- -log10(averageMap)
    averageMap <- t(as.matrix(averageMap))
    rownames(averageMap) <- paste0(seq.int(nrow(averageMap)))

    diffMap[diffMap == 0] <- NA
    diffMap <- apply(diffMap, 2, function(x) mean(x, na.rm = TRUE))
    diffMap[is.nan(diffMap)] <- 0
    diffMap <- t(as.matrix(diffMap))
    rownames(diffMap) <- paste0(seq.int(nrow(diffMap)))

    return(list(averageMap = averageMap, diffMap = diffMap))
}   

Hopefully that covers the issue. Wasn't sure whether perhaps there is a better way of proposing changes to push in GitHub.

ococrook commented 1 year ago

Hi Nathan,

That's a great suggestion and thanks for putting in the work to make the edits! I'll try and integrate this into the package - maybe I can also add the auc method you suggestion at the same time.

ococrook commented 1 year ago

now included