[BUG] No residue-resolved model for first 100 residues?

nlgittens commented 5 months ago

Got this strange bug I haven't seen elsewhere in which I was modelling some data in Rex and the first 101 residues to be exact, posterior estimates are identical and so giving no resolution in any of the modelled uptakes. As can be seen from the coverage plot, there is quite good redundancy of peptides in this region with a broad range of dynamics, and the rest of the peptides in the protein seem to be modelled okay.

ococrook commented 5 months ago

This does seem odd - given the rest looks really good! what happens if you just model these first 100 residues on their own? Can you show me the function you ran. One possibility is that you set the parameter R_lower to be 100?

nlgittens commented 5 months ago

So I did two different modelling in the same R session to ensure that the initial inputs were identical. In one case, in took the first 100 residues, and in the other the whole protein. For the full-length protein, it is as before:

In the case of the first 100 residues, they are modelled taking into account the peptide redundancy:

I fixed R equal to the last residue in the dataframe in each case; this is what I did:

hdx_apo <- data.frame(hdx_clean) %>% filter(State == "apo")
hdx_comparator <- data.frame(hdx_clean) %>% filter(State == "State1")

hdx_apo_100 <- DataFrame(hdx_apo %>% filter(End < 100))
hdx_comparator_100 <- DataFrame(hdx_comparator %>% filter(End < 100))

hdx_apo <- DataFrame(hdx_apo)
hdx_comparator <- DataFrame(hdx_comparator)

numTimepoints <- length(unique(hdx_apo$Exposure))
Timepoints <- unique(hdx_apo$Exposure)
numPeptides <- length(unique(paste(hdx_apo$Sequence, hdx_apo$Charge, sep = "_")))
numPeptides_100 <- length(unique(paste(hdx_apo_100$Sequence, hdx_apo_100$Charge, sep = "_")))

set.seed(1)
rex_100 <- rex(HdxData = DataFrame(hdx_comparator_100),
                numIter = 1000,
                R = max(hdx_comparator_100$End), 
                numtimepoints = numTimepoints,
                timepoints = Timepoints,
                density = "laplace",
                seed = 1L,
                tCoef = c(0, rep(1, numTimepoints - 1)),
                BPPARAM = bpparam())

rex_100 <- RexProcess(HdxData = DataFrame(hdx_comparator_100), 
                       params = rex_100,
                       range = 1:1000,
                       thin = 1,
                       whichChains = c(1,2))

saveRDS(rex_100, "rex_100.rds")

set.seed(1)
rex_all <- rex(HdxData = DataFrame(hdx_comparator),
               numIter = 1000,
               R = max(hdx_comparator$End), 
               numtimepoints = numTimepoints,
               timepoints = Timepoints,
               density = "laplace",
               seed = 1L,
               tCoef = c(0, rep(1, numTimepoints - 1)),
               BPPARAM = bpparam())

rex_all <- RexProcess(HdxData = DataFrame(hdx_comparator), 
                      params = rex_all,
                      range = 1:1000,
                      thin = 1,
                      whichChains = c(1,2))

saveRDS(rex_all, "rex_all.rds")

ococrook commented 5 months ago

Very strange, I think a couple options to try, I suspect something in the defaults causing this. Could you try the following:

1) See if UptakeGuess function gives you any resolution, here's an example

require(RexMS)
require(dplyr)
data(BRD4_apo)

BRD4_apo <- BRD4_apo %>% filter(End < 100)
numTimepoints <- length(unique(BRD4_apo$Exposure))
Timepoints <- unique(BRD4_apo$Exposure)
numPeptides <- length(unique(BRD4_apo$Sequence))
BRD4_apo <- DataFrame(BRD4_apo)
C <- coverageHeatmap(res = BRD4_apo, plot = FALSE)

uptakeGuess(BRD4_apo,
            numRep = 3,
            numPeptides =  numPeptides,
            numtimepoints =  numTimepoints,
            R =  99,
            C = C,
            phi = 0.92)

2) Run for more iterations? Might be good to see what the sigma plot looks like

3) In the prior list(lambda = 100/(R - 1), meanlog = -3, sdlog = 1, rho = 0.5, shape1 = 1, shape2 = 5, shape = 1, b_alpha = 1, b_beta = 200, dshape = 1, d_alpha = 1, d_beta = 1, sigma_sd = 0.5, pishape1 = 1, pishape2 = 10)

set lambda to be much larger:

prior <- list(lambda = 100, meanlog = -3, sdlog = 1, rho = 0.5, shape1 = 1,
    shape2 = 5, shape = 1, b_alpha = 1, b_beta = 200, dshape = 1, d_alpha = 1, d_beta =
    1, sigma_sd = 0.5, pishape1 = 1, pishape2 = 10)

and use this instead?

ococrook / RexMS

[BUG] No residue-resolved model for first 100 residues? #5