ococrook / RexMS

Residue-level analysis of HDX-MS
https://ococrook.github.io/RexMS/

[Feature Request] Differing number of replicates in datasets #3

Open nlgittens opened 2 months ago

nlgittens commented 2 months ago

Issue: ReX can only handle datasets in which there is an identical number of replicates across timepoints (and perhaps across states as well?).

This may be quite a common issue, as datasets can have missed timepoints, different numbers of non-deuterated experiments, or different numbers of replicates between states, none of which can be handled at present.

This might be because the matrix is defined by the number of timepoints and the number of replicates, rather than by distinct experiments. The error seems to arise in the error_prediction function, but I can imagine there may be implications for other functions too, as the number of timepoints is also defined elsewhere.
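A quick way to confirm whether a dataset is affected is to count the replicates per peptide and timepoint before calling rex. The following is a minimal sketch in base R on hypothetical toy data; it assumes the columns are named Sequence, Exposure and replicate, as in the reproducible example, and it is not part of RexMS.

```r
# Hypothetical toy data: the 0 s timepoint is missing replicate 3
hdx <- data.frame(
  Sequence  = "PEPTIDE",
  Exposure  = c(0, 0, 30, 30, 30, 300, 300, 300),
  replicate = c(1, 2, 1, 2, 3, 1, 2, 3)
)

# Count distinct replicates for each peptide/timepoint combination
rep_counts <- aggregate(replicate ~ Sequence + Exposure, data = hdx,
                        FUN = function(x) length(unique(x)))
names(rep_counts)[3] <- "n_replicates"

# A balanced design has the same replicate count in every combination
is_balanced <- length(unique(rep_counts$n_replicates)) == 1

print(rep_counts)
print(is_balanced)
```

If is_balanced is FALSE, the rows of rep_counts with the smallest n_replicates show which timepoints are short of replicates.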

Reproducible example:

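# packages the example relies on
library(RexMS)
library(dplyr)         # %>% and filter
library(S4Vectors)     # DataFrame
library(BiocParallel)  # SerialParam

data("BRD4_apo")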

#filter data so 0 s only contains 2 replicates; other timepoints contain 3 replicates
BRD4_apo <- BRD4_apo %>%
  filter(!(Exposure == 0 & replicate == 3))

BRD4_apo <- DataFrame(BRD4_apo)
BRD4_apo <- cleanHDX(res = BRD4_apo, clean = TRUE)
BRD4_apo <- data.frame(BRD4_apo) %>% filter(End < 100)
BRD4_apo <- DataFrame(BRD4_apo)

numTimepoints <- length(unique(BRD4_apo$Exposure))
Timepoints <- unique(BRD4_apo$Exposure)
numPeptides <- length(unique(BRD4_apo$Sequence))
set.seed(1)
rex_test <- rex(HdxData = BRD4_apo,
                  numIter = 100,
                  R = max(BRD4_apo$End), 
                  density = "laplace",
                  numtimepoints = numTimepoints,
                  timepoints = Timepoints,
                  seed = 1L,
                  tCoef = c(0, rep(1, numTimepoints - 1)),
                  phi = 1,
                  BPPARAM = SerialParam())

Warning: 'package:stats' may not be available when loading
Warning: 'package:stats' may not be available when loading
Fold 1 ... Fold 2 ... Fold 3 ... Fold 4 ... Fold 5 ...
Warning in res$Uptake[res$Sequence == unique(res$Sequence)[j]] - rep(mu, :
  longer object length is not a multiple of shorter object length

Fold 1 ... Fold 2 ... Fold 3 ... Fold 4 ... Fold 5 ...
Warning in res$Uptake[res$Sequence == unique(res$Sequence)[j]] - rep(mu, :
  longer object length is not a multiple of shorter object length

Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in .sd[j, ] <- rep(tCoef * numExch[[j]] * sqrt(sigmasq), each = numRep): number of items to replace is not a multiple of replacement length

ococrook commented 2 months ago

Thanks Nathan, I had this one on my list. It's more of an enhancement than a bug. There are two ways to deal with this:

1) Impute them
2) Model them

Modelling them is quite computationally intensive, but if there's a lot of imputation then that can cause bias. I suggest I write a simple imputation script that gives a warning if there are lots of missing values?
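As a rough illustration of the imputation option, the sketch below pads each peptide/timepoint combination to the full set of replicates, fills missing uptake values with the mean of the observed replicates, and warns when the imputed fraction exceeds a threshold. The helper impute_replicates and its warn_frac argument are hypothetical and not part of RexMS; mean imputation is only one possible choice, and the sketch assumes columns named Sequence, Exposure, replicate and Uptake. Other columns are not filled in for imputed rows, and a combination with no observed replicate at all ends up as NaN.

```r
# Hypothetical helper, not part of RexMS: balance replicates by mean imputation
impute_replicates <- function(df, warn_frac = 0.1) {
  # Full peptide x timepoint x replicate grid that a balanced design would have
  grid <- expand.grid(Sequence  = unique(df$Sequence),
                      Exposure  = unique(df$Exposure),
                      replicate = sort(unique(df$replicate)),
                      stringsAsFactors = FALSE)
  full <- merge(grid, df, by = c("Sequence", "Exposure", "replicate"),
                all.x = TRUE)
  missing <- is.na(full$Uptake)

  # Mean of the observed replicates within each peptide/timepoint combination
  cell_mean <- ave(full$Uptake, full$Sequence, full$Exposure,
                   FUN = function(x) mean(x, na.rm = TRUE))
  full$Uptake[missing] <- cell_mean[missing]
  full$imputed <- missing

  # Warn when a large share of the values had to be imputed
  frac <- mean(missing)
  if (frac > warn_frac) {
    warning(sprintf("%.0f%% of uptake values were imputed; results may be biased",
                    100 * frac))
  }
  full
}

# Toy data: the 0 s timepoint has two replicates, the 30 s timepoint has three
hdx <- data.frame(
  Sequence  = "PEPTIDE",
  Exposure  = c(0, 0, 30, 30, 30),
  replicate = c(1, 2, 1, 2, 3),
  Uptake    = c(0.0, 0.2, 1.0, 1.2, 1.1)
)
balanced <- impute_replicates(hdx, warn_frac = 0.5)
```

The imputed column flags the filled-in rows so they can be inspected or excluded later, and lowering warn_frac makes the warning more sensitive.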

ococrook commented 2 months ago

An example dataset might be useful!