ococrook / hdxstats

hdxstats: An R-package for statistical analysis of hydrogen deuterium exchange mass-spectrometry data.
Apache License 2.0
6 stars 2 forks source link

FitUptakeKinetics - error messages #23

Closed nlgittens closed 2 years ago

nlgittens commented 2 years ago

I've started looking at some more sizeable datasets now and have come across the following issue. Unfortunately as you can see, the error message isn't particularly informative, so I'm not sure what I'm looking for. The modelling for each peptide using fitUptakeKinetics abruptly terminates with an error message, although I don't have any insight into which peptide R failed at.

From what I have learned, there may be a few different reasons why this happen, on being that #D > 0; when D = 0 for one peptide I was getting this message. It has occurred again since, although this time I'm not quite sure why and still trying to troubleshoot this. image

ococrook commented 2 years ago

If it's possible to share some data that would be great. It could be what you suggest which means the initialisation of d might be bad. What happens if you remove those peptides from the analysis?

nlgittens commented 2 years ago

Shared a peptide in the FileShare where I get this error. In Row 20, D = 0 for this experiment. If I change to 0.1, this error goes away. If I change it to something unrealistic like D = 2, I get the error again. So I think this is happening when fitUptakeKinetics still can't find a good fit even with the retries (this is with the latest version installed). Since just trawling through some of my later eptides, the stop error occurs every time I encounter a peptide with the warning "could not fit model..."

ococrook commented 2 years ago

The example you sent worked for me - it simply returned a HdxStatModel which was empty but this doesn't stop the fitting. The "suingular graident matrix at initial parameter estimates" will appear as it tries different values so you shouldn't worry about that error.

The error that's resulting in the failure is Error: $ ...

could try typing traceback() after you see the error it might give me more clues

nlgittens commented 2 years ago

traceback() 15: deviance.default(X[[i]], ...) 14: FUN(X[[i]], ...) 13: FUN(X[[i]], ...) 12: lapply(X = X, FUN = FUN, ...) 11: sapply(nonlin_mod, deviance) 10: differentialUptakeKinetics(object = object, feature = x, start = start, formula = formula, design = design, maxAttempts = maxAttempts) 9: FUN(X[[i]], ...) 8: lapply(feature, function(x) differentialUptakeKinetics(object = object, feature = x, start = start, formula = formula, design = design, maxAttempts = maxAttempts)) 7: .local(object, ...) 6: fitUptakeKinetics(object = hdxstats_tableqDF[, c(1:num_states)], feature = rownames(hdxstats_tableqDF[, c(1:num_states)])[[1]], start = list(a = NULL, b = NULL, d = NULL, p = 1)) 5: fitUptakeKinetics(object = hdxstats_tableqDF[, c(1:num_states)], feature = rownames(hdxstats_tableqDF[, c(1:num_states)])[[1]], start = list(a = NULL, b = NULL, d = NULL, p = 1)) at hdxstats_use_v3_generic_forOli.R#90 4: eval(ei, envir) 3: eval(ei, envir) 2: withVisible(eval(ei, envir)) 1: source("~/ELN43793/Oli_ManhattenPlot_v2/hdxstats_use_v3_generic_forOli.R")

ococrook commented 2 years ago

awesome, could you paste the output of:

hdxstats::differentialUptakeKinetics
nlgittens commented 2 years ago

Same syntax as in the fitting? I get the same outcome.

`res <- differentialUptakeKinetics(object = hdxstats_tableqDF[,c(1:num_states)],

Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates Error: $ operator is invalid for atomic vectors

traceback() 6: deviance.default(X[[i]], ...) 5: FUN(X[[i]], ...) 4: FUN(X[[i]], ...) 3: lapply(X = X, FUN = FUN, ...) 2: sapply(nonlin_mod, deviance) 1: differentialUptakeKinetics(object = hdxstats_tableqDF[, c(1:num_states)], feature = rownames(hdxstats_tableqDF[, c(1:num_states)])[[1]], start = list(a = NULL, b = NULL, d = NULL, p = 1))

ococrook commented 2 years ago

Literally paste the function name into the terminal. I want see what code is outputed because I see:

2: sapply(nonlin_mod, deviance)

which suggests the package isn't upto date beacuse it should return

sapply(nonlin_mod, function(x) try(deviance(x), silent = TRUE))

which I think would return an error but not allow the code to exit

You may want to try updating the package?

nlgittens commented 2 years ago

okay, sorry

hdxstats::differentialUptakeKinetics function (object, feature = NULL, design = NULL, formula = NULL, start = list(a = NULL, b = 0.001, d = NULL, p = 1), mycolours = brewer.pal(n = 8, name = "Set2"), maxAttempts = 5) { .out <- NULL if (!is.null(design)) { colnames(object)[[1]] <- design } ldf <- longFormat(object) ldf$timepoint <- as.numeric(str_match(ldf$colname, "X\s(.?)\srep")[, 2]) ldf$replicates <- as.numeric(str_match(ldf$colname, "rep\s(.)\scond")[, 2]) ldf$chargestate <- as.numeric(strmatch(ldf$rowname, "\s(.)")[, 2]) ldf$condition <- as.factor(str_match(ldf$colname, "cond\s(.)")[, 2]) if (length(feature) == 1) { if (!is.null(feature)) { if (is.null(formula)) { formula <- value ~ a (1 - exp(-b (timepoint)^p)) + d } data <- data.frame(ldf[ldf$rowname == feature, ]) data <- data[, !colSums(is.na(data)) == nrow(data)] data <- na.omit(data) if (nrow(data) < 3) { print("too few data points to fit model") return() } if (is.null(start$a)) { start$a <- max(data$value) } if (is.null(start$d) & length(start) > 2) { start$d <- min(data$value) } nonlin_mod <- vector(mode = "list", length = maxAttempts) for (j in seq_len(maxAttempts)) { if (is.null(start$b)) { start$b <- 10^(j - maxAttempts) } nonlin_mod[[j]] <- try(nlsLM(data = data, formula = formula, start = start, control = nls.lm.control(maxiter = 500, ftol = 10^{ -8 }), trace = FALSE, lower = rep(0, length(start)), algorithm = "LM", na.action = na.exclude)) start$b <- NULL } jj <- which.min(sapply(nonlin_mod, deviance)) if (is.null(start$b)) { start$b <- 10^{ jj - maxAttempts } } nonlin_mod <- nonlin_mod[[jj]] if (inherits(nonlin_mod, "try-error")) { print("model fit failed, likely exessive missing values") return(nonlin_mod) } myPredict1 <- expand.grid(timepoint = seq(min(data$timepoint, 0), max(data$timepoint), 0.5)) myPredict1$fit <- predict(nonlin_mod, newdata = myPredict1) datalist <- group_split(data, condition) nlmod <- lapply(datalist, function(x) { nonlin_mod <- try(nlsLM(data = x, formula = formula, start = start, control = nls.lm.control(maxiter = 500, ftol = 10^{ -8 }), trace = FALSE, lower = rep(0, length(start)), algorithm = "LM", na.action = na.exclude)) }) if (any(sapply(nlmod, function(x) inherits(x, "try-error")))) { print("Could not fit model, likely exessive missing values") return(nlmod) } myPredict <- lapply(1:length(nlmod), function(j) expand.grid(timepoint = seq(min(data$timepoint, 0), max(data$timepoint), 0.5))) myPredict <- lapply(1:length(nlmod), function(j) { myPredict[[j]] <- predict(nlmod[[j]], newdata = myPredict[[j]]) return(myPredict[[j]]) }) myPredictjoin <- do.call(cbind, myPredict) colnames(myPredictjoin) <- sapply(datalist, function(x) x$condition[1]) myPredictjoin <- data.frame(myPredictjoin) myPredictjoin$timepoint <- unlist(expand.grid(timepoint = seq(min(data$timepoint, 0), max(data$timepoint), 0.5))) myPredict_long <- myPredictjoin %>% pivot_longer(cols = 1:length(datalist)) colnames(myPredict_long) <- c("timepoint", "condition", "fit") df <- data.frame(myPredict_long) df$condition <- gsub("X", "", df$condition) gg <- ggplot(data, aes(y = value, x = timepoint, color = condition)) + geom_point(size = 4) + theme_classic() + geom_line(data = myPredict1, aes(x = timepoint, y = fit), inherit.aes = FALSE, size = 2, col = mycolours[2]) + geom_line(data = df, aes(x = timepoint, y = fit, color = condition), inherit.aes = FALSE, size = 2) + labs(x = "Exposure", y = "Deuterium Incoperation", color = "condition", title = paste0(data$rowname[1])) + scale_color_viridis(alpha = 0.7, discrete = TRUE) .nlmod <- .nlsList(nlsmodels = nlmod) .out <- .hdxstatmodel(nullmodel = nonlin_mod, alternative = .nlmod, vis = gg, method = "Functional Model", formula = formula) } } return(.out) } <bytecode: 0x3db3b368>

nlgittens commented 2 years ago

hdxstats::manhattanplot appears with the correct spelling, so I think I do have the most recent version

ococrook commented 2 years ago

No apologies needed, I wasn't being clear. Thank you!

I think you might need to do another update. The line generating the error is slightly different in the current version:

https://github.com/ococrook/hdxstats/blob/c82466a05d1d5f0c2527fd1344ebebb043067557/R/functionalAnalysis-function.R#L87

nlgittens commented 2 years ago

Okay, I can see the correct line for the current version, and was able to process that single peptide data without a stop error. I just mustn't have had the newest version. I'll close the issue then, thanks!

ococrook commented 2 years ago

Thanks, @nlgittens