philips-software / latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods
https://philips-software.github.io/latrend/
GNU General Public License v2.0

Inquiry: on model specification #156

Open nangosyah opened 6 months ago

nangosyah commented 6 months ago

Hi, I have a question about model specification in this package. I'm trying to fit the Bayesian model provided through the mixAK interface, lcMethodMixAK_GLMM.

My data: HDX (deuterium exchange values; continuous), Peptide (peptide ID; continuous), condition (factor), SampleID (factor), Time (factor).

When I fit the model this way:

mixAKMethod <- lcMethodMixAK_GLMM(fixed = HDX_transformed ~ Time, random = ~SampleID, id = "Peptide", time = "Time", nClusters = 2)

mixAK <- latrend(mixAKMethod, data = sc9)

Why do I get this error? Is the specification wrong?

Error in str2lang(x) : <text>:1:63: unexpected numeric constant
1: y ~ 1 + Time0 + Time10 + Time60 + Time3600 + SampleIDSample 1
                                                               ^
11. str2lang(x)
10. formula.character(paste("y ~ 1 +", paste(colnames(x[[s]]), collapse = " + "), " + ", paste(colnames(z[[s]]), collapse = " + "), " + (1 +", paste(colnames(z[[s]]), collapse = " + "), " | id)"))
9. formula(paste("y ~ 1 +", paste(colnames(x[[s]]), collapse = " + "), " + ", paste(colnames(z[[s]]), collapse = " + "), " + (1 +", paste(colnames(z[[s]]), collapse = " + "), " | id)"))
8. GLMM_MCMCifit(do.init = TRUE, na.complete = FALSE, y = dd$y, dist = dd$dist, id = dd$id, time = dd$time, x = dd$x, z = dd$z, random.intercept = dd$random.intercept, xempty = dd$xempty, zempty = dd$zempty, Rc = dd$Rc, Rd = dd$Rd, p = dd$p, p_fi = dd$p_fi, ...
7. (function (y, dist = "gaussian", id, x, z, random.intercept, prior.alpha, init.alpha, init2.alpha, scale.b, prior.b, init.b, init2.b, prior.eps, init.eps, init2.eps, nMCMC = c(burn = 10, keep = 10, thin = 1, info = 10), tuneMCMC = list(alpha = 1, ...
6. do.call(mixAK::GLMM_MCMC, args)
5. fit(method = method, data = data, envir = modelEnv, verbose = verbose)
4. fit(method = method, data = data, envir = modelEnv, verbose = verbose)
3. suppressFun({ modelEnv = preFit(method = method, data = data, envir = envir, verbose = verbose) model = fit(method = method, data = data, envir = modelEnv, ...
2. .fitLatrendMethod(cmethod, modelData, envir = modelEnv, mc = mc, verbose = verbose)
1. latrend(mixAKMethod, data = sc9)
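
One possible reading of the traceback above: the formula text is pasted together from model-matrix column names, and the name SampleIDSample 1 contains a space, so R cannot parse it as a single variable name. The sketch below reproduces that kind of parse failure with base R only. The formula string is a shortened, hypothetical stand-in, and the backtick and make.names() variants only illustrate why the name matters; they are not a fix inside mixAK or latrend.

```r
# Hypothetical, shortened version of the pasted formula text.
# The space in "SampleIDSample 1" makes the trailing 1 an unexpected token.
bad_text <- paste("y ~ 1 +", "Time0", "+", "SampleIDSample 1")

# Capture the parse error message instead of stopping.
msg <- tryCatch(
  {
    str2lang(bad_text)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Quoting the name with backticks makes the same text parseable.
ok <- str2lang("y ~ 1 + Time0 + `SampleIDSample 1`")
print(ok)

# make.names() turns an arbitrary label into a syntactically valid name.
safe_name <- make.names("SampleIDSample 1")
print(safe_name)
```

If this reading is right, factor levels without spaces (or numeric columns) would avoid this particular parse error, which is consistent with the all-numeric data working later in the thread.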

niekdt commented 6 months ago

Hi, I think the problem is that your Time column is a factor variable; it should be numeric. If that is indeed the case, I need to add automatic checks to latrend to catch this.
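
As a minimal sketch of checking and converting the time column, assuming the factor levels are numeric labels (the traceback suggests 0, 10, 60 and 3600): sc9_demo below is a hypothetical stand-in for the real sc9 data, and only base R is used.

```r
# Hypothetical stand-in for the sc9 data; values are made up for illustration.
sc9_demo <- data.frame(
  Peptide = rep(1:3, each = 4),
  Time = factor(rep(c(0, 10, 60, 3600), times = 3)),
  HDX_transformed = seq(0.1, 1.2, by = 0.1)
)

# Check how the time column is stored.
print(is.factor(sc9_demo$Time))

# as.numeric() on a factor returns the integer level codes (1, 2, 3, 4),
# so go through as.character() to recover the labelled time values.
codes <- as.numeric(sc9_demo$Time)
sc9_demo$Time <- as.numeric(as.character(sc9_demo$Time))

print(unique(codes))
print(unique(sc9_demo$Time))
```

Whether a numeric time scale is appropriate depends on the analysis; this only shows the conversion itself.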

You have two options, depending on what makes sense for your analysis:

nangosyah commented 6 months ago

Hi, thank you for the quick feedback. I have tried changing the data types in my dataset, converting the variables to numeric or factor as suggested.

I intended to fit the model below:

HDX = Time + condition + Time*condition, with a random effect for the sample.

I have realised that the package does not seem to handle a combination of factor and numeric variables, and returns the error below:

 ----------------------------------------------------------------------
 - Longitudinal clustering using: generalized linear mixed model with normal random effects mixture
 ----------------------------------------------------------------------
 Method arguments:
  time:           "Time"
  id:             "Peptide"
  nClusters:      3
  dist:           "gaussian"
  nMCMC:          c(burn = 10, keep = 10, thin = 1, info =
  tuneMCMC:       list(alpha = 1, b = 1)
  store:          c(b = FALSE)
  PED:            TRUE
  keep.chains:    TRUE
  dens.zero:      9.99999999999999e-301
  parallel:       FALSE
  fixed:          HDX_transformed ~ Time
  random:         ~SampleID
 ----------------------------------------------------------------------
 Checking and transforming the training data format.
 Preparing the training data for fitting...
 Fitting the method...

Error in data.frame(Est = lme4::fixef(ifit)[iRAND], SE = sqrt(diag(as.matrix(vcov(ifit)))[iRAND])) : row names contain missing values

If I then change all the variables in the model to numeric, it works and the clustering runs. Is this how the package is intended to operate, and if so, why?

Thanks.

niekdt commented 6 months ago

Combining numeric and factor covariates should be possible, since mixAK::GLMM_MCMC uses numeric model matrices. It is latrend that does the automatic factor conversion. Unfortunately, this functionality is not well tested yet, as you have experienced.
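
To illustrate what a numeric model matrix looks like for mixed covariates, here is a small base R sketch using model.matrix(). The data frame and the level names A and B are hypothetical, and this is not necessarily how latrend performs its conversion internally.

```r
# Hypothetical data: numeric Time and a two-level factor condition.
demo <- data.frame(
  Time = rep(c(0, 10, 60, 3600), times = 2),
  condition = factor(rep(c("A", "B"), each = 4))
)

# Expand Time + condition + Time:condition into an all-numeric matrix.
# With default treatment contrasts, the factor becomes a 0/1 dummy column.
mm <- model.matrix(~ Time * condition, data = demo)

print(colnames(mm))
print(is.numeric(mm))
```

The factor ends up as an indicator column and the interaction as its product with Time, so the fitting routine only ever sees numbers.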

I'll look into it in the coming days.