philips-software / latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods
https://philips-software.github.io/latrend/
GNU General Public License v2.0

Implement gridsearch from lcmm? #126

Closed knokknok closed 1 year ago

niekdt commented 1 year ago

Thanks for the suggestion. I will implement this in the coming weeks

Note that for repeated fitting with automatic selection of the best result you can already use the lcFitRepMin method modifier. Still, the gridsearch approach will be faster since it does an initial quick exploration for different random starts.

library(latrend)
data(latrendData)
method <- lcMethodLcmmGMM(
  fixed = Y ~ Time,
  mixture = ~ Time,
  random = ~ 1,
  id = "Id",
  time = "Time",
  nClusters = 2
)

# fit the method but only get the best result out of 10 fits
gmm <- latrend(
  lcFitRepMin(method, rep = 10, metric = 'BIC'),
  data = latrendData
)
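
To illustrate why a quick exploration over random starts can be cheaper than repeating full fits, here is a minimal base-R sketch of the general idea. It does not use latrend or lcmm: the simulated data, the `negLogLik` function, and the use of `optim` are all assumptions made for illustration, standing in for the actual model estimation.

```r
set.seed(1)

# Simulated data (illustrative only): two-component univariate Gaussian mixture
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))

# Negative log-likelihood; par = (mu1, mu2, log(sd), logit(mixing weight))
negLogLik <- function(par) {
  p <- plogis(par[4])
  s <- exp(par[3])
  -sum(log(p * dnorm(y, par[1], s) + (1 - p) * dnorm(y, par[2], s)))
}

# Step 1: quick exploration, many random starts with only a few iterations each
nStarts <- 20
starts <- replicate(
  nStarts,
  c(runif(2, min(y), max(y)), 0, 0),
  simplify = FALSE
)
quickFits <- lapply(starts, function(s) {
  optim(s, negLogLik, control = list(maxit = 10))
})

# Step 2: continue only the most promising candidate with a generous iteration limit
quickValues <- vapply(quickFits, function(f) f$value, numeric(1))
best <- quickFits[[which.min(quickValues)]]
finalFit <- optim(best$par, negLogLik, control = list(maxit = 5000))
```

Only one long optimization is run here, whereas repeating the full fit 10 times and keeping the best result pays the full cost 10 times.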

niekdt commented 1 year ago

@knokknok I've implemented gridsearch initialization, including parallel computation support. See the lcMethodLcmmGMM documentation for details.

Example specification:

  method <- lcMethodLcmmGMM(
    fixed = Y ~ Time,
    mixture = ~ Time,
    random = ~ 1,
    id = "Id",
    time = "Time",
    nClusters = 3,
    init = "gridsearch",
    gridsearch.maxiter = 10,
    gridsearch.rep = 50,
    gridsearch.parallel = TRUE
  )

Let me know if you run into any issues.

knokknok commented 1 year ago

Thanks! Out of curiosity, why did you reimplement the grid search functionality instead of calling lcmm's?

niekdt commented 1 year ago

Mainly because I already had a custom implementation that I created 4 years ago for a simulation study (see here), which was the basis for creating this package.

I (probably) had a good reason for it at the time, but I don't quite remember what it was haha. Most likely an issue with the hlme call evaluation outside of the global environment, or the annoyance of (re)constructing that call dynamically.

Anyways, using the same parallel back-end as the rest of the latrend package is a nice bonus :)
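
For readers unfamiliar with the kind of call-evaluation problem mentioned above, the following base-R sketch shows the general pattern. It uses `lm` and `update` as stand-ins, not lcmm's hlme or gridsearch, and `fitLocally` is a hypothetical helper; whether this was the actual cause here is not known.

```r
# Hypothetical helper: fits a model on data that exists only inside the function
fitLocally <- function() {
  localData <- data.frame(x = 1:20)
  localData$y <- 2 * localData$x + rnorm(20)
  lm(y ~ x, data = localData)
}

set.seed(1)
model <- fitLocally()

# The stored call refers to `localData` by name only
print(model$call)

# update() re-evaluates the stored call in the caller's environment,
# where `localData` does not exist, so the refit fails
refit <- tryCatch(
  update(model, . ~ . + I(x^2)),
  error = function(e) conditionMessage(e)
)
print(refit)
```

Any function that takes a fitted model and re-runs its stored call can hit the same problem when the original fit was created inside another function.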

knokknok commented 1 year ago

Sorry to reopen, but I get an error when the number of clusters passed to lcMethods includes 1 (I think that is the trigger):

Error in { :
  task 1 failed - "hasName(x = envir, name = "lme") is not TRUE"

niekdt commented 1 year ago

Thanks, I've added this as a test case. Gridsearch init is now ignored for nClusters = 1 so it's easier to estimate lcmm for 1:N clusters.
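
A minimal sketch of that 1:N use case, assuming a latrend version that includes this fix and that the lcmm package is installed. The gridsearch argument names are taken from the example earlier in this thread; the specific values are arbitrary.

```r
library(latrend)
data(latrendData)

# Base specification with gridsearch initialization
baseMethod <- lcMethodLcmmGMM(
  fixed = Y ~ Time,
  mixture = ~ Time,
  random = ~ 1,
  id = "Id",
  time = "Time",
  init = "gridsearch",
  gridsearch.maxiter = 10,
  gridsearch.rep = 10
)

# One method per number of clusters, including the 1-cluster case
methods <- lcMethods(baseMethod, nClusters = 1:3)

# Fit all methods on the example data
models <- latrendBatch(methods, data = latrendData)
```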

niekdt commented 1 year ago

Unfortunately I submitted the 1.5.1 CRAN release a couple of hours ago, so this fix will not reach CRAN for at least another 2 weeks.

niekdt commented 1 year ago

Since there was a problem with the CRAN submission, this fix will be included in the 1.5.1 CRAN release.