philips-software / latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods
https://philips-software.github.io/latrend/
GNU General Public License v2.0

Tools for tuning k-cluster to improve the model #104

Closed Leprechault closed 2 years ago

Leprechault commented 2 years ago

Hi Everyone!!

I'd like to find the optimal number of clusters (k) for my time series. Is there any tool to perform hyperparameter tuning for spatio-temporal k-means clustering?

Thanks in advance!

niekdt commented 2 years ago

Thanks for your question. The package currently only supports univariate longitudinal clustering, so I'm not sure how applicable it is to a spatio-temporal setting.

Here's an example that specifies a KmL method, defines it for 1 to 5 clusters, fits each of these definitions, and then plots a metric to help identify a suitable number of clusters.

library(latrend)
data(latrendData)
# define KmL
method = lcMethodKML(response = 'Y')
methods = lcMethods(method, nClusters = 1:5)
# fit the specified methods
models = latrendBatch(methods, data = latrendData)

plotMetric(models, 'RSS')

# select best model by minimizing the criterion (not recommended)
bestModel = min(models, 'RSS')

# preferably, assess and select the best model manually
bestModel = models[[2]]
# or
bestModel = subset(models, nClusters == 2, drop = TRUE)

plot(bestModel)
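
To give an idea of what such a metric-versus-clusters curve looks like, here is a minimal sketch in base R. It does not use latrend or KmL: it runs stats::kmeans on simulated trajectories (the two groups, their shapes and the noise level are made up for illustration) and plots the total within-cluster sum of squares for 1 to 5 clusters.

```r
# Illustrative only: base R k-means on simulated trajectories,
# not latrend's KmL implementation.
set.seed(1)
nPerGroup <- 50
times <- seq(0, 1, length.out = 10)

# Two hypothetical groups: flat trajectories and linearly increasing ones.
flat <- matrix(rnorm(nPerGroup * length(times), sd = 0.2), nrow = nPerGroup)
rising <- matrix(rep(2 * times, each = nPerGroup), nrow = nPerGroup) +
  matrix(rnorm(nPerGroup * length(times), sd = 0.2), nrow = nPerGroup)

# One row per subject, one column per time point.
trajectories <- rbind(flat, rising)

# Total within-cluster sum of squares for k = 1..5.
ks <- 1:5
wss <- sapply(ks, function(k) {
  kmeans(trajectories, centers = k, nstart = 10)$tot.withinss
})

# Look for the "elbow": the point after which adding clusters helps little.
plot(ks, wss, type = 'b',
     xlab = 'Number of clusters', ylab = 'Within-cluster sum of squares')
```

With two well-separated groups like these, the curve should drop sharply from 1 to 2 clusters and then flatten, which is the pattern to look for when assessing the number of clusters manually.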

For more robust hyperparameter tuning (e.g., when your sample size is small), you may want to consider cross-validation. This can be done through latrendCV, but requires a bit more coding as it's not a well-defined process yet.
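
As a rough illustration of the idea behind cross-validating the number of clusters, here is a sketch in base R. It is not how latrendCV works internally and it does not use latrend at all: it splits simulated trajectories (made-up data) into folds, fits stats::kmeans on the training folds, and sums the squared distance of each held-out trajectory to its nearest training cluster centre. The helper name heldOutSS is hypothetical.

```r
# Illustrative only: manual k-fold cross-validation with base R k-means.
set.seed(2)
times <- seq(0, 1, length.out = 10)
n <- 60

# Hypothetical data: half the subjects are flat, half are rising.
slopes <- rep(c(0, 2), each = n / 2)
trajectories <- outer(slopes, times) +
  matrix(rnorm(n * length(times), sd = 0.2), nrow = n)

# Randomly assign each subject to one of 5 folds.
nFolds <- 5
foldId <- sample(rep(seq_len(nFolds), length.out = n))

# Held-out sum of squared distances to the nearest training centre,
# summed over all folds, for a given number of clusters k.
heldOutSS <- function(k) {
  sum(sapply(seq_len(nFolds), function(f) {
    train <- trajectories[foldId != f, , drop = FALSE]
    test <- trajectories[foldId == f, , drop = FALSE]
    centres <- kmeans(train, centers = k, nstart = 10)$centers
    # Squared distance from every test trajectory to every centre.
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(test, 2, centres[j, ])^2)
    })
    d2 <- matrix(d2, nrow = nrow(test))
    sum(apply(d2, 1, min))
  }))
}

cvSS <- sapply(1:5, heldOutSS)
```

The held-out error usually levels off once the number of clusters is sufficient, so the resulting values still need to be inspected rather than blindly minimized.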

I'm open to any suggestions.

Leprechault commented 2 years ago

Thanks very much @niekdt, it is exactly what I need!!! I will work on a cross-validation approach and try something with the mlr package.