philips-software / latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods
https://philips-software.github.io/latrend/
GNU General Public License v2.0

Ordering clusters in plotClusterTrajectories() #144

Closed hichew22 closed 10 months ago

hichew22 commented 1 year ago

Hello Niek,

I am using the dtw method in latrend to plot some clusters like so: plotClusterTrajectories(dtw_model_2)


or plot(dtw_model_2)


Is there a way in plotClusterTrajectories() to specify that the largest cluster is always labeled cluster "A", with the remaining clusters ordered from highest to lowest frequency?

Or would I need to reorder them manually, something like:

library(dplyr)

# Create a data frame with trajectory ids and cluster assignments.
# The id column is named "id" so that it matches the join key used below.
df_cluster_dtw <- data.frame(
  id = ids(dtw_model_2),
  cluster_dtw = trajectoryAssignments(dtw_model_2)
)

# Recode clusters from largest (1) to smallest (2)
cluster_freq <- table(df_cluster_dtw$cluster_dtw)
ordered_clusters <- names(sort(cluster_freq, decreasing = TRUE))
df_cluster_dtw$cluster_dtw <-
  factor(
    df_cluster_dtw$cluster_dtw,
    levels = ordered_clusters,
    labels = seq_along(ordered_clusters)
  ) 

# Create column with cluster labels and percentages
cluster_percentages <- prop.table(table(df_cluster_dtw$cluster_dtw))
cluster_labels <- sprintf("%s (%d%%)", names(cluster_percentages), round(cluster_percentages * 100))

df_cluster_dtw <- df_cluster_dtw %>%
  mutate(cluster_label = factor(
    cluster_dtw,
    levels = names(cluster_percentages),
    labels = cluster_labels
  ))

# Add cluster assignments to df_long
df_long <- df_long %>%
  left_join(df_cluster_dtw, by = "id")

# Plot cluster trajectories
latrend::plotClusterTrajectories(
  df_long,
  response = "value",
  cluster = "cluster_label",
  trajectories = TRUE,
  facet = TRUE,
  size = 2
)

?

Thank you!

niekdt commented 1 year ago

Hi @hichew22, thanks for the suggestion. It's a useful feature to have, especially when comparing similar cluster solutions.

I'll start by adding an argument to plotClusterTrajectories for specifying which clusters to plot and in what order. Ultimately, a wrapper lcModel class for which you can specify the ordering logic would be best, but that will take more effort.

niekdt commented 10 months ago

plotClusterTrajectories now has a clusterOrder argument that lets you specify which clusters to plot and in what order, either by name or by index.

hichew22 commented 10 months ago

Awesome, thanks, Niek! Do you have an example, and does this also allow ordering clusters from most to least frequent?

niekdt commented 10 months ago

You're welcome!

data(latrendData)
method <- lcMethodLMKM(Y ~ Time, id = "Id", time = "Time")
model <- latrend(method, latrendData, nClusters = 3)

# change cluster order
plotClusterTrajectories(model, clusterOrder = c('B', 'C', 'A'))

# show only specific clusters
plotClusterTrajectories(model, clusterOrder = c('B', 'C'))

It's intended as a quick way to set a custom order, but to dynamically order by cluster size, you can use:

plotClusterTrajectories(model, clusterOrder = order(-clusterSizes(model)))

In the future, I intend to add some lcModel wrapper classes that automatically relabel clusters based on some criterion, so the ordering would then be handled during the latrend fitting procedure.
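To illustrate why order(-clusterSizes(model)) sorts clusters from largest to smallest, here is a minimal base-R sketch. The sizes vector is made up for illustration and merely stands in for the output of clusterSizes(); the sketch assumes that clusterSizes() returns a vector named by cluster.

# Hypothetical cluster sizes, standing in for clusterSizes(model)
sizes <- c(A = 40L, B = 100L, C = 60L)

# Negating the sizes makes order() rank the largest cluster first.
# The result is a vector of cluster indices.
byIndex <- order(-sizes)

# The same ordering expressed as cluster names
byName <- names(sizes)[byIndex]

print(byIndex)
print(byName)

# With a fitted model, either form could then be passed on, e.g.:
# plotClusterTrajectories(model, clusterOrder = byName)

Since clusterOrder accepts names as well as indices, passing the names may be easier to read back when checking which cluster ended up where.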

hichew22 commented 10 months ago

I tried doing that for a 4-cluster DTW model as such: plotClusterTrajectories(dtw_model_4, clusterOrder = order(-clusterSizes(dtw_model_4)))

However, the clusters do not appear ordered. I made sure to install the most recent version of latrend. Could you help me with this?

niekdt commented 10 months ago

Did you install the latest commit (not release)?

remotes::install_github('philips-software/latrend')

hichew22 commented 10 months ago

Yes, I just did, but it seems the function still does not work.

─ preparing ‘latrend’: (555ms)
✔ checking DESCRIPTION meta-information ...
─ installing the package to process help pages (811ms)
Loading required namespace: latrend
─ saving partial Rd database (3.1s)
─ checking for LF line-endings in source and make files and shell scripts (335ms)
─ checking for empty or unneeded directories
─ building ‘latrend_1.5.1.tar.gz’

niekdt commented 10 months ago

I can't spot any issues in the source code. Could you let me know what the output is of:

latrend:::make.orderedClusterNames(clusterNames(dtw_model_4), order(-clusterSizes(dtw_model_4)))
hichew22 commented 10 months ago

I just tried re-running it and it works now! Perhaps I had to restart my R session. Thank you so much!
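
For anyone hitting the same situation: a package that is already loaded stays in memory until the R session is restarted, even after a newer copy has been installed. The sketch below is one possible way to check for this. checkPackageVersion is a hypothetical helper, not part of latrend, and it can only detect a mismatch when the version number changed between installs, which may not be the case for development commits.

# Hypothetical helper: compare the version of a package loaded in this session
# with the version currently installed on disk.
checkPackageVersion <- function(pkg) {
  loaded <- pkg %in% loadedNamespaces()
  list(
    loaded = loaded,
    # Version recorded when the namespace was loaded into this session
    loadedVersion = if (loaded) unname(as.character(getNamespaceVersion(pkg))) else NA_character_,
    # Version read from the installed package's DESCRIPTION
    installedVersion = as.character(utils::packageVersion(pkg))
  )
}

# "stats" is used here only because it is always available; replace with "latrend"
info <- checkPackageVersion("stats")
print(info)

If the two versions differ, or if in doubt, restarting the R session and reloading the package is the safest option.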