tidymodels / tidyclust

A tidy unified interface to clustering models
https://tidyclust.tidymodels.org/

sse_within_total() ignores dist_fun if data is present in model #184

Open EmilHvitfeldt opened 4 months ago

EmilHvitfeldt commented 4 months ago

The example below is taken from https://stackoverflow.com/questions/78540316/r-tidyclust-tune-a-k-prototypes-model/78540444#78540444, where this behavior hides the improper use of `cluster::daisy()`:

library(tidyclust)
library(tidyverse)
library(tidymodels)

data("penguins", package = "modeldata")

penguins <- penguins %>%
  drop_na()

penguins_cv <- vfold_cv(penguins, v = 5)

# spec1 is for a non-tunable model
kmeans_spec1 <- k_means(engine = 'clustMixType', num_clusters = 4)

penguins_rec <- recipe(~ .,
  data = penguins
)

kmeans_wflow1 <- workflow(penguins_rec, kmeans_spec1)

# non tunable clustering fit
kmeans_fit <- fit(kmeans_wflow1, data = penguins)

# this works without errors
sse_within_total(kmeans_fit)

# this also works, even though cluster::daisy is improper here:
# dist_fun is silently ignored
sse_within_total(kmeans_fit, dist_fun = cluster::daisy)
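
One way to check whether `dist_fun` is ever invoked is to pass a sentinel function that records the call and then fails. The sketch below repeats the setup above so it can be run on its own; `sentinel_dist` and `dist_fun_called` are hypothetical names introduced only for this illustration, and it assumes the `clustMixType` engine package is installed. It does not assert an outcome, it only reports what happened.

```r
library(tidyclust)
library(tidymodels)

data("penguins", package = "modeldata")
penguins <- tidyr::drop_na(penguins)

# Same non-tunable workflow as in the reprex above
kmeans_spec <- k_means(engine = "clustMixType", num_clusters = 4)
penguins_rec <- recipe(~ ., data = penguins)
kmeans_fit <- fit(workflow(penguins_rec, kmeans_spec), data = penguins)

# Hypothetical sentinel: flags that it was called, then errors
dist_fun_called <- FALSE
sentinel_dist <- function(...) {
  dist_fun_called <<- TRUE
  stop("dist_fun was called")
}

# Capture either the metric tibble or the error raised by the sentinel
res <- tryCatch(
  sse_within_total(kmeans_fit, dist_fun = sentinel_dist),
  error = function(e) e
)

# FALSE here means sse_within_total() never invoked dist_fun
dist_fun_called
res
```

If `dist_fun_called` stays `FALSE` and `res` is a metric tibble rather than an error, the supplied `dist_fun` was never used, which would match the behavior described in this issue.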