Open TarandeepKang opened 4 months ago
Regarding Gower distances: I usually calculate Gower distances and input those into k-medoids, both using the cluster package in R, something like this:
distances <- cluster::daisy(x = df, metric = "gower")
cl <- cluster::pam(x = distances, k = k, diss = TRUE)  # k is the number of clusters
Oh yes, the daisy function in the cluster package is another way to get at Gower distances! Since you already use other functions from cluster elsewhere, that might be preferable.
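For anyone who wants to try this quickly, here is a minimal, self-contained sketch of the daisy + pam approach described above. The toy data frame, its column names, and k = 2 are made up purely for illustration; they are not from JASP or from the original comment.

```r
library(cluster)  # recommended package that ships with standard R installations

# Hypothetical toy data: two numeric columns and one categorical (factor) column
df <- data.frame(
  age    = c(23, 25, 31, 58, 61, 64),
  income = c(21, 24, 30, 70, 75, 80),
  group  = factor(c("a", "a", "a", "b", "b", "b"))
)

# Gower dissimilarities: numeric columns are range-scaled,
# factor columns contribute a simple match/mismatch term
distances <- daisy(x = df, metric = "gower")

# k-medoids (PAM) on the precomputed dissimilarity object
k <- 2  # number of clusters, chosen arbitrarily for this example
cl <- pam(x = distances, k = k, diss = TRUE)

print(cl$clustering)  # cluster label for each row
print(cl$id.med)      # row indices of the medoids
```

Because pam only needs the dissimilarities here, the same pattern should work for any data frame that daisy accepts, as long as categorical columns are stored as factors.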
Description
No response
Purpose
Improve the range of data types that can be clustered
Use-case
No response
Is your feature request related to a problem?
Currently, mixed-type data (numeric and categorical variables together) cannot be clustered in JASP
Is your feature request related to a JASP module?
Machine Learning
Describe the solution you would like
k-prototypes clustering and Gower distances
Describe alternatives that you have considered
No response
Additional context
k-prototypes clustering (Huang, 1998) using the clustMixType package, and perhaps also Gower distances (gower package). I have also included a few reviews of the wide variety of other methods.
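To make the request concrete, a k-prototypes call could look roughly like the sketch below. It assumes the clustMixType package is installed and that its kproto function accepts a data frame with numeric and factor columns; the toy data, the seed, and k = 2 are hypothetical and only for illustration, not a proposal for how JASP should wire it up.

```r
# Assumption: clustMixType is installed, e.g. via install.packages("clustMixType")
if (requireNamespace("clustMixType", quietly = TRUE)) {
  set.seed(42)  # initial prototypes are chosen at random

  # Hypothetical toy data with numeric and categorical (factor) columns
  df <- data.frame(
    age    = c(23, 25, 31, 58, 61, 64),
    income = c(21, 24, 30, 70, 75, 80),
    group  = factor(c("a", "a", "a", "b", "b", "b"))
  )

  # k-prototypes: numeric and categorical parts of the distance are combined;
  # with lambda left at its default, the weighting is estimated from the data
  fit <- clustMixType::kproto(x = df, k = 2, verbose = FALSE)

  print(fit$cluster)  # cluster label for each row
  print(fit$centers)  # prototype (numeric means, categorical modes) per cluster
} else {
  message("clustMixType is not installed; skipping the k-prototypes example")
}
```

Unlike the Gower route, this works directly on the data frame rather than on a precomputed dissimilarity matrix, which may matter for larger data sets.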
Ahmad, A., & Khan, S. S. (2019). Survey of State-of-the-Art Mixed Data Clustering Algorithms. IEEE Access, 7, 31883–31902. https://doi.org/10.1109/ACCESS.2019.2903568
Gower, J. C. (1971). A General Coefficient of Similarity and Some of Its Properties. Biometrics, 27(4), 857–871. https://doi.org/10.2307/2528823
Huang, Z. (1998). Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery, 2(3), 283–304. https://doi.org/10.1023/A:1009769707641
Hunt, L., & Jorgensen, M. (2011). Clustering mixed data. WIREs Data Mining and Knowledge Discovery, 1(4), 352–361. https://doi.org/10.1002/widm.33
McParland, D., & Gormley, I. C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10(2), 155–169. https://doi.org/10.1007/s11634-016-0238-x
Szepannek, G. (2018). clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal, 10(2), 200–208.
van de Velden, M., Iodice D’Enza, A., & Markos, A. (2019). Distance-based clustering of mixed data. WIREs Computational Statistics, 11(3), e1456. https://doi.org/10.1002/wics.1456