jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Feature Request]: labels in cluster plots #2763

Open SalvadorDali6 opened 3 months ago

SalvadorDali6 commented 3 months ago

Description

Clustering methods should also show labels in the cluster plots and offer a confusion matrix.

Purpose

Cluster analysis is unsupervised. However, it would be helpful to show not just the data point labels in the cluster plot but also known group labels from the data (such as healthy, disease). This helps to see whether the clustering performed well relative to known groupings.

Use-case

No response

Is your feature request related to a problem?

No response

Is your feature request related to a JASP module?

Machine Learning

Describe the solution you would like

Cluster analysis allows saving the predictions in their own column, but it would be more convenient to have the option to include the labels upfront and visualize them directly in the cluster plot. It could also offer a confusion matrix of the clusters against the labels (if a grouping variable is present, of course).

Describe alternatives that you have considered

No response

Additional context

No response

tomtomme commented 3 months ago

@SalvadorDali6 thanks for the request. Seems like a very reasonable one!

koenderks commented 3 months ago

Would it not be better to make the confusion matrix in the contingency tables analysis? You would put the clusters in the rows and the actual classification in the columns, which gives you a confusion matrix. That way we keep the functionality neatly organised: the user finds clustering functionality in the clustering analysis and crosstabs functionality in the contingency tables analysis. I fear that too much overlap might become confusing for users.
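To illustrate the crosstab idea outside of JASP, here is a minimal Python sketch that puts clusters in the rows and the actual classification in the columns. The function name `cross_tabulate` and the example data are hypothetical and only for illustration; this is not how JASP computes anything internally.

```python
from collections import Counter


def cross_tabulate(clusters, labels):
    """Count how often each (cluster, label) pair occurs.

    Returns the sorted row keys (clusters), the sorted column keys
    (labels), and the table of counts as a list of rows.
    """
    counts = Counter(zip(clusters, labels))
    rows = sorted(set(clusters))
    cols = sorted(set(labels))
    # Counter returns 0 for pairs that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical example: saved cluster predictions and a known grouping variable.
clusters = [1, 1, 1, 2, 2, 2, 2, 1]
labels = ["healthy", "healthy", "disease", "disease",
          "disease", "healthy", "disease", "healthy"]

rows, cols, table = cross_tabulate(clusters, labels)

# Print the table with clusters in rows and known labels in columns.
print("cluster", *cols, sep="\t")
for r, row_counts in zip(rows, table):
    print(r, *row_counts, sep="\t")
```

Because cluster numbers are arbitrary, the table reads as a contingency table rather than a strict confusion matrix: a good clustering shows one dominant label per row, not necessarily counts on the diagonal.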

SalvadorDali6 commented 2 months ago

If there is a function to make a confusion matrix in JASP (I have not checked, but I will), then I think that is also an option. You are right: if the data set is too big, it might be too much! However, what if the groups were just color-coded or similar?