jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Feature Request]: Saving posterior probabilities for clustering procedures #2959

Open DJoubert1971 opened 1 week ago

DJoubert1971 commented 1 week ago

Description

It would be useful to be able to save the posterior probabilities for all cluster solutions, since they are often more informative than the hard cluster assignment alone.

Purpose

Precision

Use-case

No response

Is your feature request related to a problem?

Feature is missing from the modules

Is your feature request related to a JASP module?

Machine Learning

Describe the solution you would like

Make saving posterior probabilities for the clusters available
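A minimal sketch of what "saving posterior probabilities" could produce, assuming the analysis already has an n-by-K matrix of posteriors (one row per case, one column per cluster). The column names `cluster` and `prob_cluster1`, `prob_cluster2`, ... are hypothetical, not JASP's actual output. The example shows why the probabilities carry more than the assignment: cases 1 and 2 both land in cluster 1, but case 2 only barely.

```python
import csv
import io


def posterior_table(posteriors):
    """Turn an n-by-K posterior matrix into rows holding the hard
    assignment plus every per-cluster probability.

    Column names here are hypothetical, chosen only for illustration."""
    rows = []
    for probs in posteriors:
        # Each row should be a probability vector over the K clusters.
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each row of posteriors must sum to 1")
        # Hard assignment = cluster with the largest posterior (1-based label).
        best = max(range(len(probs)), key=lambda k: probs[k])
        row = {"cluster": best + 1}
        for k, p in enumerate(probs, start=1):
            row[f"prob_cluster{k}"] = round(p, 4)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


posteriors = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
rows = posterior_table(posteriors)
print(to_csv(rows))
```

Keeping both the hard assignment and the full probability columns lets users filter or weight ambiguous cases later instead of treating every assignment as equally certain.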

Describe alternatives that you have considered

No response

Additional context

No response

tomtomme commented 4 days ago

@DJoubert1971 thx for the request

@koenderks is this feasible?

koenderks commented 4 days ago

@DJoubert1971 Which algorithms currently implemented in JASP provide such posterior probabilities? I think that not all clustering algorithms give a probabilistic outcome?

DJoubert1971 commented 4 days ago

Hello,

Many of them do, although it’s not always obvious from the manual or vignettes. It makes sense that they would, as simple cluster assignment is imperfect and comes with noise or uncertainty, even in the "hard" partitioning methods. It’s definitely a must for the fuzzy and other soft methods. Packages name this output differently: sometimes a U matrix, sometimes z (mclust), sometimes posterior. If you give me a list of the packages JASP uses for clustering, I can get the information; it’s usually included in the values returned by the functions. An alternative could be to use the JASP clustering functions and try to get the posterior matrix by running code from the package within JASP, but I don’t think this is allowed yet.
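For model-based clustering, the z matrix mentioned above holds each case's posterior probability of belonging to each mixture component. A minimal sketch of that quantity for a one-dimensional Gaussian mixture, assuming the mixing weights, means and standard deviations are already known (the values below are made up). It computes only the E-step, not a full EM fit, and is not mclust's code.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_memberships(data, weights, means, sds):
    """E-step of a Gaussian mixture: z[i][k] = P(component k | x_i)."""
    z = []
    for x in data:
        # Unnormalised posterior: mixing weight times component density.
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        z.append([j / total for j in joint])
    return z


data = [-2.0, 0.0, 2.0]
z = posterior_memberships(data, weights=[0.5, 0.5], means=[-2.0, 2.0], sds=[1.0, 1.0])

# Hard assignment = most probable component (ties go to the first one).
hard = [max(range(len(row)), key=row.__getitem__) + 1 for row in z]
```

The middle case (x = 0) sits exactly between the two components, so its posterior is 0.5/0.5; the hard assignment still reports it as cluster 1, which illustrates the uncertainty that assignment alone hides.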

Thanks,

David J


From: Koen Derks @.> Sent: Monday, October 14, 2024 5:17:29 AM To: jasp-stats/jasp-issues @.> Cc: David Joubert @.>; Mention @.> Subject: Re: [jasp-stats/jasp-issues] [Feature Request]: Saving posterior probabilities for clustering procedures (Issue #2959)


koenderks commented 4 days ago

It's easiest if the packages themselves return a matrix of probabilities.

randomForest (for random forest)
mclust (for model-based)
cluster (for k-means etc.)
stats (for hierarchical)
e1071 (for fuzzy c-means)
dbscan (for density-based)

DJoubert1971 commented 4 days ago

For randomForest: use the predict() function, with the forest fitted using keep.forest = TRUE.
For mclust: extract the z matrix (e.g., mod1$z).
For cluster: did not find anything for this one...
For stats: did not find anything.
For e1071: use the predict() function with probability set to TRUE, then call as.matrix() on the result.
For dbscan: there is a function called membership_prob that seems to correspond to posteriors.
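For the fuzzy methods, the "U matrix" mentioned earlier in the thread is a matrix of membership degrees rather than a Bayesian posterior. A minimal sketch of the standard fuzzy c-means membership update, u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), assuming fixed cluster centers and fuzzifier m = 2; it is not e1071's implementation and does not iterate the center updates.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update with fixed centers.

    u[i][k] is the degree to which point i belongs to cluster k;
    each row sums to 1. m > 1 controls how soft the partition is."""
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # Point sits exactly on a center: full membership there
            # (split evenly if several centers coincide with it).
            hits = [k for k, dk in enumerate(d) if dk == 0.0]
            u.append([1.0 / len(hits) if k in hits else 0.0 for k in range(len(centers))])
            continue
        u.append([
            1.0 / sum((d[k] / d[j]) ** exponent for j in range(len(centers)))
            for k in range(len(centers))
        ])
    return u


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

With m = 2, the point at distance 1 from the first center and 3 from the second gets memberships of about 0.9 and 0.1, so the matrix can be saved in the same per-cluster column layout as a posterior matrix.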

Thanks,

D

David Joubert, Ph.D., Associate Professor, Department of Criminology, University of Ottawa, Faculty of Social Sciences, 125 University @.*** (613) 562-5800 x1803


From: Koen Derks @.> Sent: October 14, 2024 12:16 PM To: jasp-stats/jasp-issues @.> Cc: David Joubert @.>; Mention @.> Subject: Re: [jasp-stats/jasp-issues] [Feature Request]: Saving posterior probabilities for clustering procedures (Issue #2959)
