oracle / tribuo

Tribuo - A Java machine learning library
https://tribuo.org
Apache License 2.0

Gaussian Mixture Model implementation #369

Open Craigacp opened 1 month ago

Description

Adds a Gaussian Mixture Model (GMM) clustering implementation, fit with Expectation Maximisation, that supports spherical, diagonal and full covariance structures. Also adds a mixture distribution to the RNG library, which allows sampling from a user-constructed Gaussian mixture model (as opposed to one fit to a data distribution).
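For readers unfamiliar with the technique, here is a minimal sketch of both ideas: sampling from a user-constructed mixture and fitting one with EM. It is not the code in this PR and does not use Tribuo's API. The class `GmmSketch` and its methods are hypothetical, and it is restricted to one dimension, where the spherical, diagonal and full covariance structures all reduce to a single variance per component.

```java
import java.util.Random;

/** Hypothetical 1-D Gaussian mixture sketch; not Tribuo's API. */
final class GmmSketch {
    final double[] weights;   // mixing weights, expected to sum to 1
    final double[] means;     // component means
    final double[] variances; // component variances

    GmmSketch(double[] weights, double[] means, double[] variances) {
        this.weights = weights.clone();
        this.means = means.clone();
        this.variances = variances.clone();
    }

    /** Draws one value: pick a component by weight, then sample that Gaussian. */
    double sample(Random rng) {
        double u = rng.nextDouble();
        int k = 0;
        double cumulative = weights[0];
        while (u > cumulative && k < weights.length - 1) {
            k++;
            cumulative += weights[k];
        }
        return means[k] + Math.sqrt(variances[k]) * rng.nextGaussian();
    }

    /** Log density of x under component k. */
    private double logDensity(int k, double x) {
        double diff = x - means[k];
        return -0.5 * (Math.log(2.0 * Math.PI * variances[k]) + diff * diff / variances[k]);
    }

    /**
     * Runs one EM iteration in place and returns the log-likelihood of the
     * data under the parameters held before the update.
     * Assumes every component keeps some responsibility mass.
     */
    double emStep(double[] data) {
        int numComponents = weights.length;
        double[] respSum = new double[numComponents];
        double[] weightedX = new double[numComponents];
        double[] weightedXX = new double[numComponents];
        double[] logP = new double[numComponents];
        double logLikelihood = 0.0;
        for (double x : data) {
            // E-step: responsibilities, normalised with log-sum-exp for stability.
            double max = Double.NEGATIVE_INFINITY;
            for (int k = 0; k < numComponents; k++) {
                logP[k] = Math.log(weights[k]) + logDensity(k, x);
                max = Math.max(max, logP[k]);
            }
            double sum = 0.0;
            for (int k = 0; k < numComponents; k++) {
                sum += Math.exp(logP[k] - max);
            }
            logLikelihood += max + Math.log(sum);
            for (int k = 0; k < numComponents; k++) {
                double r = Math.exp(logP[k] - max) / sum;
                respSum[k] += r;
                weightedX[k] += r * x;
                weightedXX[k] += r * x * x;
            }
        }
        // M-step: re-estimate weights, means and variances from the responsibilities.
        for (int k = 0; k < numComponents; k++) {
            weights[k] = respSum[k] / data.length;
            means[k] = weightedX[k] / respSum[k];
            double variance = weightedXX[k] / respSum[k] - means[k] * means[k];
            variances[k] = Math.max(variance, 1e-6); // floor to avoid a collapsed component
        }
        return logLikelihood;
    }
}
```

Calling `emStep` repeatedly until the returned log-likelihood stops increasing gives a basic fitting loop. In more than one dimension, the variance update is where the covariance structures differ: a spherical model keeps one variance per component, a diagonal model one per feature, and a full model a complete covariance matrix.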

It also adds the new Math functions needed to implement the GMM efficiently, modernises the K-Means implementation slightly, and cleans up the main pom file.

There are also a few important fixes to the Math package: determinants for matrix factorizations were computed incorrectly, and the subtract function on SparseVector was incorrect.
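The description does not say what the bugs were, so the following is only a reference sketch of the two operations, not the fixed Tribuo code. `MathFixSketch` and its methods are hypothetical, and the sparse vector is modelled as a plain index-to-value map rather than Tribuo's `SparseVector`. It illustrates two details that are easy to get wrong: the determinant from an LU factorization must include the sign of the row permutation, and sparse subtraction must negate entries that appear only in the right-hand operand.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reference sketch; not Tribuo's Math package. */
final class MathFixSketch {
    private MathFixSketch() {}

    /** Determinant via LU with partial pivoting: sign(P) times the product of U's diagonal. */
    static double luDeterminant(double[][] input) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = input[i].clone(); // work on a copy so the caller's matrix is untouched
        }
        double det = 1.0;
        for (int col = 0; col < n; col++) {
            // Pick the row with the largest magnitude entry in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            if (a[pivot][col] == 0.0) {
                return 0.0; // singular matrix
            }
            if (pivot != col) {
                double[] tmp = a[pivot];
                a[pivot] = a[col];
                a[col] = tmp;
                det = -det; // each row swap flips the sign of the determinant
            }
            det *= a[col][col];
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int j = col + 1; j < n; j++) {
                    a[row][j] -= factor * a[col][j];
                }
            }
        }
        return det;
    }

    /** Returns a - b for sparse vectors stored as index-to-value maps. */
    static Map<Integer, Double> subtract(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> result = new TreeMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            // An index present only in b ends up as -b[i]; a shared index becomes a[i] - b[i].
            result.merge(e.getKey(), -e.getValue(), Double::sum);
        }
        return result; // entries that cancel are kept as explicit zeros
    }
}
```

For a Cholesky factorization A = L Lᵀ no permutation is involved, and the determinant is the square of the product of L's diagonal entries.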

Motivation

GMMs are a useful clustering algorithm. Fixes #359.

Paper reference

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer.