Open cbrautigam2 opened 7 months ago
Tribuo doesn't have an implementation of fitting GMMs. We have a data generator that can sample from them to generate example data, but it can't fit that generator to a dataset. The data generator is roughly analogous to the gmdistribution function, but it's pretty limited in terms of the number of Gaussians. Building a more flexible version which has the functionality of gmdistribution isn't too hard on top of what we provide (e.g. MultivariateNormalDistribution).
Implementing a basic EM algorithm to fit a GMM like fitgmdist wouldn't be too hard as we have the Cholesky factorization which is used in the M step, but making something scalable requires more effort (as our matrix algebra library isn't parallel yet).
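To make the "basic EM algorithm" concrete, here is a minimal sketch in plain Java for a one-dimensional mixture. It is not Tribuo code: the class name GmmEmSketch, the Params holder, and the quantile-based initialisation are all assumptions made for illustration. Because the data is univariate, no Cholesky factorization is needed; a multivariate version with full covariances would need one to evaluate each component's density.

```java
import java.util.Arrays;

/** Illustrative EM for a 1-D Gaussian mixture; hypothetical names, not Tribuo's API. */
public final class GmmEmSketch {
    /** Fitted mixing weights, means and variances, one entry per component. */
    public static final class Params {
        public final double[] weights;
        public final double[] means;
        public final double[] variances;
        Params(double[] weights, double[] means, double[] variances) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
        }
    }

    /** Log density of a univariate Gaussian. */
    static double logGaussian(double x, double mean, double var) {
        double d = x - mean;
        return -0.5 * (Math.log(2 * Math.PI * var) + d * d / var);
    }

    public static Params fit(double[] data, int k, int iterations) {
        int n = data.length;
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double mean = Arrays.stream(data).average().orElse(0.0);
        double globalVar = Arrays.stream(data).map(x -> (x - mean) * (x - mean)).sum() / n;
        double[] w = new double[k];
        double[] mu = new double[k];
        double[] var = new double[k];
        for (int j = 0; j < k; j++) {
            w[j] = 1.0 / k;
            mu[j] = sorted[(int) ((j + 0.5) * n / k)]; // assumption: spread initial means over quantiles
            var[j] = Math.max(globalVar, 1e-6);
        }
        double[][] resp = new double[n][k];
        for (int it = 0; it < iterations; it++) {
            // E step: responsibilities, normalised in log space for numerical stability.
            for (int i = 0; i < n; i++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = Math.log(w[j]) + logGaussian(data[i], mu[j], var[j]);
                    max = Math.max(max, resp[i][j]);
                }
                double sum = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = Math.exp(resp[i][j] - max);
                    sum += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= sum;
                }
            }
            // M step: re-estimate weight, mean and variance of each component.
            for (int j = 0; j < k; j++) {
                double nj = 0.0;
                double sumX = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sumX += resp[i][j] * data[i];
                }
                if (nj < 1e-12) {
                    continue; // component has no support; leave its parameters unchanged
                }
                mu[j] = sumX / nj;
                double sumSq = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mu[j];
                    sumSq += resp[i][j] * d * d;
                }
                var[j] = Math.max(sumSq / nj, 1e-6); // floor the variance to avoid collapse
                w[j] = nj / n;
            }
        }
        return new Params(w, mu, var);
    }
}
```

Calling GmmEmSketch.fit(data, 2, 100) on data drawn from two well-separated clusters should recover one component per cluster. A fixed iteration count is used here for brevity; a real implementation would more likely stop when the log-likelihood stops improving.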
I've written a GMM implementation which is currently being debugged. Do you need the gmdistribution function as applied to only a distribution fit on data, or do you also want to be able to sample from a mixture distribution that you've created by hand?
I would say both, such that you can save off the distributions for later use and can reinflate them to be used again for performing predictions.
-Craig
From: Adam Pocock. Sent: Sunday, April 28, 2024 3:03 PM. Subject: Re: [oracle/tribuo] Gaussian Mixture Model capability (Issue #359)
OK. You'll be able to save the model and reuse it for future predictions, but extracting a distribution object like MultivariateNormalDistribution back out of it will be a little complicated: the dimensions of the samples are based on Tribuo's feature dimensions, which are named rather than indexed, and getting the index is a little more work. I've thought about it a bit more today and I think I will add a MixtureDistribution class and try to add a distributions interface, but the sampling method will likely be exposed on both MixtureDistribution and GaussianMixtureModel.
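As a rough illustration of sampling from a mixture created by hand, the sketch below picks a component according to the mixing weights and then draws from that component's Gaussian. It is plain Java and does not reflect the planned MixtureDistribution API: the class HandBuiltMixture, its constructor, and the restriction to diagonal covariances (per-dimension standard deviations) are assumptions made to keep the example short.

```java
import java.util.Random;

/** Illustrative sampler for a hand-built diagonal Gaussian mixture; hypothetical, not Tribuo's API. */
public final class HandBuiltMixture {
    private final double[] weights;   // mixing weights, must sum to 1
    private final double[][] means;   // means[j][d] for component j, dimension d
    private final double[][] stdDevs; // stdDevs[j][d], diagonal covariance assumed
    private final Random rng;

    public HandBuiltMixture(double[] weights, double[][] means, double[][] stdDevs, long seed) {
        if (weights.length != means.length || weights.length != stdDevs.length) {
            throw new IllegalArgumentException("One weight, mean and stdDev vector per component is required");
        }
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (Math.abs(total - 1.0) > 1e-9) {
            throw new IllegalArgumentException("Mixing weights must sum to 1, found " + total);
        }
        this.weights = weights.clone();
        this.means = means;
        this.stdDevs = stdDevs;
        this.rng = new Random(seed);
    }

    /** Draws a component index with probability proportional to its weight. */
    public int sampleComponent() {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int j = 0; j < weights.length - 1; j++) {
            cumulative += weights[j];
            if (u < cumulative) {
                return j;
            }
        }
        return weights.length - 1; // fall through to the last component
    }

    /** Draws one point: choose a component, then sample each dimension independently. */
    public double[] sample() {
        int j = sampleComponent();
        double[] x = new double[means[j].length];
        for (int d = 0; d < x.length; d++) {
            x[d] = means[j][d] + stdDevs[j][d] * rng.nextGaussian();
        }
        return x;
    }
}
```

A full-covariance version would replace the per-dimension scaling with a multiplication by the Cholesky factor of each component's covariance matrix.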
Hi,
I need to port some Matlab code to Java and I'm looking at what is out there in Java land that can do Gaussian Mixture Models. Specifically, the code that I have to port is making heavy use of Matlab's gmdistribution https://www.mathworks.com/help/stats/gmdistribution.html and fitgmdist https://www.mathworks.com/help/stats/fitgmdist.html. I see that Tribuo alludes to Gaussian Mixtures in the KMeans tutorial: https://tribuo.org/learn/4.3/tutorials/clustering-tribuo-v4.html. So maybe this would suffice? I'm definitely not a mathematician, but I'm trying to see if Tribuo can do GMMs like these Matlab functions. It appears that Matlab supports two covariance types, 'full' and 'diagonal'.
Can you please elaborate on Tribuo's capabilities with regard to GMMs?
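For readers comparing the 'full' and 'diagonal' covariance types, the sketch below shows how the Gaussian log density is evaluated in each case. It is plain Java with hypothetical names (CovarianceTypes, logDensityDiagonal, logDensityFull) and is not taken from Tribuo or Matlab. The diagonal case treats each dimension independently, while the full case uses a Cholesky factorization of the covariance matrix to get both the log determinant and the Mahalanobis distance.

```java
/** Illustrative Gaussian log densities for diagonal and full covariances; hypothetical names. */
public final class CovarianceTypes {
    private CovarianceTypes() {}

    /** Diagonal covariance: one variance per dimension, no cross terms. */
    public static double logDensityDiagonal(double[] x, double[] mean, double[] variances) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - mean[d];
            sum += Math.log(2 * Math.PI * variances[d]) + diff * diff / variances[d];
        }
        return -0.5 * sum;
    }

    /** Full covariance: factor cov = L L^T, then solve L z = (x - mean) by forward substitution. */
    public static double logDensityFull(double[] x, double[] mean, double[][] cov) {
        int n = x.length;
        double[][] l = cholesky(cov);
        double[] z = new double[n];
        double mahalanobis = 0.0;
        double logDet = 0.0;
        for (int i = 0; i < n; i++) {
            double s = x[i] - mean[i];
            for (int j = 0; j < i; j++) {
                s -= l[i][j] * z[j];
            }
            z[i] = s / l[i][i];
            mahalanobis += z[i] * z[i];        // z.z equals (x-mean)^T cov^{-1} (x-mean)
            logDet += 2.0 * Math.log(l[i][i]); // log det(cov) from the diagonal of L
        }
        return -0.5 * (n * Math.log(2 * Math.PI) + logDet + mahalanobis);
    }

    /** Lower-triangular Cholesky factor of a symmetric positive definite matrix. */
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double s = a[i][j];
                for (int k = 0; k < j; k++) {
                    s -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (s <= 0.0) {
                        throw new IllegalArgumentException("Matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(s);
                } else {
                    l[i][j] = s / l[j][j];
                }
            }
        }
        return l;
    }
}
```

When the off-diagonal entries of the covariance matrix are zero, the two functions return the same value; the full form only pays off when the features within a component are correlated.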