UnkindPartition opened this issue 8 years ago (status: Open)
Anything explicitly using probability distributions is still removed, because I haven't yet had time to flesh out what I think the proper interface should be, and I don't want to spend time maintaining a less-than-perfect interface.
Is there a particular reason you want naive Bayes? Usually, logistic regression will outperform it, and that is currently implemented (although admittedly not super well documented).
Actually, if you're serious about wanting good machine learning results, you'll be better off with a non-Haskell library for now. I should have brought this up earlier, but I generally strongly warn people against using HLearn in its current state. The interface to most learning algorithms is pretty different from the interfaces provided by other libraries; it's still not super well documented; and I can't promise being able to spend much time helping you through the quirks. I'd really only recommend it at this point if you're interested in exploring alternative design spaces of what machine learning systems could look like.
At the moment I'm dealing with multi-class classification. I vaguely know that I can use logistic regression in a one-vs-rest fashion, training one binary classifier per class, but I don't know how well that works in practice (in terms of performance and accuracy). I am very new to machine learning.
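If I understand the one-vs-rest reduction correctly, it's tiny to express: train one binary model per class and predict whichever class scores highest. Here's a rough sketch of what I mean (illustrative names and types only, not HLearn's actual interface):

```haskell
import Data.List (maximumBy, nub)
import Data.Ord (comparing)

-- One-vs-rest: reduce multi-class classification to one binary
-- problem per class. Each binary model learns "is this example of
-- class l, or not"; at prediction time the highest-scoring class wins.
oneVsRest
  :: Eq label
  => ([(Bool, feat)] -> (feat -> Double))  -- any binary trainer, e.g. logistic regression
  -> [(label, feat)]                       -- multi-class training set
  -> (feat -> label)                       -- resulting multi-class classifier
oneVsRest trainBinary samples = classify
  where
    labels = nub (map fst samples)
    -- Relabel the whole training set once per class and train on it.
    models = [ (l, trainBinary [ (l == l', v) | (l', v) <- samples ]) | l <- labels ]
    classify x = fst (maximumBy (comparing (\(_, score) -> score x)) models)
```

Whether the per-class scores end up comparable enough for this to be accurate on my data is exactly the part I'm unsure about.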
I did a prototype in R, but on the real data (~10M observations), R runs out of memory. Streaming is a strength of Haskell (and HLearn), and the algorithm itself is straightforward, so I thought this would be a good fit.
Thanks for the warning though. I guess I may naively try to code a naive Bayes classifier myself and see where this leads me.
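Something like the following is what I have in mind: a single pass over the observations accumulating counts, so memory stays constant no matter how many there are. This is just a rough sketch for categorical features with add-one smoothing; all the names are mine, nothing here is HLearn's API.

```haskell
module Main where

import qualified Data.Map.Strict as M
import Data.List (foldl', maximumBy)
import Data.Ord (comparing)

type Label   = String
type Feature = (Int, String)   -- (feature index, feature value)

-- The sufficient statistics of naive Bayes: plain counts.
data Counts = Counts
  { classCounts   :: !(M.Map Label Int)             -- how often each class occurs
  , featureCounts :: !(M.Map (Label, Feature) Int)  -- class/feature co-occurrences
  , total         :: !Int
  }

empty :: Counts
empty = Counts M.empty M.empty 0

-- Fold one labelled observation into the running counts.
insert :: Counts -> (Label, [String]) -> Counts
insert (Counts cc fc n) (lbl, feats) = Counts
  (M.insertWith (+) lbl 1 cc)
  (foldl' bump fc (zip [0..] feats))
  (n + 1)
  where
    bump m f = M.insertWith (+) (lbl, f) 1 m

-- Training is a strict left fold, so it streams over the input.
train :: [(Label, [String])] -> Counts
train = foldl' insert empty

-- Log-probability (up to a constant) of a class given the features,
-- with add-one (Laplace) smoothing so unseen values don't zero out
-- the product.
logProb :: Counts -> [String] -> Label -> Double
logProb (Counts cc fc n) feats lbl = logPrior + sum (map logLik (zip [0..] feats))
  where
    nc = fromIntegral (M.findWithDefault 0 lbl cc)
    logPrior = log ((nc + 1) / (fromIntegral n + fromIntegral (M.size cc)))
    logLik f =
      let k = fromIntegral (M.findWithDefault 0 (lbl, f) fc)
      in  log ((k + 1) / (nc + 2))   -- crude smoothing; should really use the feature's arity

classify :: Counts -> [String] -> Label
classify cs feats = maximumBy (comparing (logProb cs feats)) (M.keys (classCounts cs))

main :: IO ()
main = do
  let model = train
        [ ("spam", ["free", "yes"]), ("spam", ["free", "no"])
        , ("ham",  ["work", "yes"]), ("ham",  ["work", "no"]) ]
  print (classify model ["free", "yes"])   -- prints "spam"
```

For the real ~10M-observation data I'd replace the input list with a streaming source (conduit or pipes) and fold the same insert function over it; the trained model is just the two count maps, so it stays small.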
I wanted to do Bayesian classification with HLearn, but it is no longer part of the library.
I found #59, which gives a bit of background, but it doesn't solve my problem :) So, where do things stand now? Is there anything we can do to resurrect NBC and the other classifiers from hlearn-classification?