Closed by ExpandingMan 3 years ago
Long story short, I think I "hid" StatLearn from the docs a while back because it's pretty easy for things to go wrong. The majorization-based algorithms are pretty safe but can give estimates biased towards zero.
Really, I just need to add more docs for the StatLearn stuff, so maybe this PR will nudge me to do it.
Cool, thanks. Since I posted this I discovered a bunch more machine learning methods I was not aware of, but indeed I've had a number of problems (talking about FastForest and FastTree here). I may work on fixing the issues I encounter; they don't look like they'll be that hard to fix.
I'm getting pretty excited about the machine learning stuff in this package because the prospect of drastically simplifying really "big" jobs is extremely tantalizing. My colleagues tend to run jobs that spend much of their time unnecessarily allocating resources, but simply iterating over OnlineStats can probably eliminate about 90% of the cost of a lot of big jobs, and in some cases even eliminate the need for clusters entirely. I haven't worked out a big example for myself yet, but still, very exciting.
Hello! I'm only now realizing quite how awesome this package is. Even though there is a ton of (great!) documentation, a few things are a bit buried and I almost didn't realize they existed. Especially StatLearn, which seems like kind of a big deal.
This adds StatLearn and its possible arguments to a dedicated new subsection of the "Stats and Models" section of the docs. I tried to make this consistent with the existing format, but please feel free to suggest changes.