mikeizbicki / HLearn

Homomorphic machine learning

Sanity checks for bad input values #10

Closed nh2 closed 9 years ago

nh2 commented 11 years ago

I just played around with the sex classification example (by the way, using HLearn I discovered that there are wrong values on Wikipedia :/), and silently messed up the classification:

I commented out some input values and kept only those males with equal shoe size. This caused the variance to become 0 (since if all values are the same, there is no variance), and Normal distributions with variance 0 are invalid (they don't sum up to 1). Thus, all my male probabilities became NaN. I would not have noticed had I only used classify; only my use of probabilityClassify made it visible.
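A minimal sketch of how this happens, assuming a hand-written Gaussian density rather than HLearn's actual implementation. Every name below (`mean`, `variance`, `normalPdf`, `shoeSizes`, `pAtMean`, `nanLooksSmaller`) is hypothetical and only illustrates the arithmetic:

```haskell
-- Hypothetical stand-alone sketch; none of these names come from HLearn.

-- Mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Population variance of a non-empty list.
variance :: [Double] -> Double
variance xs = sum [(x - m) ^ 2 | x <- xs] / fromIntegral (length xs)
  where m = mean xs

-- Gaussian density with mean mu and variance var, evaluated at x.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu var x =
  exp (negate ((x - mu) ^ 2) / (2 * var)) / sqrt (2 * pi * var)

-- All shoe sizes are equal, so the variance is exactly 0.
shoeSizes :: [Double]
shoeSizes = [11, 11, 11]

-- With var = 0 the exponent is 0/0, so the density is NaN.
pAtMean :: Double
pAtMean = normalPdf (mean shoeSizes) (variance shoeSizes) 11

-- Every ordering comparison against NaN is False, so a comparison-based
-- argmax can silently pick the other class.
nanLooksSmaller :: Bool
nanLooksSmaller = not (pAtMean > 0.001)
```

Here `isNaN pAtMean` holds, and because `pAtMean > 0.001` is False, a classifier that only compares scores could still return a label without any visible error. Whether HLearn's classify works this way is an assumption.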

Would it be possible to add some convenience sanity check functions to HLearn that check the necessary invariants, e.g. that inputs must not have variance 0?

Thanks, and great job with this library!

mikeizbicki commented 11 years ago

I've been meaning to add a function isValid for each model, but I haven't gotten around to it yet. Maybe you'd be interested in writing the code? Another possibility is to have pdf/classification/etc functions that return a Maybe type.
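Both ideas could look roughly like the following. This is only a sketch under stated assumptions: the `Normal` record, `isValid`, and `pdfMaybe` are hypothetical names written for illustration and are not HLearn's actual types or API.

```haskell
-- Hypothetical types and names; not HLearn's actual API.
data Normal = Normal { nMean :: Double, nVar :: Double }
  deriving Show

-- A model is usable only if its parameters are finite and the variance
-- is strictly positive.
isValid :: Normal -> Bool
isValid (Normal m v) = finite m && finite v && v > 0
  where finite d = not (isNaN d || isInfinite d)

-- Density that returns Nothing for an invalid model instead of NaN.
pdfMaybe :: Normal -> Double -> Maybe Double
pdfMaybe n@(Normal m v) x
  | isValid n = Just (exp (negate ((x - m) ^ 2) / (2 * v)) / sqrt (2 * pi * v))
  | otherwise = Nothing
```

With this sketch, `isValid (Normal 11 0)` is False and `pdfMaybe (Normal 11 0) 11` is Nothing, so the zero-variance case has to be handled by the caller rather than propagating as NaN. The trade-off is that every call site must unwrap the Maybe, whereas a separate `isValid` check can be run once after training.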