mikeizbicki / HLearn

Homomorphic machine learning

getMargin for Dependent MultiNormal? #35

Closed jacobstanley closed 9 years ago

jacobstanley commented 10 years ago

Is it possible to marginalize a variable that is part of a dependent multi-normal distribution? I'm not sure that is the right terminology; I'm new to machine learning.

I basically have this:

data MyData = MyData
    { _foo :: String
    , _bar :: String
    , _fooN :: Double
    , _barN :: Double
    }

type MyDist = Multivariate MyData
    '[ MultiCategorical '[String, String]
     , Dependent MultiNormal '[Double, Double]
     ]
     Double

And I get a type error when I try to do this:

-- dist :: MyDist
getMargin TH_fooN $ condition TH_foo "foo" $ condition TH_bar "bar" dist

Am I supposed to be able to do that or is there another way I can sample the distribution? Ideally I want to get a tuple with the value for fooN and barN together given the specified conditions.

mikeizbicki commented 10 years ago

Yes, this is easy to do in principle. It looks like I never actually implemented it though for some reason, and that's why the type error occurs.

I probably won't be adding this feature any time soon. Working with these types has proven too cumbersome, and so the latest version on github completely scraps this syntax. I have yet to come up with a better one :(

jacobstanley commented 10 years ago

Would this configuration be possible with the latest version on github? I'm already using master from github, and I don't mind switching to a different branch if it's better equipped to deal with my situation.

I don't mind which syntax I use :)

jacobstanley commented 10 years ago

If you're saying that it won't be implemented any time soon because you have yet to come up with a better syntax, then... well damn :( haha

mikeizbicki commented 10 years ago

The latest dev branch uses some features from GHC 7.8. This upgrade broke all the code for multivariate distributions, and I haven't done anything to fix it yet.

It would be possible to add this capability to the master branch on github. It would involve making the MultiNormal type (https://github.com/mikeizbicki/HLearn/blob/master/src/HLearn/Models/Distributions/Multivariate/MultiNormal.hs) implement the Marginalize class (https://github.com/mikeizbicki/HLearn/blob/master/src/HLearn/Models/Distributions/Multivariate/Internal/Marginalization.hs).

In principle this is super easy. All you do is extract the right variance term from the covariance matrix (along the diagonal) and plug it into the Normal class. In practice, though, I'm not sure how easy this will be to make the types line up right.
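To make the "extract the variance from the diagonal" step concrete, here is a minimal sketch of the underlying math. It does not use HLearn's real types or its Marginalize' class: `MultiNormal`, `Normal`, `marginal`, and `safeIndex` below are hypothetical stand-ins built on plain lists, chosen only for illustration. The sketch also assumes the marginal needs the matching entry of the mean vector in addition to the diagonal covariance entry.

```haskell
module Main where

-- Hypothetical stand-ins for illustration only; these are NOT HLearn's types.
data MultiNormal = MultiNormal
    { mnMean :: [Double]    -- mean vector
    , mnCov  :: [[Double]]  -- covariance matrix as a list of rows
    } deriving Show

data Normal = Normal
    { nMean     :: Double
    , nVariance :: Double
    } deriving (Show, Eq)

-- | Total list indexing: Nothing when the index is out of range.
safeIndex :: Int -> [a] -> Maybe a
safeIndex n xs
    | n < 0     = Nothing
    | otherwise = case drop n xs of
        (x:_) -> Just x
        []    -> Nothing

-- | Marginal distribution of coordinate i (0-based).
-- The marginal of a multivariate normal is a univariate normal whose
-- mean is the i-th mean entry and whose variance is the (i,i) entry
-- on the diagonal of the covariance matrix.
marginal :: Int -> MultiNormal -> Maybe Normal
marginal i (MultiNormal mu sigma) = do
    m   <- safeIndex i mu
    row <- safeIndex i sigma
    var <- safeIndex i row
    pure (Normal m var)

main :: IO ()
main = do
    let dist = MultiNormal [1.0, 5.0] [[2.0, 0.3], [0.3, 4.0]]
    print (marginal 0 dist)  -- marginal of the first coordinate
    print (marginal 1 dist)  -- marginal of the second coordinate
    print (marginal 2 dist)  -- out of range, so Nothing
```

The off-diagonal covariance terms are simply dropped, which is why the math is easy. The hard part mentioned above, getting the type-level variable lists to line up inside HLearn, is not addressed by this sketch.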

If you're up for a challenge and want to implement it, I'd happily merge the code :)

jacobstanley commented 10 years ago

I'll give it a shot. I suspect that if I can get the types to line up, you can easily tell me if I'm screwing up the variance extraction.

jacobstanley commented 10 years ago

I just want to check, do I need to implement the Marginalize class or the Marginalize' class?

mikeizbicki commented 10 years ago

Marginalize', you're right. I can definitely help you out with any questions like that.