sisl / BayesNets.jl

Bayesian Networks for Julia

Learning tables from known structure and data #10

Closed: UserAB1236872 closed this issue 8 years ago

UserAB1236872 commented 9 years ago

At the moment, does the package support learning probabilities given data? Right now I'm using gRain/gR/gRim for R, but those packages don't support vertices with normal distributions, which is a feature that would make my life easier.

I have a mixed net with a known structure: mostly discrete nodes and a single continuous/Gaussian-distributed one. (Specifically, it's a map of binary features to a utility-value estimate.) gRain has a compile function that takes a data frame and generates the CPDs given a known graph structure (and, since this package has multiple types of node, presumably priors as well).

It doesn't look like anything of the sort exists yet in BayesNets.jl, but I wanted to make sure and ask. It looks like it's all just network scoring/structure learning right now? I'm not sure it's an easy addition, especially with mixed nodes; most existing packages seem to require all nodes to have the same kind of distribution.

UserAB1236872 commented 9 years ago

(I suppose the better way to say this is "parameter learning")

mykelk commented 9 years ago

It doesn't support parameter learning yet, but I would like it to. It should be pretty easy to implement, even for different kinds of distributions. It already has some basic support for Gaussian distributions. Feel free to contribute a PR; otherwise, one of my students might do this during the summer.

mykelk commented 8 years ago

@tawheeler, I think this can be added pretty easily. What do you think this should look like? Maybe something like:

b = BayesNet([:B, :S, :E, :D, :C])
addEdges!(b, [(:B, :E), (:S, :E), (:E, :D), (:E, :C)])
learn!(b, d)

where d is some DataFrame? I think most of the code is already there in src/factors.jl, in the estimate() function. It would create discrete CPDs for each of the nodes. I think this would involve about 10 lines of new code.
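For the discrete-only case, the learning step could amount to counting co-occurrences in the DataFrame and normalizing. A minimal, self-contained sketch of that idea (the function learn_discrete_cpd and the explicit parent list are hypothetical names for illustration, not BayesNets.jl's API):

using DataFrames

# Estimate P(node | parents) by counting co-occurrences and normalizing (MLE).
# Returns a Dict mapping each parent assignment to a Dict of value => probability.
function learn_discrete_cpd(d::DataFrame, node::Symbol, parents::Vector{Symbol})
    counts = Dict{Tuple,Dict{Any,Int}}()
    for row in eachrow(d)
        key = Tuple(row[p] for p in parents)          # this row's parent assignment
        tbl = get!(counts, key, Dict{Any,Int}())
        tbl[row[node]] = get(tbl, row[node], 0) + 1   # tally the node's value
    end
    probs = Dict{Tuple,Dict{Any,Float64}}()
    for (key, tbl) in counts
        total = sum(values(tbl))
        probs[key] = Dict(v => c / total for (v, c) in tbl)
    end
    return probs
end

# Example: estimate P(E | B, S) from a tiny made-up data set.
d = DataFrame(B = [1, 1, 2, 2, 1], S = [1, 2, 1, 2, 1], E = [1, 1, 2, 2, 1])
learn_discrete_cpd(d, :E, [:B, :S])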

Perhaps we can just support discrete values for now, but in the future it might be nice to be able to set the types of CPDs (e.g., Gaussian) and have it learn the parameters using MAP or MLE. We would probably start by defining a learn! function for each individual type of CPD; the version of learn! called on the Bayes net would then just call those functions on each of the nodes. Again, this should not involve much new code.
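As a rough illustration of that dispatch pattern (the CPD types and learn! methods below are stand-ins for whatever the package ends up defining, not its actual API), a root Gaussian node could be fit by MLE and a network-level learn! would just loop over the nodes:

using DataFrames, Statistics

abstract type AbstractCPD end

# A root Gaussian node (no parents); parameters are filled in by learn!.
mutable struct GaussianCPD <: AbstractCPD
    target::Symbol
    mu::Float64
    sigma::Float64
end

# MLE fit of a root Gaussian node from the data column for its target.
function learn!(cpd::GaussianCPD, d::DataFrame)
    x = d[!, cpd.target]
    cpd.mu = mean(x)
    cpd.sigma = std(x; corrected=false)   # MLE uses the uncorrected estimator
    return cpd
end

# The network-level learn! dispatches to each node's CPD-specific method.
learn!(cpds::Vector{<:AbstractCPD}, d::DataFrame) = (foreach(c -> learn!(c, d), cpds); cpds)

d = DataFrame(C = 5.0 .+ 2.0 .* randn(1_000))
learn!([GaussianCPD(:C, 0.0, 1.0)], d)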

tawheeler commented 8 years ago

Parameter estimation from data, once the structure and distribution types are known, is easy, and you are right that you already wrote most of the support code for discrete variables.

tawheeler commented 8 years ago

OK, I am going to put some thought into this. I think we can leverage Distributions.jl's fitting functions (fit/fit_mle), and that would require tighter integration. I might make a new branch so we can test everything out and arrive at a good long-term solution.
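For reference, a tiny example of what leaning on Distributions.jl could look like (fit_mle is Distributions.jl's API; the data here is made up):

using Distributions

x = 5.0 .+ 2.0 .* randn(1_000)      # fake samples for a continuous node
g = fit_mle(Normal, x)              # Normal with mean ≈ 5, std ≈ 2

labels = rand(1:3, 1_000)           # fake samples for a discrete node, coded 1..3
c = fit_mle(Categorical, labels)    # Categorical with estimated probabilities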

The CPD definition vs. actual CPD question is one we should ask ourselves. I see three obvious approaches:

1. Define CPD definitions separately from actual CPDs (sort of like having Domains and CPDs in the nodes). PROS: not much changes. CONS: redundant.
2. Initialize nodes with dummy CPDs that can be configured with domains and then updated during learning. PROS: non-redundant. CONS: chance of misuse of uninitialized CPDs.
3. Define a separate, un-learned BN type that only contains Domain-like definitions. PROS: non-redundant, and no chance of misuse. Distributions.jl uses immutable types, so learning this way would mean you do it once and can use them directly. CONS: more code, more types.

I am currently leaning towards 2. Thoughts?
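For concreteness, here is roughly what option 2 could look like, with hypothetical names (a placeholder CPD that knows its domain but has no parameters until learn! is called), including the uninitialized-CPD risk noted above:

# A placeholder discrete CPD: domain is fixed at construction, parameters
# are `nothing` until learn! fills them in. All names here are illustrative.
mutable struct LazyDiscreteCPD
    target::Symbol
    domain::Vector{Int}
    probs::Union{Nothing,Vector{Float64}}
end

islearned(cpd::LazyDiscreteCPD) = cpd.probs !== nothing

# MLE for a root node: relative frequency of each domain value in the data.
function learn!(cpd::LazyDiscreteCPD, x::Vector{Int})
    n = length(x)
    cpd.probs = [count(==(v), x) / n for v in cpd.domain]
    return cpd
end

cpd = LazyDiscreteCPD(:E, [1, 2, 3], nothing)
islearned(cpd)               # false: this is the misuse risk of option 2
learn!(cpd, rand(1:3, 500))
islearned(cpd)               # true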

tawheeler commented 8 years ago

Learning is currently being folded into the new distributions branch. There is already some working functionality, but we aren't done yet.

tawheeler commented 8 years ago

Implemented as of the merge of the distributions branch, v1.0.0.