jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.38k stars 590 forks source link

Bayesian network- greedy search takes too long #533

Closed karnatapu closed 5 years ago

karnatapu commented 5 years ago

I have a data set of 25k rows, 30 features and each feature have 10-15 states. When I run my program using from_samples using greedy search. It is taking more than an hour and still running. Is there any way to tune method? Or am i missing something here.

teerev commented 5 years ago

You are encountering the exponential explosion in computation time that is common to all Bayesian networks, not just this package. 30 features at 10 states per feature implies 10**30 possible combinations across the joint distribution.

jmschrei commented 5 years ago

The greedy algorithm is much faster than the exponential algorithm, but it is still slow. The upper limit for the exact algorithm is ~25-28 binary variables. I wouldn't expect the greedy algorithm to be able to quickly handle 30 variables with 10-15 states each. You might try using chow-liu, which is a greedy tree-building algorithm, if you're looking to shift the balance to faster but less accurate.