jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.36k stars, 587 forks

Doubt: how to work with continuous values in the Bayesian network? #405

willsilvano closed 6 years ago

willsilvano commented 6 years ago

Hello friends.

I am working on a project where much of the dataset is numeric: decimal values, percentages, counts, etc.

Currently I transform these values through discretization using the Orange API.

The transformation produces ranges of values, for example: "<0", "1 - 10", "> 10" ....

My question is whether there is any better way to do this.

jmschrei commented 6 years ago

Howdy

How best to discretize these values depends a lot on your data, so I can't offer much help on what you should do for your specific data set. The general method is fairly widely used, though, and corresponds essentially to one-hot encodings. Something you might want to consider is binning your numbers using a certain number of quantiles instead of fixed ranges. That way each bin would correspond to roughly the same number of points in your training set.
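As a rough illustration of the quantile idea, here is a minimal sketch using only NumPy. The helper name `quantile_bins`, the choice of four bins, and the synthetic data are assumptions made for the example; they are not part of pomegranate or Orange.

```python
import numpy as np

def quantile_bins(values, n_bins=4):
    """Assign each value to one of n_bins quantile-based bins (0 .. n_bins-1).

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # Interior cut points at the 1/n, 2/n, ... quantiles of the training data.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted counts how many cut points lie at or below each value,
    # which is exactly the index of the bin the value falls in.
    labels = np.searchsorted(edges, values, side="right")
    return labels, edges

# Skewed, made-up data: fixed-width ranges would leave most points in one bin.
rng = np.random.default_rng(0)
data = rng.exponential(scale=5.0, size=1000)

labels, edges = quantile_bins(data, n_bins=4)
counts = np.bincount(labels, minlength=4)  # number of points per bin
```

The integer labels can then be converted to strings and used as the categories of a discrete node. New data should be binned with the `edges` learned from the training set rather than with freshly computed quantiles.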

willsilvano commented 6 years ago

@jmschrei Thanks.

Is there any way to not use categories in the nodes of the Bayesian network?

jmschrei commented 6 years ago

Mathematically, one could use a linear Gaussian Bayesian network, which is the typical way of handling continuous values in Bayesian networks. In pomegranate this is not possible yet, as I haven't added continuous value support.
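For readers unfamiliar with the term: in a linear Gaussian Bayesian network, each node is normally distributed with a mean that is a linear function of its parents' values. The sketch below fits one such conditional distribution for a two-node network X -> Y with plain NumPy. It does not use pomegranate; the function name `fit_linear_gaussian` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_linear_gaussian(child, parents):
    """Fit child | parents ~ Normal(b0 + parents @ b, sigma^2) by least squares.

    Hypothetical helper for illustration; expects at least one parent column.
    Returns the coefficients [b0, b1, ...] and the residual standard deviation.
    """
    child = np.asarray(child, dtype=float)
    parents = np.asarray(parents, dtype=float).reshape(len(child), -1)
    # Design matrix with a leading column of ones for the intercept b0.
    design = np.column_stack([np.ones(len(child)), parents])
    coef, *_ = np.linalg.lstsq(design, child, rcond=None)
    residuals = child - design @ coef
    return coef, residuals.std()

# Made-up data: X ~ Normal(2, 1) and Y | X ~ Normal(1 + 3 * X, 0.5^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5000)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=5000)

# A root node needs only its own mean and standard deviation.
x_mean, x_std = x.mean(), x.std()
# A child node needs an intercept, one weight per parent, and a noise scale.
coef, sigma = fit_linear_gaussian(y, x)
```

With this parameterization no discretization is needed, at the cost of assuming that the relationships between variables are linear and the noise is Gaussian.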