Doubt: how to work with continuous values in the Bayesian network?

jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.

http://pomegranate.readthedocs.org/en/latest/

MIT License

3.36k stars, 587 forks

Doubt: how to work with continuous values in the Bayesian network? #405

Closed: willsilvano closed this issue 6 years ago

willsilvano commented 6 years ago

Hello friends.

I am working on a project where much of the information in the dataset is numeric: decimal values, percentages, counts, etc.

Currently I transform these values through discretization using the Orange API.

The transformation produces ranges of values, for example: "<0", "1 - 10", "> 10" ....

My question is whether there is any better way to do this.

jmschrei commented 6 years ago

Howdy

The right discretization depends heavily on your data, so I can't offer much specific advice for your data set. The general approach is widely used, though, and essentially corresponds to one-hot encoding. One thing you might consider is binning your numbers by a fixed number of quantiles instead of by fixed ranges. That way each bin would hold roughly the same number of points from your training set.
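A minimal sketch of the quantile binning idea, using only the Python standard library. It does not use pomegranate or Orange; the helper names `quantile_edges` and `assign_bin`, and the sample data, are hypothetical and only for illustration.

```python
import bisect
import statistics

def quantile_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split values into quantile bins."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

def assign_bin(value, edges):
    """Return the index (0 .. len(edges)) of the bin that value falls into."""
    return bisect.bisect_right(edges, value)

# Hypothetical, skewed training column (illustrative data only).
train = [0.1, 0.4, 0.5, 0.9, 1.2, 2.0, 3.5, 7.0, 15.0, 40.0, 120.0, 300.0]

# Learn the cut points on the training data, then reuse them for new data.
edges = quantile_edges(train, n_bins=4)
labels = [assign_bin(v, edges) for v in train]

# Number of training points that landed in each bin.
counts = [labels.count(b) for b in range(4)]
print(edges, counts)
```

The bin indices can then be used as the categorical values of a discrete node. With fixed-width ranges, a skewed column like this one would put most points in the first bin; quantile edges spread them evenly.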

willsilvano commented 6 years ago

@jmschrei Thanks.

Is there any way to avoid using categories in the nodes of the Bayesian network?

jmschrei commented 6 years ago

Mathematically, one could use a linear Gaussian Bayesian network (the typical way of doing this with Bayesian networks). In pomegranate this isn't possible yet, as I haven't added continuous value support.
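To illustrate what a linear Gaussian Bayesian network means, here is a rough standard-library sketch of the smallest possible case, a two-node network X -> Y where X is Gaussian and Y given X is Gaussian with a mean that is linear in X. This is not pomegranate code; the function names and the data are hypothetical, and the parameters are fit by ordinary least squares as one simple choice.

```python
import math
import statistics

def fit_linear_gaussian(xs, ys):
    """Fit Y | X = x ~ Normal(intercept + slope * x, sigma^2) by least squares."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Maximum-likelihood estimate of the noise standard deviation.
    sigma = math.sqrt(sum(r * r for r in residuals) / len(xs))
    return intercept, slope, sigma

def normal_logpdf(value, mean, sigma):
    """Log density of Normal(mean, sigma^2) at value."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (value - mean) ** 2 / (2 * sigma ** 2)

def joint_logpdf(x, y, root, child):
    """log p(x, y) = log p(x) + log p(y | x) for the network X -> Y."""
    mean_x, sigma_x = root
    intercept, slope, sigma_y = child
    return (normal_logpdf(x, mean_x, sigma_x)
            + normal_logpdf(y, intercept + slope * x, sigma_y))

# Hypothetical continuous observations (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

root = (statistics.fmean(xs), statistics.pstdev(xs))  # parameters of p(x)
child = fit_linear_gaussian(xs, ys)                   # parameters of p(y | x)

print(child)
print(joint_logpdf(3.0, 6.0, root, child))
```

No discretization is needed here: each node stores a few real-valued parameters instead of a table over categories. A larger network would repeat the same pattern, with each node's mean a linear function of all of its parents.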
