This is a prototype that compiles Bayesian networks into probabilistic programming languages. The Bayesian networks are learned with bnlearn. Currently, both PyMC3 and Blog code are generated. Unfortunately, the bnlearn package for Python is not well designed, so the R version of bnlearn is used instead, called from Python.
Create a new conda environment with
conda env create -f environment.yml
and activate it with
conda activate leapp
and remove it again with
conda env remove -n leapp
Install the package with
pip install .
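To confirm that the Python requirements listed below can be imported in the active environment, a short script like the following can help. It is a minimal sketch: it only checks that each module is findable with `importlib.util.find_spec`, not its version, and the module list simply mirrors the requirements in this README.

```python
import importlib.util

# Import names of the Python requirements listed in this README.
# Note: sklearn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["networkx", "graphviz", "pandas", "sklearn", "rpy2", "numpy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```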
The following Python (3.8) packages are required:
networkx - for graphical output of the underlying networks
graphviz - for graphical output of the underlying networks
pandas - for reading and writing data
sklearn - for some preprocessing
rpy2 - for running R code
numpy - for some mathematical calculations
Further, an R (3.6.1) installation with the following packages is required:
bnlearn - for learning a Bayesian network
jsonlite - for creating and writing a JSON description of the network
Rgraphviz - for plotting the network structure, if desired (see the R code)
Try to run the example.py file. It creates a simple PyMC3 code fragment for the cars data set.
python leapp.py csv_file
After installing the package, a probabilistic program can be learned as follows:
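import pandas as pd
from leapp import LearnPP

data = pd.read_csv("data.csv")  # hypothetical file name; any pandas DataFrame works
lp = LearnPP()
lp.fit(data)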
data has to be a pandas DataFrame object.
LearnPP accepts the following parameters:
continuous_variables - variables that are continuous (list)
discrete_variables - variables that are discrete (list)
whitelist_edges - edges that must be in the model (list of tuples)
blacklist_edges - edges that are not allowed in the model (list of tuples)
score - score for the structure search, default bic (string)
algo - algorithm for the structure search, default hc, i.e. hill climbing (string)
simplify_tolerance - tolerance for merging similar distributions (float)
verbose - show more detail (boolean)
fit accepts the following additional parameters:
transform_data - if the data frame contains strings, replace them by numbers
cleanup - if the data frame contains ? entries, remove them (the data frame has to be complete)
The LearnPP object allows you to get the PyMC3 and Blog code:
print(lp.get_pymc_code())
print(lp.get_blog_code())
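To illustrate the constructor parameters listed above, the following sketch collects them in a dictionary and checks the variable and edge lists against the data columns before fitting. The column names are hypothetical, `check_params` is a hypothetical helper that is not part of leapp, and reading each edge tuple as (parent, child) is an assumption.

```python
# Hypothetical column names, for illustration only.
COLUMNS = ["age", "income", "owns_car", "region"]

# Keyword arguments named after the LearnPP parameters listed above.
params = {
    "continuous_variables": ["age", "income"],
    "discrete_variables": ["owns_car", "region"],
    # Assumption: each tuple is (parent, child).
    "whitelist_edges": [("region", "income")],
    "blacklist_edges": [("owns_car", "age")],
    "score": "bic",
    "algo": "hc",
    "verbose": True,
}


def check_params(columns, params):
    """Hypothetical helper: list problems in the variable and edge parameters."""
    problems = []
    known = set(columns)
    continuous = set(params.get("continuous_variables", []))
    discrete = set(params.get("discrete_variables", []))
    for name in sorted((continuous | discrete) - known):
        problems.append(f"unknown variable: {name}")
    for name in sorted(continuous & discrete):
        problems.append(f"variable is both continuous and discrete: {name}")
    white = set(params.get("whitelist_edges", []))
    black = set(params.get("blacklist_edges", []))
    # Every node named in an edge must be a column of the data.
    for edge in sorted(white | black):
        for node in edge:
            if node not in known:
                problems.append(f"edge {edge} uses unknown variable: {node}")
    # An edge cannot be required and forbidden at the same time.
    for edge in sorted(white & black):
        problems.append(f"edge {edge} is both whitelisted and blacklisted")
    return problems


print(check_params(COLUMNS, params))  # prints []
```

If the check returns an empty list, the dictionary could then be passed as `LearnPP(**params)`, assuming the constructor accepts these names as keyword arguments.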
Error in [[<-.data.frame(*tmp*, var, value = numeric(0)) : replacement has 0 rows ...
The data object has an error. Possibly there is an index column, or some column names are problematic.
Solution: Check the data frame, or print the data variable inside the R code to look for mismatches.
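A minimal pandas sketch of these checks, run in Python before calling fit. `check_frame` flags columns that look like a saved row index (for example the `Unnamed: 0` column pandas creates when reading a CSV that was written together with its index) and column names that are not plain R-style names; `clean_frame` drops `Unnamed:` columns, replaces unusual characters in names with `_`, and drops rows containing `?`. Both helpers are hypothetical, and the name rule is a conservative approximation of R's syntactic names, not the rule leapp or rpy2 actually applies.

```python
import re

import numpy as np
import pandas as pd


def looks_like_index(series):
    """True if an integer column is exactly 0..n-1 or 1..n, like a saved row index."""
    n = len(series)
    if n == 0 or not pd.api.types.is_integer_dtype(series):
        return False
    values = series.to_numpy()
    return bool(
        np.array_equal(values, np.arange(n)) or np.array_equal(values, np.arange(1, n + 1))
    )


def check_frame(df):
    """Return a list of suspected problems in a data frame before calling fit."""
    problems = []
    for col in df.columns:
        name = str(col)
        if name.startswith("Unnamed:") or looks_like_index(df[col]):
            problems.append(f"possible index column: {name}")
        # Conservative rule: a letter first, then only letters, digits, '.' or '_'.
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9._]*", name):
            problems.append(f"problematic column name: {name}")
    if df.isin(["?"]).to_numpy().any():
        problems.append("data frame contains '?' entries")
    if df.isna().to_numpy().any():
        problems.append("data frame has missing values")
    return problems


def clean_frame(df):
    """Drop 'Unnamed:' columns, replace odd characters in names, drop rows with '?'."""
    keep = [col for col in df.columns if not str(col).startswith("Unnamed:")]
    out = df.loc[:, keep]
    out = out.rename(columns=lambda col: re.sub(r"[^A-Za-z0-9._]", "_", str(col)))
    out = out.replace("?", np.nan).dropna()
    # Let pandas re-infer dtypes, e.g. an object column that now holds only floats.
    return out.infer_objects()


if __name__ == "__main__":
    df = pd.DataFrame(
        {"Unnamed: 0": [0, 1, 2], "engine size": [1.2, 2.0, "?"], "cyl": [4, 6, 8]}
    )
    print(check_frame(df))
    print(check_frame(clean_frame(df)))
```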
Solution: We have encountered this error with the loglik score. Another score may fix it.
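Trying another score can be automated. The sketch below is a hypothetical helper, not part of leapp: it calls a user-supplied `fit(score)` function for each score in turn and returns the first one that succeeds. The demo uses a fake fit function; with leapp, `fit` could construct `LearnPP(score=score)` and call its `fit(data)` method, which is an assumption about how you wire it up. The score names must be valid bnlearn scores for your type of data.

```python
def fit_with_fallback(fit, scores):
    """Try fit(score) for each score in order; return (score, result) of the first success."""
    errors = {}
    for score in scores:
        try:
            return score, fit(score)
        except Exception as exc:  # broad on purpose: R errors reach Python as exceptions
            errors[score] = exc
    raise RuntimeError(f"all scores failed: {errors}")


def fake_fit(score):
    """Stand-in for a real fit that fails with the loglik score."""
    if score == "loglik":
        raise ValueError("replacement has 0 rows")
    return f"model learned with {score}"


print(fit_with_fallback(fake_fit, ["loglik", "bic"]))  # prints ('bic', 'model learned with bic')
```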
OSError with R
OSError: cannot load library 'C:\Program Files\R\R-3.6.1\bin\x64\R.dll'
Solution: Set the environment variables for your R installation, especially R_HOME and PATH, or add the following to your code before rpy2 is imported:
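import os

# Point rpy2 at the R installation; adjust the paths to your R version.
os.environ["R_HOME"] = r"C:\Program Files\R\R-3.6.1"
# Prepend the directory containing R.dll so that Windows can load it.
os.environ["PATH"] = r"C:\Program Files\R\R-3.6.1\bin\x64" + os.pathsep + os.environ["PATH"]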
There is something wrong with your R installation. Probably some R packages are not installed. Copy the generated R code and run it in an R session; the error message there should give you a hint.
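To check for missing R packages without leaving Python, the following sketch runs a one-line R expression through `Rscript`. It assumes `Rscript` is on the PATH, uses R's `installed.packages()` to list what is available, and returns `None` if R cannot be run at all. The helper names are hypothetical and not part of leapp.

```python
import shutil
import subprocess

# R packages listed in the requirements of this README.
REQUIRED_R_PACKAGES = ["bnlearn", "jsonlite", "Rgraphviz"]


def r_check_expression(packages):
    """Build an R expression that prints each package missing from the R library."""
    quoted = ", ".join(f"'{name}'" for name in packages)
    return (
        f"missing <- setdiff(c({quoted}), rownames(installed.packages())); "
        "cat(missing, sep = '\\n')"
    )


def missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the missing R packages, or None if Rscript cannot be found or fails."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", r_check_expression(packages)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    missing = missing_r_packages()
    if missing is None:
        print("Rscript not found or failed; check PATH and R_HOME")
    else:
        print("missing R packages:", ", ".join(missing) if missing else "none")
```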