cgpm

The aim of this project is to provide a unified probabilistic programming framework to express different models and techniques from statistics, machine learning and non-parametric Bayes. It serves as the primary modeling and inference runtime system for bayeslite, an open-source implementation of BayesDB.

Composable generative population models (CGPMs) are a computational abstraction for probabilistic objects. They provide an interface that explicitly separates two operations: a sampler, which simulates a random variable from its conditional distribution, and an assessor, which evaluates its conditional density. When models are encapsulated as probabilistic programs that implement the CGPM interface, complex models can be built as compositions of sub-CGPMs and queried in a model-independent way using the Bayesian Query Language.
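The following is a minimal, self-contained Python sketch that makes the sampler/assessor split concrete with two toy CGPMs and their composition. The classes NormalCGPM and ChainCGPM and the simulate/logpdf signatures are hypothetical and chosen for illustration only. They are not the cgpm package's actual API.

```python
import math
import random


class NormalCGPM(object):
    """Hypothetical toy CGPM: output ~ Normal(a + b * input, sigma).

    simulate/logpdf mirror the sampler/assessor split; the signatures
    are illustrative and are not the real cgpm interface.
    """

    def __init__(self, a, b, sigma, output, input_):
        self.a, self.b, self.sigma = a, b, sigma
        self.output = output
        self.input = input_

    def simulate(self, inputs, rng):
        # Sampler: draw the output variable given the input variables.
        mu = self.a + self.b * inputs[self.input]
        return {self.output: rng.gauss(mu, self.sigma)}

    def logpdf(self, targets, inputs):
        # Assessor: Normal log density of the output value given the inputs.
        mu = self.a + self.b * inputs[self.input]
        z = (targets[self.output] - mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)


class ChainCGPM(object):
    """Hypothetical composition: each sub-CGPM's output feeds the next one."""

    def __init__(self, cgpms):
        self.cgpms = cgpms

    def simulate(self, inputs, rng):
        # Sample the sub-CGPMs in order, accumulating every simulated value.
        values = dict(inputs)
        for cgpm in self.cgpms:
            values.update(cgpm.simulate(values, rng))
        return values

    def logpdf(self, targets, inputs):
        # With every intermediate variable observed, the joint log density
        # is the sum of each sub-CGPM's conditional log density.
        values = dict(inputs)
        values.update(targets)
        return sum(cgpm.logpdf(values, values) for cgpm in self.cgpms)


if __name__ == "__main__":
    rng = random.Random(0)
    f = NormalCGPM(0.0, 1.0, 1.0, output='y', input_='x')
    g = NormalCGPM(1.0, 2.0, 0.5, output='z', input_='y')
    chain = ChainCGPM([f, g])
    sample = chain.simulate({'x': 1.0}, rng)
    print(sample)
    print(chain.logpdf({'y': sample['y'], 'z': sample['z']}, {'x': 1.0}))
```

The composed assessor assumes every intermediate variable is observed. Assessing the chain with an unobserved intermediate would require marginalizing over it, which this sketch does not attempt.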

Installing

Conda

The easiest way to install cgpm is to use the package on Anaconda Cloud. Please follow these instructions.

Manual Build

cgpm targets Ubuntu 14.04 and 16.04. The package can be installed by cloning this repository and following these instructions. It is highly recommended to install cgpm inside a virtualenv created with the --system-site-packages flag.
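On Python 3, the standard-library venv module provides the same option as virtualenv's --system-site-packages flag through its system_site_packages parameter. The sketch below substitutes venv for the virtualenv tool named above. The helper create_cgpm_env and any directory name you pass to it are hypothetical. This README does not say whether cgpm itself runs under Python 3 on the targeted Ubuntu releases.

```python
import venv


def create_cgpm_env(env_dir, with_pip=True):
    """Hypothetical helper: create a virtual environment that can also
    import packages installed system-wide (for example, from apt).

    system_site_packages=True is venv's counterpart to virtualenv's
    --system-site-packages flag.
    """
    venv.create(env_dir, system_site_packages=True, with_pip=with_pip)
```

For example, calling create_cgpm_env('cgpm-env') and then activating the environment's bin/activate script before the build steps gives an isolated environment that still sees the system-wide dependencies.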

Install dependencies from apt, listed here.

Retrieve and build the source.

% git clone git@github.com:probcomp/cgpm
% cd cgpm
% pip install --no-deps .

Verify the installation. These commands assume you are still in the repository root, where the previous step left you.

% python -c 'import cgpm'
% ./check.sh

Publications

CGPMs, and their integration as a runtime system for BayesDB, are described in the following technical report:

Probabilistic Data Analysis with Probabilistic Programming. Saad, F., and Mansinghka, V. arXiv preprint, arXiv:1608.05347, 2017.

Applications of cgpm and bayeslite to data analysis tasks are described in:

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes. Saad, F., Casarsa, L., and Mansinghka, V. arXiv preprint, arXiv:1704.01087, 2017.
Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes. Saad, F., and Mansinghka, V. Artificial Intelligence and Statistics (AISTATS), 2017.
A Probabilistic Programming Approach to Probabilistic Data Analysis. Saad, F., and Mansinghka, V. Advances in Neural Information Processing Systems (NIPS), 2016.

Tests

Running ./check.sh runs the subset of tests that are considered complete and stable. To launch the full test suite, including continuous integration tests, run py.test in the repository root. The tests/ directory contains further tests, but files whose names do not start with test_, or that start with disabled_, are not considered ready. The tip of every branch merged into master must pass ./check.sh and follow the code conventions outlined in HACKING.

The full test suite can also be run with ./check.sh --integration tests/. Note that the integration tests require installing the C++ crosscat backend.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
