jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.33k stars 588 forks

2018 Roadmap #377

Closed jmschrei closed 5 years ago

jmschrei commented 6 years ago

Howdy person reading this.

2017 saw unexpected growth in the number of people using the package. I'm flattered and humbled by the comments I've gotten at the talks I've given. Based on conversations I've had, I thought it might be beneficial to share my two main goals for 2018.

(1) Convert from Cython over to numba

numba has recently matured to the point where it performs nearly as well as Cython, if not better. I'd like to transition the backend from Cython to numba for three main reasons:

  1. Developer time: It takes much longer to implement features in Cython than it does in numba. If I can get similar performance and parallelism, even better!

  2. Contributor ability: Currently, if you want to add something to pomegranate, you need to know Cython pretty well. That is obviously a high barrier to entry for people who would otherwise like to contribute important features. Switching over to numba will hopefully make it easier for contributors to help out.

  3. GPU support: Currently, GPU support requires switching between the Python and Cython layers. Since CuPy is basically a drop-in replacement for numpy, extending GPU support and adding multi-GPU support would be much more convenient if the code were already written in numpy-style Python.
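To make the developer-time argument concrete, here is a minimal sketch of what a numba-style kernel could look like. It is not pomegranate code: the names `normal_log_likelihood` and `as_samples` are hypothetical, and the fallback decorator is only an assumption so that the snippet still runs as plain Python when numba is not installed.

```python
import math

try:
    import numpy as np
    from numba import njit
except ImportError:
    # Assumption for illustration: without numba, fall back to plain Python.
    np = None

    def njit(func):
        """No-op stand-in for numba.njit."""
        return func


def as_samples(values):
    """Hypothetical helper: float64 array if numba is in use, else a list."""
    if np is not None:
        return np.asarray(values, dtype=np.float64)
    return [float(v) for v in values]


@njit
def normal_log_likelihood(xs, mu, sigma):
    """Sum of log N(x | mu, sigma^2) over all samples, as an explicit loop."""
    log_norm = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for i in range(len(xs)):
        z = (xs[i] - mu) / sigma
        total += log_norm - 0.5 * z * z
    return total


samples = as_samples([0.1, -0.4, 1.3])
print(normal_log_likelihood(samples, 0.0, 1.0))
```

The loop body is ordinary Python, so a contributor can read and modify it without knowing Cython's typing and build machinery; the decorator is the only numba-specific line.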

(2) Add in linear Gaussian and hybrid Bayesian networks

There is a great deal of demand for these models, particularly linear Gaussian models. I haven't gotten around to it yet, but it is a very high priority for me.
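For readers unfamiliar with the term, a linear Gaussian node is one whose value is normally distributed around a linear function of its parents' values. The sketch below illustrates that idea only; `LinearGaussianCPD` is a hypothetical class written with the standard library, not an existing or planned pomegranate API.

```python
import math
import random


class LinearGaussianCPD:
    """Hypothetical node: child | parents ~ Normal(intercept + w . parents, std^2)."""

    def __init__(self, intercept, weights, std):
        if std <= 0:
            raise ValueError("std must be positive")
        self.intercept = float(intercept)
        self.weights = [float(w) for w in weights]
        self.std = float(std)

    def mean(self, parent_values):
        """Conditional mean: a linear function of the parent values."""
        if len(parent_values) != len(self.weights):
            raise ValueError("expected one value per parent")
        return self.intercept + sum(
            w * p for w, p in zip(self.weights, parent_values)
        )

    def log_probability(self, value, parent_values):
        """Log density of `value` given the parents' values."""
        z = (value - self.mean(parent_values)) / self.std
        return -math.log(self.std) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

    def sample(self, parent_values, rng=random):
        """Draw one value for the child given the parents' values."""
        return rng.gauss(self.mean(parent_values), self.std)


# Two-node network X -> Y, sampled in topological order.
x_node = LinearGaussianCPD(intercept=0.0, weights=[], std=1.0)     # X ~ N(0, 1)
y_node = LinearGaussianCPD(intercept=1.0, weights=[2.0], std=0.5)  # Y | X ~ N(1 + 2X, 0.25)

rng = random.Random(0)
x = x_node.sample([], rng)
y = y_node.sample([x], rng)
joint_log_p = x_node.log_probability(x, []) + y_node.log_probability(y, [x])
print(x, y, joint_log_p)
```

A root node is just the special case with no parents, and the joint log probability is the sum of each node's conditional log probability.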

It is unclear to me which of the two I should prioritize. I am leaning towards adding linear Gaussian support first, because the Bayesian network part of the code doesn't use much of a Cython backend yet, so there wouldn't be much code to rewrite.

Just because these are my main goals for the year doesn't mean that they will be the only things I work on, and it doesn't mean I won't be adding in other big stuff. Other major projects that I'd like to add in but haven't really gotten around to include:

  1. Dynamic Bayesian Networks
  2. Continuous Time Bayesian Networks
  3. PyMC3 backend for Bayesian versions of all these models
  4. Increased support for distributed computing / out of core data storage
  5. Hierarchical Models

We'll see what 2018 brings. Happy New Year!

stonebig commented 6 years ago

Hi @jmschrei. Is there a timeline for becoming compatible with NetworkX 2.0 (or dropping the dependency)? Being stuck with an old version is a problem.

jmschrei commented 6 years ago

It is high on my queue now. I likely won't be developing pomegranate for a month (except for critical fixes) due to a paper deadline, but I'll ping you when I add it in.