Open fonnesbeck opened 5 months ago
I like this issue, I started reading the paper, there are some interesting ideas (at least it seems so at first glance) and would be interested in contributing here.
The major obstacles for me are: i) no experience with the pymc internals, e.g pytensor backend, different API layers, etc. and linear algebra via/on computational graphs ii) do I need to re-implement the Newton iteration and CG iteration for linear systems or can I use off the shelf solvers from e.g scipy. ii) contribution may take quite a while due to quite limited time. . Is DADVI expected to be reasonably valuable for users of PyMC given the amount of supporting methods or dependencies that may need to be added to the code base?
Hey @trendelkampschroer! I have been looking over this issue some and deciding whether I would pick it up. I also have some pretty limited time, so perhaps we could work on it together async?
ii) is the big open question for me as well, will be interested to hear back on this.
@blolt I am very open to collaboration. But I first need to get some clarification from @fonnesbeck or other PyMC maintainers on the open questions above.
Hello @blolt and @trendelkampschroer. Sorry for the late response to your questions! Let me answer them in order:
Lack of experience with PyMC/PyTensor should not hold you back. We strongly encourage new contributors, and everyone has to start somewhere. It is, of course, linked with your third point but we aren't in a big hurry to get DADVI implemented, and given that my enhancement issue has been sitting here until June, I don't think you will be blocking anyone. We are grateful for you to have considered taking the time to help the project! So, certainly the first step will involve familiarization with the code base, but having a development goal in mind while learning can only accelerate the process.
Existing optimization implementations are fine -- we use scipy throughout PyMC/PyTensor (e.g. scipy.optimize.minimize
in find_MAP
, scipy.sparse
all over the place). We do have our own stochastic optimization functions, but that's a different beast.
There is no expected timeline on this, so don't let that discourage you. Any progress is acceptable progress. You also have the entire PyMC dev team to support you, especially via our project Discourse site. I would suggest submitting PRs to pymc-experimental, which is where most new inference algorithms are introduced. I would also encourage getting PRs up early in the hopes of getting feedback from the dev team earlier rather than later.
Let us know if you have additional questions!
@fonnesbeck thanks a lot for your reply and the encouraging words - it can be challenging to "break into" an existing ecosystem, especially if there is lack of experience/familiarity with the context. I very much appreciate your openess to considering and mentoring PRs from "newbies".
Being able to use existing implementations from e.g. scipy for the minimization and CG parts also drastically lowers the barrier to getting started with this.
With that out of the way it should be possible to also use https://github.com/martiningram/dadvi as a guidance, where an implementation of the method for PyMC is already available @martiningram would you be ok with us using e.g the example data in your repository, so that we can make sure that the implementation is correct/matches results from the paper. In any case I will make sure to make the paper visible through appropriate mentions/citations. If you have any suggestions or comments I'd be very glad for your guidance.
As a next step I will check out the pymc-experimental repo and see where and how a potential addition of the DADVI could happen. I will also reach out via discord in case.
Maybe that's another point on which I could use clarification. Does the community (prefer to) use discourse or github for conversations about contributions/PRs?
It would be nice if the implementation would eventually make it into pymc - if it proves useful enough to warrant addition. What is the usual process by which "stuff" eventually moves from pymc-experimental to pymc?
Thanks
Hey all,
Thanks a lot for the initiative here! Of course @trendelkampschroer , I'm very happy for you to use whatever is helpful from the code & examples. I'm hopeful that you can reuse a lot of the code in the repository -- e.g. there is already the API illustrated here: https://github.com/martiningram/dadvi/blob/main/jupyter/Radon%20example.ipynb
You could probably start with just the mean field version, i.e. what's produced by:
model_result = fit_pymc_dadvi_with_jax(m, num_fixed_draws=30)
mean_field_draws = model_result.get_posterior_draws_mean_field()
where m
is the pymc model. Maybe the code behind the functions could give you a way to get started.
I'm also happy to support where I can. Unfortunately I don't have much time to work on this at the moment, but if any questions come up, I'd be very happy to take a look :)
@trendelkampschroer there is no formal path for matriculating projects from pymc-experimental
to pymc
, but if the implementation works well enough to become a go-to VI method then we would certainly want it available in core PyMC. There is actually another VI algorithm in pymc-experimental
already (pathfinder) but it doesn't work very well yet (we have a Google Summer of Code student working on it) so its not ready to move over.
An early PR is a great place for discussion once there is code; you could also start a GitHub discussion.
Before
After
Context for the issue:
The deterministic ADVI appears to be a clear improvement on the original ADVI algorithm, both in terms of speed and accuracy. Having a look at the paper there should not be any obvious impediment to adding it to PyMC (though we may want it tied more closely to JAX than the current ADVI is).