hvasbath / beat

Bayesian Earthquake Analysis Tool
GNU General Public License v3.0

PyMC3 is now named PyMC #89

Closed canyon289 closed 1 year ago

canyon289 commented 2 years ago

Hi, PyMC3 has been renamed PyMC. If this affects you and you have questions, or you want someone to direct your rage at, I'm available! Do let me know how I, or any of the PyMC devs, can help.

Ravin

hvasbath commented 2 years ago

Hi! Yes, thanks, I am aware of that; I am an active follower ;). This repository relies on an older version of pymc3, and I will work on making the transition once aesara is polished and out of beta. Hannes

P.S.: If Twitter decides it's time for a renaming, then you can't do anything about that. ;)

canyon289 commented 2 years ago

Perfect! And yes I feel the same way about twitter. Definitely overwhelmed my prior there

michaelosthege commented 1 year ago

Hi @hvasbath let me/us know if you need help updating to PyMC v5+

I saw that you're using the (no longer available) text backend to store draws? If there's any special motivation for that, I'd be interested in learning about it so I can consider it for McBackend / the refactoring of the backends in PyMC.

hvasbath commented 1 year ago

Hi @michaelosthege , thanks for your message! I will make the transition once PyTensor and PyMC v5 have stabilized a little and once I find the time to do so. The internals of PyMC changed quite a lot, and that will require quite some refactoring of BEAT internals. But I am constrained by academic life, with writing proposals and papers in between the coding ...

First of all, the background to understand why storage on disk is very important: typical runtimes are several hours up to months, depending on the number of unknowns and several other parameters. Every generated sample counts, and we do not want to lose it.

The Text backend was needed because I required something long-term for disk storage that also allowed saving more than just the RVs. Later PyMC added the stats field, but back in the day that did not exist. Another plus is that the generated CSV files are quickly and easily human-readable, which is good for debugging purposes. However, this backend is ultra slow.

For production inference I then implemented an additional binary trace backend where read and write performance was the utmost priority. The SMC sampler here generates hundreds or thousands of trace files per stage that need to be read again in the transition to the next stage. For HPC cluster applications it was also of utmost importance to minimize interactions with the disk, such that samples are written to disk only after every "buffer_size" number of samples.
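A minimal sketch of such a buffered binary trace writer, assuming hypothetical names (this is not BEAT's actual implementation): samples accumulate in memory and are flushed to disk only every `buffer_size` draws, which keeps filesystem interactions rare on HPC clusters.

```python
import numpy as np


class BufferedTrace:
    """Hypothetical sketch: buffer draws in RAM, flush to a raw
    binary file only every `buffer_size` samples."""

    def __init__(self, path, buffer_size=5000):
        self.path = path
        self.buffer_size = buffer_size
        self._buffer = []

    def record(self, sample):
        # Keep the draw in memory; flush once the buffer is full.
        self._buffer.append(np.asarray(sample, dtype=np.float64))
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Append all buffered draws to disk in one write call.
        if not self._buffer:
            return
        block = np.stack(self._buffer)
        with open(self.path, "ab") as fh:
            fh.write(block.tobytes())
        self._buffer.clear()
```

The trade-off is the usual one: a larger `buffer_size` means fewer, larger writes (good for shared cluster filesystems) at the cost of losing up to one buffer of draws if the job is killed before the final `flush()`.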

For future applications I started to implement a trace that is able to deal with changing variable sizes, which is a requirement for trans-dimensional models, where the number of RVs is inferred jointly with the RVs themselves. So it would be tremendously helpful if one or another backend you are developing supported that functionality. I would be happy to dump my backends here and use mcbackend ;) .
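The variable-size requirement can be illustrated with a small hedged sketch (hypothetical helper names, not BEAT or McBackend code): each draw is written with a length prefix, so consecutive draws may have different dimensionality, as in trans-dimensional sampling.

```python
import numpy as np


def append_ragged(fh, draw):
    """Write one variable-length draw with an int64 length prefix."""
    arr = np.asarray(draw, dtype=np.float64)
    fh.write(np.int64(arr.size).tobytes())
    fh.write(arr.tobytes())


def read_ragged(fh):
    """Read back all draws as a list of 1-d arrays (a ragged trace)."""
    draws = []
    while True:
        head = fh.read(8)
        if len(head) < 8:  # end of stream
            break
        n = int(np.frombuffer(head, dtype=np.int64)[0])
        draws.append(np.frombuffer(fh.read(n * 8), dtype=np.float64))
    return draws
```

Because each record is self-describing, a draw with 2 parameters can sit next to one with 4, which is exactly the situation when the model dimension is itself a random variable.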

Hope that helps with structuring the development efforts! Cheers! Hannes

michaelosthege commented 1 year ago

Thanks @hvasbath for the info!

Indeed McBackend already supports RVs with dynamic shape (ragged arrays in NumPy terminology) and streaming them to ClickHouse. Long-running models and very fast inserts are exactly what it's designed for =)

hvasbath commented 1 year ago

Awesome! Sounds great! Will definitely give it a shot once I get to it.