pangeo-data / jupyter-earth

Jupyter meets the Earth: combining research use cases in geosciences with technical developments within the Jupyter and Pangeo ecosystems.
https://jupytearth.org
Creative Commons Zero v1.0 Universal

Can we upgrade to pymc (v4) and aesara (previously theano-pymc)? #153

Closed consideRatio closed 1 year ago

consideRatio commented 2 years ago

@facusapienza21 @abbyazari I have concluded that one reason our image size has grown immensely (now 21GB!) is the pymc3 installation, which forces a downgrade of numpy and scipy. If we used the new version of pymc, called just pymc (version 4+), we wouldn't run into that downgrade issue.
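
One way to see the downgrade for yourself is to compare installed versions before and after adding pymc3 to an environment. Below is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are just for illustration and are not part of the image build.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative package list; adjust it to whatever the image actually pins.
    for name, version in installed_versions(["numpy", "scipy", "pymc3", "pymc"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the base image and again in an environment without pymc3 should show whether numpy and scipy differ between the two.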

My question is: could we drop pymc3 entirely and go for pymc (version 4) instead? And can we replace theano-pymc with aesara, which seems to be the stand-in replacement? See the release notes here: https://github.com/pymc-devs/pymc/blob/main/RELEASE-NOTES.md.

abbyazari commented 2 years ago

I remember the pymc3 system we have set up being very fragile. I distinctly recall that we had an issue using virtual environments with pymc3 due to its theano requirements, which the hub did not allow to be installed. That means using virtual environments will likely not be feasible, and this replacement will likely override any pymc3 work we are pursuing. We should test this on our current system before implementing it fully.

abbyazari commented 2 years ago

I believe these issues summarize part of the problem: https://github.com/pangeo-data/jupyter-earth/issues/104, https://github.com/pangeo-data/jupyter-earth/issues/99

consideRatio commented 2 years ago

This needs exploration

consideRatio commented 2 years ago

The image is now built (in 20 min) and can be tested via a new option when spawning a server. To test it on a more powerful server, one can also visit https://hub.jupytearth.org/services/configurator/ and specify it there temporarily just before starting the server.

consideRatio commented 2 years ago

So far, this didn't pan out @abbyazari, so it's not worth testing on your end yet.

I'm trying three more builds and hoping to have a version relevant for you to test soon.

(The previous image in which everything is working is b4aa089.)

facusapienza21 commented 2 years ago

Thank you @consideRatio for pursuing this!

I am using pymc3 in the Hub, and even though there is a new version, I think it would be wise to keep the original pymc3 version for now. If possible, I can run the code I have with the new pymc4 as a test and check that there are no major changes. Would it be possible to keep pymc3 for now but have a new environment with pymc4 to test our code?

Thank you!

consideRatio commented 2 years ago

@facusapienza21 @abbyazari can you clarify how you are using pymc3 currently? Are you always activating the conda environment shared/envs/pymc3 (conda activate pymc3, or alternatively selecting the pymc3 kernel detected from that environment), or are you using the pymc3 installed as part of the image?

The key point in my mind is that if you are not using pymc3 from the base image, where both pymc and pymc3 seem to be broken, then it doesn't really matter what is done to pymc3 in the base image, right?

When I trial this code from https://github.com/pangeo-data/jupyter-earth/issues/104#issue-1116713215, it doesn't work. But, if I either conda activate pymc3 (under shared/envs/pymc3) or update PATH to point to /home/jovyan/shared/envs/pymc3/bin then it works.

import numpy as np
import pymc3 as pm

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0] * X1 + beta[1] * X2 + np.random.randn(size) * sigma

basic_model = pm.Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Expected value of outcome
    mu = alpha + beta[0] * X1 + beta[1] * X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
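
To tell which installation a given kernel or shell actually picks up, it can help to print the interpreter path and the location the module would be imported from. This is a small sketch using only the standard library; `locate_module` is a hypothetical helper name, not something that exists in the image.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the path a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # With the pymc3 environment active, both paths are expected to sit under
    # shared/envs/pymc3; in the base image they should point elsewhere.
    print("interpreter:", sys.executable)
    print("pymc3 from: ", locate_module("pymc3"))
```

`find_spec` only locates a top-level module without importing it, so this works even when importing pymc3 itself fails.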

@facusapienza21 if you want to trial something with pymc instead of pymc3 in the base image, you can use the latest version of the image. You get it by starting a server with the last server option in the list, as seen in the image in https://github.com/pangeo-data/jupyter-earth/issues/153#issuecomment-1255553531.

consideRatio commented 2 years ago

To clarify, I'm not in any way suggesting that we modify shared/envs/pymc3, which is not part of the image. I'm only considering the base installation of pymc3, which I suspect you may not even be using and which I suspect is also broken.

If you don't use it, I figure we could even remove it from there. I think that installation of pymc3 is malfunctioning based on testing with the example above.

abbyazari commented 2 years ago

@consideRatio we are using the pymc3 kernel that we set up a few months back as a solution to the pymc3 issues. I do not believe we ever got pymc3 to work in any way other than through the kernel. I am unsure if this has something to do with the image...

consideRatio commented 2 years ago

@abbyazari okay, thanks for the clarification, which I read as meaning that you are using the python kernel within the pymc3 conda environment!

Since I keep failing to get pymc to work directly as part of the base image, I'll trial removing pymc3 from the base image and see if the pymc3 conda environment keeps working as before.

abbyazari commented 2 years ago

We are using the python kernel called pymc3, more details are on the slack message (including a screenshot). Let me know when you want me to test out anything in our workflow! I can hop on the hub and see if our code still runs etc.

consideRatio commented 1 year ago

@abbyazari thanks for the clarification! Okay, I'll go ahead and clean up the pymc installation in the base conda environment and docker image, which has never worked, and then validate that the dedicated conda environment in shared/envs/pymc3 still works.
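
A quick way to validate an environment like this is to run an import under that environment's own interpreter and check the exit code. Here is a rough sketch of that idea. The helper `can_import` is hypothetical, and the interpreter path assumes the environment's bin directory mentioned earlier contains a `python` executable.

```python
import subprocess


def can_import(python_executable, module):
    """Return True if `module` imports cleanly under the given interpreter."""
    try:
        result = subprocess.run(
            [python_executable, "-c", f"import {module}"],
            capture_output=True,
            text=True,
        )
    except OSError:
        # The interpreter is missing or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumed path: the bin directory of shared/envs/pymc3 plus "python".
    env_python = "/home/jovyan/shared/envs/pymc3/bin/python"
    print("pymc3 importable:", can_import(env_python, "pymc3"))
```

This only checks that the import succeeds; running the example model from earlier in the thread under the same interpreter would be a stronger check.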

Given that, you are in full control of whether and when you transition from pymc3 to pymc (v4), and your choice won't affect the maintenance of the base image either way.

Thank you for your help understanding the situation Abby and Facu!