pymc-devs / pymc-examples

Examples of PyMC models, including a library of Jupyter notebooks.
https://www.pymc.io/projects/examples/en/latest/
MIT License
280 stars 243 forks source link

GP Mauna Loa #73

Open OriolAbril opened 3 years ago

OriolAbril commented 3 years ago

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/gaussian_processes/GP-MaunaLoa.ipynb Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this notebook to a "Best Practices" state. However, these are probably not enough! Make sure to thoroughly review the notebook and search for other updates.

General updates

*

ArviZ related

*

Changes for discussion

Changes listed in this section are up for discussion, these are ideas on how to improve the notebook but may not have a clear implementation, or fix some know issue only partially.

General updates

*

ArviZ related

*

Notes

Exotic dependencies

Computing requirements

danhphan commented 2 years ago

Hi @OriolAbril, I can work on this task.

Just want to confirm that the main tasks are upgrading the notebook into PyMC v4 and new ArviZ version right? Thanks

OriolAbril commented 2 years ago

Thanks!

Yes, the main tasks are updating to use v4 and ArviZ and to update the metadata and style, see https://github.com/pymc-devs/pymc/issues/5460 for example. Everything should be written in the jupyter style guide and the webinar from the PyMC-Data Umbrella series, if you have any questions let me know

danhphan commented 2 years ago

I re-run this notebook with the latest pymc version 4.0.0b6.

There seems a problem in pm.find_MAP.

with pm.Model() as model:
    .........................
    # The Gaussian process is a sum of these three components
    gp = gp_seasonal + gp_medium + gp_trend
    # Since the normal noise model and the GP are conjugates, we use `Marginal` with the `.marginal_likelihood` method
    y_ = gp.marginal_likelihood("y", X=t, y=y, noise=cov_noise)
    # this line calls an optimizer to find the MAP
    mp = pm.find_MAP(include_transformed=True)

The results of estimated parameters from pymc v4 are:

['period:1.0',
 'α:2.5',
 'η_med:0.10000000000000002',
 'η_noise:0.05000000000000001',
 'η_per:1.0',
 'η_trend:2.0',
 'σ:0.05000000000000001',
 'ℓ_med:2.6666666666666665',
 'ℓ_noise:0.5',
 'ℓ_pdecay:133.33333333333337',
 'ℓ_psmooth :1.3333333333333333',
 'ℓ_trend:40.0']

While the expected (in pymc 3.9.3) should be:

['period:1.0003904457696988',
 'α:1.1826444920780308',
 'η_med:0.02432146950697816',
 'η_noise:0.007821354118990929',
 'η_per:0.09941238570897516',
 'η_trend:1.8391949691730372',
 'σ:0.0067995598981081',
 'ℓ_med:1.4999510109600107',
 'ℓ_noise:0.17053127928697115',
 'ℓ_pdecay:125.90347217600561',
 'ℓ_psmooth :0.7309756296087601',
 'ℓ_trend:53.345284025570486']

Model fitting: bokeh_plot (1)

The wrong estimated params from find_MAP lead to a big standard deviation in model prediction

bokeh_plot

My local PyMC v4 environment config:

Python version       : 3.9.10
IPython version      : 8.2.0
pymc  : 4.0.0b6
aesara: 2.5.1
arviz : 0.12.0
pandas: 1.4.2
numpy : 1.22.3
Watermark: 2.3.0
danhphan commented 2 years ago

There seems a similar issue at https://github.com/pymc-devs/pymc/issues/5443#issue-1122497334

ricardoV94 commented 2 years ago

What happens without that include_transformed argument? And how are you printing the parameter values, I don't see any transformed variable names in the output of MAP that you printed. There would be a bunch of _log__ suffixes

danhphan commented 2 years ago

The same [wrong] results when set include_transformed=False or remove it in pm.find_MAP()

with pm.Model() as model:
    # ........ removed .........
    # The Gaussian process is a sum of these three components
    gp = gp_seasonal + gp_medium + gp_trend
    # Since the normal noise model and the GP are conjugates, we use `Marginal` with the `.marginal_likelihood` method
    y_ = gp.marginal_likelihood("y", X=t, y=y, noise=cov_noise)
    # this line calls an optimizer to find the MAP
    mp = pm.find_MAP(include_transformed=False)

This code prints the variables:

# display the results, dont show transformed parameter values
sorted([name + ":" + str(mp[name]) for name in mp.keys() if not name.endswith("_")])

Outputs:

['period:1.0',
 'α:2.5',
 'η_med:0.10000000000000002',
 'η_noise:0.05000000000000001',
 'η_per:1.0',
 'η_trend:2.0',
 'σ:0.05000000000000001',
 'ℓ_med:2.6666666666666665',
 'ℓ_noise:0.5',
 'ℓ_pdecay:133.33333333333337',
 'ℓ_psmooth :1.3333333333333333',
 'ℓ_trend:40.0']

May be you want to see all variables, so I print it all here:

# display the results, dont show transformed parameter values
sorted([name + ":" + str(mp[name]) for name in mp.keys()])

Outputs:

['period:1.0',
 'α:2.5',
 'α_log__:0.9162907318741551',
 'η_med:0.10000000000000002',
 'η_med_log__:-2.3025850929940455',
 'η_noise:0.05000000000000001',
 'η_noise_log__:-2.995732273553991',
 'η_per:1.0',
 'η_per_log__:0.0',
 'η_trend:2.0',
 'η_trend_log__:0.6931471805599453',
 'σ:0.05000000000000001',
 'σ_log__:-2.995732273553991',
 'ℓ_med:2.6666666666666665',
 'ℓ_med_log__:0.9808292530117262',
 'ℓ_noise:0.5',
 'ℓ_noise_log__:-0.6931471805599453',
 'ℓ_pdecay:133.33333333333337',
 'ℓ_pdecay_log__:4.892852258439873',
 'ℓ_psmooth :1.3333333333333333',
 'ℓ_psmooth _log__:0.28768207245178085',
 'ℓ_trend:40.0',
 'ℓ_trend_log__:3.6888794541139363']
danhphan commented 2 years ago

Hi @OriolAbril, I have also re-run this notebook in pymc3 v3.11.4, and it works.

Is it okie if I do some General updates and ArviZ related things and make a PR to move this one to Best practices (v3) tag first.

OriolAbril commented 2 years ago

As long as notebooks move progressively towards the Done column all PRs will be welcome 😄

quantheory commented 2 years ago

The problem with using this notebook in v4 is possibly due to the buggy find_MAP behavior mentioned in pymc-devs/pymc#5923.