wd60622 commented 1 month ago

model.idata.attrs is a serialized format of the model that could be used or exposed better. For instance:

- Load functionality could be exposed to support initialization from a configuration file.
- It can be a useful model artifact in various deployments, e.g. MLflow.

wd60622 commented 1 month ago

Any thoughts on this and what is useful to save off and load back in? @ColtAllen @louismagowan

The following is already supported with the MMM:

import json

import mlflow

with mlflow.start_run():
    # idata.attrs holds the serialized model configuration
    configuration = model.idata.attrs
    with open("configuration.json", "w") as f:
        json.dump(configuration, f)
    mlflow.log_artifact("configuration.json")

Note that some of the values are already stored as strings because of the netCDF format.
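
Because some values arrive as strings, loading the file back usually needs a decoding step. Below is a minimal sketch of that step, assuming the string values are JSON-encoded; `decode_attrs` and the example keys are hypothetical and not part of pymc-marketing:

```python
import json


def decode_attrs(attrs):
    """Return a copy of attrs with JSON-encoded string values decoded."""
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, str):
            try:
                decoded[key] = json.loads(value)
            except json.JSONDecodeError:
                # Plain strings that are not valid JSON are kept as-is.
                decoded[key] = value
        else:
            decoded[key] = value
    return decoded


# Hypothetical attrs for illustration only; real keys may differ.
example_attrs = {
    "model_type": "MMM",
    "sampler_config": json.dumps({"draws": 1000, "chains": 4}),
    "adstock_max_lag": "4",
}

config = decode_attrs(example_attrs)
```

One caveat with this approach: any string that happens to be valid JSON is converted, so "4" becomes the integer 4. A real loader would probably want an explicit list of keys to decode.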

juanitorduz commented 1 month ago

Amazing! 🙌 I used PyMC (custom MMM) + MLFlow, and it was great to track experiments!

louismagowan commented 1 month ago

Very cool!

Couple of things that spring to mind:

We should submit a feature request to MLFlow to start supporting PyMC models natively.

Given the exploding popularity of PyMC I think it would make sense.
We could share the feature request on PyMC Discourse, MMM Hub etc and ask people to upvote it.
It would be great if we could actually save and load PyMC models into MLFlow easily. For example, I could see the Registry feature being very useful when it comes to MMM "refreshes", e.g. following the workflow that a lot of consultancies use, where you work hard to build a solid first MMM and afterwards just feed it new data to predict with. That could be useful for businesses that don't have the data personnel available to build new MMMs all the time.
In the meantime, we could perhaps write some code that leverages mlflow.pyfunc.PythonModel to save and register models in MLFlow.

It might be worth adding prefixes to your params and metrics

A simple change, but MLFlow doesn't support organising your columns, metrics, params etc neatly/alphabetically
My solution is to just add meaningful prefixes so that in the UI it's easy to view

e.g. something like below, so that all sampler config settings can be viewed together in the UI


# Specify options for MCMC
SAMPLER_CONFIG = {
    "draws": 1_000,
    "tune": 500,
    "chains": 6,
    "target_accept": 0.9,
    "progressbar": True,
    "nuts_sampler": "numpyro",
    "random_seed": SEED,  # SEED is defined elsewhere
}

^import this constant into your notebook

# Add prefixes to make MLFlow logging neater (alphabetical)
SAMPLER_CONFIG_LOGGING = {
    "sampler_config_" + key: val for key, val in SAMPLER_CONFIG.items()
}
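
The same idea extends to several config dicts at once. Here is a small sketch, assuming each group of params gets its own prefix; `add_prefix`, the separator, and the example configs are hypothetical names chosen for illustration:

```python
def add_prefix(config, prefix, sep="_"):
    """Return a new dict with prefixed keys, leaving config untouched."""
    return {f"{prefix}{sep}{key}": value for key, value in config.items()}


# Hypothetical config dicts for illustration.
sampler_config = {"draws": 1_000, "tune": 500, "chains": 6}
seasonality_config = {"yearly_fourier_order": 2}

# Merge the prefixed groups into one flat dict of params.
params = {
    **add_prefix(sampler_config, "sampler_config"),
    **add_prefix(seasonality_config, "seasonality"),
}
```

The merged dict could then be passed to a single mlflow.log_params call, and related settings would sort next to each other in the UI.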


**There are lots of nice metrics, params and graphs that I find useful to add**
- I have an idea for a PR on some evaluation and diagnostic metrics that I think could be logged (we do this atm)
- e.g. Something like this

# Initiate the MLflow run context
with mlflow.start_run(run_name=RUN_NAME) as run:

    # Log git hash
    git_commit_hash = get_git_revision_hash() #func to get hash of current notebook state
    if git_commit_hash:
        # Log the git commit hash as a tag
        mlflow.set_tag("git_commit_id", git_commit_hash)

    # Log the pre-processing / modelling decisions taken
    mlflow.log_params(FEATURES)
    mlflow.log_params(DIM_REDUCTION_CONFIG)
    mlflow.log_params(SEASONALITY_CONFIG)
    mlflow.log_params(TRAIN_TEST_CONFIG)
    mlflow.log_params(SAMPLER_CONFIG_LOGGING)

    # Log model metrics
    mlflow.log_metrics(model_metrics)
    mlflow.log_metrics(model_diagnostics)

    # Log whatever artifacts you want
    mlflow.log_figure(prior_plot, "graphs/prior_plot.png")
    mlflow.log_figure(adstock_alphas_plot, "graphs/adstock_alphas_plot.png")
    mlflow.log_figure(sat_lams_plot, "graphs/sat_lams_plot.png")
    mlflow.log_figure(coeffs_intercept_plot, "graphs/coeffs_intercept_plot.png")
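
The get_git_revision_hash helper used above isn't shown. One possible sketch, assuming git is on the PATH and the code runs inside a repository (the body here is an illustration, not the actual helper):

```python
import subprocess


def get_git_revision_hash():
    """Return the current commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not inside a git repository, or git is not available.
        return None
    return result.stdout.strip()


git_commit_hash = get_git_revision_hash()
```

Note that this returns the hash of the last commit, so uncommitted changes in the notebook are not reflected in the tag.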



So yeah - lots of cool things we could do here! Very interested to hear about what you had in mind for pymc-marketing X mlflow 😁 

I've got a couple of PR ideas that might cross over with this stuff, so super happy to work on it with you too! (provided work doesn't get too busy haha)

pymc-labs / pymc-marketing

Expose serialized format #891
