scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
283 stars 83 forks source link

Use of papermill is fragile to jupyter-client version #2104

Open matthewfeickert opened 1 year ago

matthewfeickert commented 1 year ago

As noted in https://github.com/nteract/papermill/issues/711 and elsewhere, papermill v2.4.0 is not compatible with jupyter-client v8.0.0+. A temporary fix for this is to downgrade jupyter-client to

'jupyter-client<8.0.0'

during

https://github.com/scikit-hep/pyhf/blob/3d200b2930f2ba2f23587804df5c880d4daadd9f/.github/workflows/notebooks.yml#L29-L32

but that's not a real solution.

As papermill doesn't seem to be actively maintained at the moment we should consider moving off of it and just using the smallest amount of nbclient code that we can do run things ourselves.

A very rough sketch would look something like

import nbformat
import pytest
from nbclient import NotebookClient

# Avoid hanging on with ipywidgets interact by using non-gui backend
os.environ["MPLBACKEND"] = "agg"

def get_notebook_client(notebook, commons, execution_dir=None):
    if execution_dir:
        commons["resources"]["metadata"]["path"] = execution_dir
    else:
        execution_dir = commons["resources"]["metadata"]["path"]

    test_notebook = nbformat.read(execution_dir / notebook, as_version=4)
    return NotebookClient(test_notebook, **commons)

# c.f. https://nbclient.readthedocs.io/en/latest/client.html
@pytest.fixture()
def commons():
    execution_dir = Path.cwd() / "docs" / "examples" / "notebooks"
    return {
        "timeout": 600,
        "kernel_name": "python3",
        "reset_kc": True,
        "resources": {"metadata": {"path": execution_dir}},
    }

def test_hello_world(commons):
    client = get_notebook_client("hello-world.ipynb", commons)
    assert client.execute()

def test_statisticalanalysis(commons):
    # The Binder example uses specific relative paths
    client = get_notebook_client(
        "StatisticalAnalysis.ipynb",
        commons,
        execution_dir=commons["resources"]["metadata"]["path"] / "binderexample",
    )
    assert client.execute()

but for some of the notebook tests we want to be able to set variables in the notebook

https://github.com/scikit-hep/pyhf/blob/3d200b2930f2ba2f23587804df5c880d4daadd9f/tests/test_notebooks.py#L55-L60

which I'm not sure how to do with nbclient yet.

For context, papermill does this here.

Not sure if @agoose77 has any previous experience doing this on the fly (Angus, if not don't waste time looking at this).

agoose77 commented 1 year ago

Papermill just injects a new cell containing parameter definitions: https://github.com/nteract/papermill/blob/54f6c038cdae0c70d5fb04691fa465e12aeb62cb/papermill/parameterize.py#L95

You could use a similar approach should you wish!

from nbclient import NotebookClient
from pathlib import Path
import nbformat
import nbformat.v4

def execute_notebook(path: Path, parameters, notebook_dir: Path=None):    
    if notebook_dir is None:
        notebook_dir = path.parent

    nb = nbformat.read(path, as_version=nbformat.NO_CONVERT)

    src = "\n".join([f"{k} = {v!r}" for k, v in parameters.items()])
    parameters_cell = nbformat.v4.new_code_cell(src)

    nb.cells = [cell, *nb.cells]

    client = NotebookClient(nb, timeout=600, resources={'metadata': {'path': notebook_dir}}))
    return client.execute()