scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Minimal typing for parts of API with backend-dependent return types #1472

Open alexander-held opened 3 years ago

alexander-held commented 3 years ago

Description

I would like to propose the addition of typed return values for the parts of the pyhf API that return objects with backend-dependent types.

See the following example as a motivation:

import numpy as np
import pyhf

def f() -> np.ndarray:
    model = pyhf.simplemodels.uncorrelated_background(
        signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
    )
    prediction = model.expected_data(model.config.suggested_init())
    return prediction

f().sort()

mypy knows nothing about the return type of expected_data and assumes Any, so the .sort() call raises no complaint. The code above passes mypy and also runs without issues. For a user who never switches the pyhf backend, everything is fine.

Imagine now a user who calls pyhf.set_backend("jax") before f().sort(). mypy still has no issue with that, and the code runs fine too. Type checking at runtime could catch a problem here: the return type of f is no longer a numpy array.

A bigger issue appears with pyhf.set_backend("tensorflow"): mypy passes, but at runtime this fails. The reason is that the TF tensor does not implement .sort.

One solution is wrapping prediction via pyhf.tensorlib.to_numpy before returning it, which guarantees that f returns a numpy array. While this works, the user has to remember to do it; if they forget, it would be desirable to catch the potential issue automatically.
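A minimal sketch of that fix, using `np.asarray` as a stand-in for `pyhf.tensorlib.to_numpy` so the example runs without pyhf installed:

```python
from typing import Any

import numpy as np


def f(prediction: Any) -> np.ndarray:
    # Convert the backend-dependent tensor to a numpy array before
    # returning, so the annotated return type is actually guaranteed.
    # np.asarray stands in for pyhf.tensorlib.to_numpy in this sketch.
    return np.asarray(prediction)


result = f([63.0, 62.0])  # a list stands in for a backend tensor here
result.sort()  # safe: result is guaranteed to be a numpy array
print(result)
```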

How could the issue with TF possibly have been caught? I see two options:

  1. testing the above code with all pyhf backends,
  2. typing support for expected_data.

Option 1: testing all backends

The first option is already possible today. Its major downside, in my opinion, is that it can add a lot of computational overhead to potentially already expensive integration tests. Tests across all backends fit better into the pyhf test suite, in my opinion, where they are already performed.

Option 2: typing

The issue with the code above could have been caught by static analysis if the return type of expected_data was not Any. For the most general case, assume the function were annotated with an object return type (see: Any vs object). mypy would then flag the sort call statically, and the user would know that they forgot to wrap the return value of f with pyhf.tensorlib.to_numpy. This is computationally much cheaper and adds no overhead to tests in libraries that depend on pyhf.

object is of course too general, and the tensor return type could be narrowed further in the future, but even the most general object would already help catch the issue above.

A possible downside is that people who strictly use the numpy backend would still get a mypy error, even though the return type of expected_data in that case actually is a numpy array. This can be avoided via # type: ignore comments, but it seems trickier to communicate that the return type can change dynamically depending on the backend. This does not seem crucial to me, however, but rather a nice-to-have feature.
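As a hedged illustration of the Any vs object difference (no pyhf needed): with `Any`, mypy accepts arbitrary attribute access, while with `object` the user is forced to convert explicitly first:

```python
from typing import Any

import numpy as np


def returns_any() -> Any:
    return np.array([2.0, 1.0])


def returns_object() -> object:
    return np.array([2.0, 1.0])


returns_any().sort()       # mypy: no complaint (Any silences checking)
# returns_object().sort()  # mypy: error, "object" has no attribute "sort"

# The object-typed value must be converted explicitly before use:
arr = np.asarray(returns_object())
arr.sort()
print(arr)
```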

Is your feature request related to a problem? Please describe.

See above: static analysis could catch issues without having to run computationally expensive multi-backend tests.

Describe the solution you'd like

Basic typing of the API with backend-dependent return types.

Describe alternatives you've considered

Running tests for all backends of pyhf.

Relevant Issues and Pull Requests

#1284 for general typing, #948 for the accompanying PR

this came up via https://github.com/alexander-held/cabinetry/pull/228


alexander-held commented 3 years ago

Here is an example illustrating how the object typing can help:

from typing import Any

import numpy as np
import pyhf

def expected_data(model: pyhf.pdf.Model) -> object:
    return model.expected_data(model.config.suggested_init())

def to_numpy(data: Any) -> np.ndarray:
    return pyhf.tensorlib.to_numpy(data)

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
pyhf.set_backend("jax")
expected_data(model).sort()  # code runs, but flagged in mypy

pyhf.set_backend("tensorflow")
expected_data(model).sort()  # code fails, and flagged in mypy
to_numpy(expected_data(model)).sort()  # no issues

This wraps expected_data to illustrate its behavior with object return type. mypy catches the .sort call in the two cases where no explicit numpy conversion took place. The first call happens to work, but this is not guaranteed for all backends. The second one fails with TF, and mypy caught this issue without having to run the code. Everything is fine after the explicit numpy conversion.

Assuming that the user forgot to do the conversion, the object type would have alerted the user about its necessity via mypy and automated tests for all backends would not have been needed.

alexander-held commented 3 years ago

Some more information from discussions with @kratsg and @phinate. Imagine a user writing a library that uses pyhf and wanting that library to be typed. Unless they need to keep using tensors in their library (e.g. for autodiff reasons), typing becomes significantly easier if all 0-d tensors are immediately converted to floats, and all other tensors to numpy arrays or similar. They may want to implement something like the following:

from typing import List

import pyhf

def significance(model: pyhf.pdf.Model, data: List[float]) -> float:
    return float(pyhf.infer.hypotest(0.0, data, model, test_stat="q0"))

At the moment, this function is perfectly fine. When running mypy, there is no information about what hypotest returns so the float-wrapping is no issue.

Let's now imagine that the pyhf return type was annotated. This function returns a tensor, so it is not obvious that the float() call works (unless shape information is also provided). It would therefore be useful to have a pyhf.tensorlib.to_float function that converts its argument if it can, and raises an exception otherwise.
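Such a `to_float` helper could look like the following sketch (the name and its proposed home in `pyhf.tensorlib` come from this discussion, not from the current API; `np.asarray` stands in for backend-specific conversion):

```python
from typing import Any

import numpy as np


def to_float(value: Any) -> float:
    # Hypothetical helper from the proposal: convert a 0-d (or size-1)
    # tensor to a Python float, raising if the shape does not allow it.
    arr = np.asarray(value)
    if arr.size != 1:
        raise ValueError(f"cannot convert tensor of shape {arr.shape} to float")
    return float(arr.item())


print(to_float(np.array(0.05)))  # a 0-d tensor converts cleanly
```

Downstream code annotated with `-> float` can then rely on the conversion either succeeding or failing loudly, instead of silently returning a tensor.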

Why is it useful to have the return from pyhf typed, and provide wrapping utilities? Imagine the user implements the following function:

def get_data(model: pyhf.pdf.Model) -> List[float]:
    return model.expected_data(model.config.suggested_init()).tolist()

Without type annotations for expected_data, and when using the numpy backend, everything is fine: mypy does not know that the expected_data return value lacks a tolist method on some backends, and with the numpy backend everything works out.

A tensor type annotation (or even collections.abc.Iterable) for expected_data would flag the issue: tolist will not always work. Without having to run tests against different backends at all, mypy will flag the problem, and the user can solve it by wrapping the return value in pyhf.tensorlib.to_numpy, which should take a tensor and return a numpy array (or raise an exception if it cannot).

As a user, this pattern is very powerful. If my code is fully typed and I check types at runtime as well (e.g. with typeguard), I do not have to think about the different pyhf backends. If I make a call that returns a tensor and forget to wrap it in to_numpy / to_float, then I would either get a static error from mypy or a runtime error from the type checks.¹

This error will inform me that I forgot to wrap the tensor to get a guarantee about its type, and crucially it works without testing all backends. If my code is fully typed, runtime type checks succeed, and my code passes all tests for a single backend, then I am guaranteed that it will also work for the remaining backends. Moving the consistency checks for backends fully into pyhf, instead of having to repeat them in other libraries, can save a lot of overhead.

¹ This happens because the return type will presumably be no more precise than a union of the four tensor types. If dynamic backend switching results in the type being narrowed further (to e.g. a numpy array when using that backend), this pattern no longer works.
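The runtime half of this pattern can be sketched without any dependency: typeguard is one option, but a minimal stand-in is a decorator that checks a function's annotated return type at call time (this simplistic version only handles plain classes, not generics like List[float]):

```python
import functools
from typing import Any, Callable, get_type_hints

import numpy as np


def check_return(func: Callable[..., Any]) -> Callable[..., Any]:
    # Minimal stand-in for a runtime checker like typeguard: verify that
    # the returned value matches the function's annotated return type.
    expected = get_type_hints(func).get("return")

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        if expected is not None and not isinstance(result, expected):
            raise TypeError(
                f"{func.__name__} returned {type(result).__name__}, "
                f"expected {expected.__name__}"
            )
        return result

    return wrapper


@check_return
def get_data() -> np.ndarray:
    # Swap this return for a list or a non-numpy tensor to see the
    # runtime error the decorator raises.
    return np.array([1.0, 2.0])


print(get_data())
```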

lukasheinrich commented 3 years ago

I do think some typing would be useful, but I would like the typing to be "semantic" and backend-independent.

i.e.

def logpdf(self: pyhf.Model, pars: pyhf.typing.Tensor, data: pyhf.typing.Tensor) -> pyhf.typing.Scalar:
    ...

would this be sufficient?

alexander-held commented 3 years ago

I think this may depend on the implementation of pyhf.typing.Scalar, but the typing does not need to be particularly precise. Making it anything other than Any (which will never cause issues downstream in mypy) should provide most/all of the benefits mentioned above I think.
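One way such semantic, backend-independent aliases could be sketched (the `pyhf.typing.Tensor` / `Scalar` names above are hypothetical at this point, and a structural `Protocol` is only one possible implementation):

```python
from typing import Any, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class Tensor(Protocol):
    # Hypothetical backend-independent tensor type: anything exposing a
    # .shape attribute qualifies structurally (numpy, torch, jax, tf).
    shape: Any


Scalar = float  # hypothetical semantic alias; could later become richer


def sum_to_scalar(pars: Tensor) -> Scalar:
    # Illustration only: the annotations are backend-independent, while
    # the implementation converts explicitly before returning a float.
    return float(np.asarray(pars).sum())


print(isinstance(np.array([1.0, 2.0]), Tensor))  # numpy arrays match
```

Anything more precise than `Any` already delivers the benefits discussed above, since mypy will then refuse attribute accesses that are not part of the declared type.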

matthewfeickert commented 3 years ago

Note for me in the future if I'm working on this: there is a video by Anthony Sottile on gradually applying levels of typing with mypy (no captions): "gradual typing python (and my approach) (beginner - intermediate)", anthony explains episode 308.