scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
283 stars 83 forks source link

Add stricter version guards for NumPy #2207

Closed matthewfeickert closed 1 year ago

matthewfeickert commented 1 year ago

Summary

@kskovpen noted on the IRIS-HEP Slack that they were having issues with older NumPy versions and pyhf v0.7.1:

File "<stdin>", line 1, in <module>
File "/user/kskovpen/.local/lib/python3.8/site-packages/pyhf/__init__.py", line 3, in <module>
 from pyhf.tensor.manager import get_backend
File "/user/kskovpen/.local/lib/python3.8/site-packages/pyhf/tensor/manager.py", line 49, in <module>
 _default_backend: TensorBackend = BackendRetriever.numpy_backend()
File "/user/kskovpen/.local/lib/python3.8/site-packages/pyhf/tensor/__init__.py", line 20, in __getattr__
 from pyhf.tensor.numpy_backend import numpy_backend
File "/user/kskovpen/.local/lib/python3.8/site-packages/pyhf/tensor/numpy_backend.py", line 8, in <module>
 from numpy.typing import ArrayLike, DTypeLike, NBitBase, NDArray
ModuleNotFoundError: No module named 'numpy.typing'

@agoose77 correctly noted that numpy.typing wasn't introduced until numpy v1.21.0 which is the version that pyhf tests against for our lower bounds

https://github.com/scikit-hep/pyhf/blob/22c1699f044f798f2d0fe904d336f3ce9ee50b92/tests/constraints.txt#L9

but that we don't explicitly enforce a lower bound as we defer to scipy to do that.

https://github.com/scikit-hep/pyhf/blob/22c1699f044f798f2d0fe904d336f3ce9ee50b92/pyproject.toml#L51-L53

Though the lowest constraints possible on numpy come from the lower bound on scipy which for scipy v1.5.1 is numpy>=1.14.5:

$ docker run --rm -ti python:3.8 /bin/bash
root@2cc06d3a1b63:/# python -m venv venv && . venv/bin/activate
(venv) root@2cc06d3a1b63:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@2cc06d3a1b63:/# python -m pip install 'scipy==1.5.1'
Collecting scipy==1.5.1
  Downloading scipy-1.5.1-cp38-cp38-manylinux1_x86_64.whl (25.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.8/25.8 MB 10.9 MB/s eta 0:00:00
Collecting numpy>=1.14.5 (from scipy==1.5.1)
  Downloading numpy-1.24.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 11.2 MB/s eta 0:00:00
Installing collected packages: numpy, scipy
Successfully installed numpy-1.24.3 scipy-1.5.1
(venv) root@2cc06d3a1b63:/#

though the oldest Python 3.8 numpy with wheels is v1.17.3 and indeed this problem becomes reproducible then

...
(venv) root@2cc06d3a1b63:/# python -m pip --quiet install --upgrade 'numpy==1.17.3' 'scipy==1.5.1'
(venv) root@2cc06d3a1b63:/# python -c 'import pyhf'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.8/site-packages/pyhf/__init__.py", line 3, in <module>
    from pyhf.tensor.manager import get_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/manager.py", line 49, in <module>
    _default_backend: TensorBackend = BackendRetriever.numpy_backend()
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/__init__.py", line 20, in __getattr__
    from pyhf.tensor.numpy_backend import numpy_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/numpy_backend.py", line 8, in <module>
    from numpy.typing import ArrayLike, DTypeLike, NBitBase, NDArray
ModuleNotFoundError: No module named 'numpy.typing'
(venv) root@2cc06d3a1b63:/#

and stays that way until numpy v1.21.0 (which again is where numpy.typing got added) is used

...
(venv) root@2cc06d3a1b63:/# python -m pip --quiet install --upgrade 'numpy<=1.21.0' 'scipy==1.5.1'
(venv) root@2cc06d3a1b63:/# python -c 'import pyhf'
(venv) root@2cc06d3a1b63:/# 

So this

https://github.com/scikit-hep/pyhf/blob/22c1699f044f798f2d0fe904d336f3ce9ee50b92/src/pyhf/tensor/numpy_backend.py#L8

should be guarded against more.

@agoose77 has suggested

that import should probably be behind a if TYPE_CHECKING guard, or bump the minimum NumPy version.

OS / Environment

# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Steps to Reproduce

$ docker run --rm -ti python:3.8 /bin/bash
root@2cc06d3a1b63:/# python -m venv venv && . venv/bin/activate
(venv) root@2cc06d3a1b63:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@2cc06d3a1b63:/# python -m pip --quiet install --upgrade 'numpy==1.17.3' 'scipy==1.5.1'
(venv) root@2cc06d3a1b63:/# python -c 'import pyhf'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.8/site-packages/pyhf/__init__.py", line 3, in <module>
    from pyhf.tensor.manager import get_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/manager.py", line 49, in <module>
    _default_backend: TensorBackend = BackendRetriever.numpy_backend()
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/__init__.py", line 20, in __getattr__
    from pyhf.tensor.numpy_backend import numpy_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/numpy_backend.py", line 8, in <module>
    from numpy.typing import ArrayLike, DTypeLike, NBitBase, NDArray
ModuleNotFoundError: No module named 'numpy.typing'
(venv) root@2cc06d3a1b63:/#

File Upload (optional)

No response

Expected Results

For pyhf to properly enforce lower bounds.

Actual Results

pyhf allows for installation of numpy versions before numpy.typing was introduced.

pyhf Version

pyhf, version 0.7.1

Code of Conduct

matthewfeickert commented 1 year ago

@agoose77 has suggested

that import should probably be behind a if TYPE_CHECKING guard, or bump the minimum NumPy version.

I think it is going to have to be adding a lower bound to NumPy as if the lines

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from numpy.typing import ArrayLike, DTypeLike, NBitBase, NDArray 

are added to pyhf/tensor/numpy_backend.py then things will still fail with

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.8/site-packages/pyhf/__init__.py", line 3, in <module>
    from pyhf.tensor.manager import get_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/manager.py", line 49, in <module>
    _default_backend: TensorBackend = BackendRetriever.numpy_backend()
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/__init__.py", line 20, in __getattr__
    from pyhf.tensor.numpy_backend import numpy_backend
  File "/venv/lib/python3.8/site-packages/pyhf/tensor/numpy_backend.py", line 17, in <module>
    T = TypeVar("T", bound=NBitBase)
NameError: name 'NBitBase' is not defined

given

https://github.com/scikit-hep/pyhf/blob/22c1699f044f798f2d0fe904d336f3ce9ee50b92/src/pyhf/tensor/numpy_backend.py#L15

This isn't great (but also not terrible) in the grand scheme of giving control to scipy as scipy v1.5.1 was released 2020-07-04 (happy Higgspendence day :tada:) and numpy v1.21.0 was released on 2021-06-22. So at this point in 2023-05 decently old, but at the same time a year after the oldest supported SciPy version. This gets worse if we were to additionally backport this as a fix to a v0.7.x release as

https://github.com/scikit-hep/pyhf/blob/6e5feefcd802863b1dbf5517380f11afcafd3ab5/pyproject.toml#L52-L55

and scipy v1.2.0 was released in 2018.

agoose77 commented 1 year ago

You can make your TypeVar use strings (bound="NBitBase") to fix this particular problem!

matthewfeickert commented 1 year ago

You can make your TypeVar use strings (bound="NBitBase") to fix this particular problem!

Ah, yes, right again! :smile: I really need to get better at typing :sweat_smile:

matthewfeickert commented 1 year ago

@kskovpen This is now guarded against in pyhf v0.7.2 which was just released to PyPI.