scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Consider using array-api-compat to unify backends over time #2253

Open matthewfeickert opened 1 year ago

matthewfeickert commented 1 year ago

Summary

In a similar vein to Issue #2249, using array-api-compat should make it possible over time to drastically reduce the code surface area that the pyhf backends need to provide themselves. An advantage of array-api-compat over keras-core is that array-api-compat has no additional dependencies and extends the possible backends to anything that implements the Array API.

I'm not fully clear on the best way to implement this, but given the usage example from the README I was thinking (this might be wrong and @asmeurer might have more ideas) of creating an array backend that defines the typical tensor operations, in src/pyhf/tensor/array_backend.py (maybe this should be called array_compat instead?). The tensor backends that are based on Array API compatible libraries (at the moment NumPy and PyTorch) could then just be implementations of the array_backend class that extend or override the API as needed for things that aren't in the standard yet. This would allow for an easy transition from a full backend to the array backend (as in the case of JAX, which is on the way) and would still allow for custom backends in cases like TensorFlow, which has no official plans to switch at this point.
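As a rough illustration of that layout, here is a minimal sketch. The class names `ArrayBackend` and `NumPyBackend`, the method set, and the fallback import are all assumptions made for this example and not existing pyhf code. The only array-api-compat feature used is the `array_api_compat.numpy` namespace from its README. `einsum` stands in for an operation that, as far as I know, is not in the standard and so would have to be provided by the library-specific backend.

```python
import numpy as np

try:
    # array-api-compat exposes a NumPy namespace that follows the Array API
    import array_api_compat.numpy as xp_numpy
except ImportError:
    # Assumption for this sketch only: fall back to plain NumPy so the
    # example still runs without array-api-compat installed
    xp_numpy = np


class ArrayBackend:
    """Hypothetical shared backend built on an Array API namespace ``xp``."""

    name = "array"

    def __init__(self, xp):
        self.xp = xp

    def astensor(self, data):
        # Convert input data to a float64 array of the wrapped library
        return self.xp.asarray(data, dtype=self.xp.float64)

    def sum(self, tensor, axis=None):
        return self.xp.sum(tensor, axis=axis)

    def log(self, tensor):
        return self.xp.log(tensor)

    def where(self, mask, tensor_a, tensor_b):
        return self.xp.where(mask, tensor_a, tensor_b)


class NumPyBackend(ArrayBackend):
    """Hypothetical NumPy backend: inherits the standard operations and
    adds only what the standard does not cover."""

    name = "numpy"

    def __init__(self):
        super().__init__(xp_numpy)

    def einsum(self, subscripts, *operands):
        # Library-specific extension, delegated to NumPy directly
        return np.einsum(subscripts, *operands)


backend = NumPyBackend()
tensor = backend.astensor([1.0, 2.0, 3.0])
total = backend.sum(tensor)                      # from the shared base class
dot = backend.einsum("i,i->", tensor, tensor)    # from the NumPy extension
```

In this arrangement a PyTorch backend would pass a different namespace to `ArrayBackend.__init__` and override only the methods where the library diverges from the standard, while a TensorFlow backend could stay fully custom.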

For the way that we currently implement things like the PyTorch backend (https://github.com/scikit-hep/pyhf/blob/ff9cb94025e5485b23ea81a06ce8916055297c7f/src/pyhf/tensor/pytorch_backend.py) and set_backend and get_backend in the manager (https://github.com/scikit-hep/pyhf/blob/ff9cb94025e5485b23ea81a06ce8916055297c7f/src/pyhf/tensor/manager.py) I would probably need to think about this with @kratsg. I am hoping that this would not be too difficult to do.

If this works, a decent test would be to also try to see how implementing a CuPy backend would work (though I don't really think we need to add it).

Additional Information

c.f. @asmeurer's SciPy 2023 talk: Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem


matthewfeickert commented 1 year ago

c.f. https://github.com/scipy/scipy/pull/18668 and https://github.com/scikit-learn/scikit-learn/pull/25956 for how scipy and scikit-learn added support for array-api-compat. :+1:

matthewfeickert commented 1 year ago

And here's a Quansight Labs blog post(!) by @thomasjpfan on how the scikit-learn support was done: Array API Support in scikit-learn

matthewfeickert commented 1 year ago

And another Quansight Labs blog post by @lucascolley on how the scipy support was done: The Array API Standard in SciPy :+1: