pyxu-org / pyxu

Modular and scalable computational imaging in Python with GPU/out-of-core computing.
https://pyxu-org.github.io/
MIT License

Towards Pycsou_V2 API Stabilization #28

Closed SepandKashani closed 2 years ago

SepandKashani commented 2 years ago

Towards Pycsou_V2 API Stabilization

The goal of this Pull Request (PR) is:

The main highlights are described below.

Core API Redesign

The goal of the pycsou.abc.operator (short-hand: pyco) module is to:

A two-level operator API was conceived to simplify the implementation of arithmetic rules w.r.t. Pycsou_V1:

After building the operator test suite, however, it became apparent that this design has drawbacks:

To overcome these problems, the core API has been redesigned as follows:

These changes induce a near-zero change in Pycsou's public API: existing users of the package can expect minimal changes to their code, if any. The exhaustive list of public API changes is:

Some further considerations regarding the arithmetic API:

Operator Test Suite API Redesign

pycsou_tests.operator (shorthand: pto) defines a hierarchical test suite for pyco.Operator. To test an operator, users must subclass one of the pto.conftest.[MapT, FuncT, ...]() classes and define a few fixtures with ground-truth values.

User-facing fixtures provided by pto.conftest were not designed to easily test backend/precision-specific operators. The test suite now overcomes this limitation via the spec() fixture.

Concretely, users should define the following fixtures:

|----------------------------|------------------------------------------------|---------------|-------------------|
|          fixture           |                    returns                     | applicable to | auto-inferred for |
|----------------------------|------------------------------------------------|---------------|-------------------|
| spec()                     | tuple[pyct.OpT, pycd.NDArrayInfo, pycrt.Width] | Map+          |                   |
| data_shape()               | pyct.Shape                                     | Map+          |                   |
| data_apply()               | DataLike                                       | Map+          |                   |
| data_math_lipschitz()      | DataLike                                       | Map+          | LinOp+            |
| data_math_diff_lipschitz() | DataLike                                       | DiffMap+      | LinOp+            |
| data_grad()                | DataLike                                       | DiffFunc+     | LinFunc           |
| data_prox()                | DataLike                                       | ProxFunc+     | LinFunc           |
| data_adjoint()             | DataLike                                       | LinOp+        | LinFunc           |
|----------------------------|------------------------------------------------|---------------|-------------------|

Developer Tools: Annotations

Type annotations to be used throughout Pycsou are grouped in pycsou.util.ptype (short-hand: pyct). In particular it is recommended to use pyct.[Integer,Real,Shape,OpT,NDArray] wherever possible to avoid annotation inference issues, even if a more explicit type is known.

This guideline applies in particular to the pyco.Operator hierarchy: users calling the Pycsou API rarely need to know which category an operator belongs to, rather just that it abides by the Pycsou API.

Should an annotation be useful in only one module, then it is safe to define the annotation in the module.
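
As a rough illustration of the guideline above, the sketch below defines a few broad aliases and uses them in function signatures. The alias definitions are hypothetical stand-ins built on the standard library; the actual definitions in pyct may differ.

```python
import math
import numbers

# Hypothetical stand-ins for aliases such as pyct.Integer, pyct.Real, pyct.Shape.
Integer = numbers.Integral
Real = numbers.Real
Shape = tuple[Integer, ...]


def size(shape: Shape) -> Integer:
    """Number of elements in an array of the given shape."""
    return math.prod(shape)


def rescale(x: Real, factor: Real) -> Real:
    """Broad Real annotation: callers may pass int, float, Fraction, ..."""
    return x * factor
```

Annotating with the broad alias keeps signatures uniform across modules, at the cost of being less specific than the narrowest known type.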

Developer Tools: Installing Pycsou

Pycsou relies on 3rd-party libraries to work correctly. However, the packages end-users need to install depend on what they want to do with Pycsou: for CPU-only compute, there is no need to install GPU libraries. On the other hand, it is advantageous for developers to have a common environment on dev/test machines: installing all dependencies is thus encouraged here.

To accommodate a diverse audience, dependencies are split into categories: users can choose at install-time which functionality-subset to enable via python3 -m pip install pycsou[categories].

Since Pycsou_V2 is still in development, we only provide detailed installation instructions for a developer build.

Note: the installation instructions work as provided on Windows and Linux only. Developers on macOS currently need to comment out all CUDA dependencies in ./conda/requirements.txt because these libraries are unavailable on that platform.

Developer Tools: Dependency Management

As mentioned above, some Pycsou functionality is only available when special 3rd-party libraries/hardware are present, ex: GPU compute. To allow the pyco.Operator API and test suite to work when dependencies are missing, all critical dependency management is grouped in pycsou.util.deps (short-hand: pycd). Developers are expected to access potentially-missing libraries via pycd.

The example below shows how one can leverage pycd to write a function having a different implementation on CPU/GPU. (One could alternatively use @pycu.redirect to achieve similar functionality.)

|----------------------------------------------------|-------------------------------------------------------|
|                       Before                       |                         After                         |
|----------------------------------------------------|-------------------------------------------------------|
| import warnings                                    | import warnings                                       |
| import dask.array as da                            | import pycsou.util.deps as pycd                       |
| import numpy as np                                 | import pycsou.util.ptype as pyct                      |
| import pycsou.util as pycu                         |                                                       |
| import pycsou.util.deps as pycd                    | def eigvals(A: pyct.NDArray) -> pyct.NDArray:         |
| import pycsou.util.ptype as pyct                   |     N = pycd.NDArrayInfo                              |
|                                                    |     ndi = N.from_obj(A)                               |
| def eigvals(A: pyct.NDArray) -> pyct.NDArray:      |     xpl = ndi.linalg()                                |
|     xp = pycu.get_array_module(A)                  |                                                       |
|     if xp == np:                                   |     if ndi == N.NUMPY:                                |
|         D = np.linalg.eigvals(A)                   |         D = xpl.eigvals(A)                            |
|     elif pycd.CUPY_ENABLED:                        |     elif ndi == N.CUPY:                               |
|         import cupy as cp                          |         warnings.warn("Assuming input is Hermitian.") |
|         warnings.warn("Assuming input Hermitian.") |         D = xpl.eigvalsh(A)                           |
|         D = cp.linalg.eigvalsh(A)                  |     elif ndi == N.DASK:                               |
|     elif xp == da:                                 |         raise NotImplementedError                     |
|         raise NotImplementedError                  |     return D                                          |
|     return D                                       |                                                       |
|----------------------------------------------------|-------------------------------------------------------|

Performance Engineering: Read-Only Arrays

It is common for operators to perform several transforms on their input to obtain the desired output. However, the output may be an array-view, ex:

def flip1(x: pyct.NDArray) -> pyct.NDArray:
    y = x[::-1]
    return y

Subsequently modifying y = flip1(x) via an in-place operation will also update x by side-effect. Thus applying in-place updates to function outputs is potentially error-prone.

However, in-place updates are of importance for NUMPY/CUPY backends to avoid memory overhead. Thus to allow safe in-place updating of arrays obtained from functions/methods, Pycsou now provides two helper functions:

The snippet below shows how to re-write flip1() to be safe:

import numpy as np
import pycsou.util as pycu
import pycsou.util.ptype as pyct

def flip2(x: pyct.NDArray) -> pyct.NDArray:
    y = x[::-1]
    return pycu.read_only(y)  # protect the flipped view y, not the input x

x = np.arange(5)

y = flip1(x)
y *= 2  # side-effect: x is now 2 * np.arange(5)

z = flip2(x)
z *= 2  # forbidden: z is read-only

t = pycu.copy_if_unsafe(flip2(x))
t *= 2  # no side-effects: x is left unchanged

All operators included in this PR have been updated to perform safe in-place updates.

Compatibility Notes: HALF/QUAD-Precision Dropped

Pycsou operators currently support 4 numerical precisions: HALF-, SINGLE-, DOUBLE- and QUAD-precision.

QUAD-precision is now dropped because:

HALF-precision is now dropped because:

Note however that lack of official support does not mean HALF/QUAD-precision cannot be used as input/outputs to pyco.Operator.[apply|adjoint|...](): users in need of these alternative precisions may use them after disabling automatic precision-enforcement via:

import numpy as np
import pycsou.runtime as pycrt

with pycrt.EnforcePrecision(False):
    x = np.arange(5, dtype=np.float16)
    op = <some Pycsou Operator>
    y = op.apply(x)  # y should be float16 if ``op`` is correctly implemented.

Miscellaneous Quality-of-Life Improvements

Known Issues

At the time of this writing, the Pycsou test suite consists of ~590,000 tests and takes ~12 hours to run in single-threaded mode. Only 0.02% of tests fail, all GPU-related.

Concretely, sparse LAPACK operations on GPUs are known to produce wrong results for some matrices. Thus results of LinOp.[lipschitz,svdvals,eigvals]() should be interpreted with care if the operator is GPU-only.

Mitigation strategy: obtain these values via dense GPU-compute or sparse CPU-compute if possible. Pycsou currently raises BackendWarning to inform users of potential bugs.
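
Users who want to act on these warnings programmatically can rely on Python's standard warnings machinery. The sketch below uses a hypothetical stand-in class and a toy function in place of Pycsou's BackendWarning and a real sparse GPU routine; it shows two options, recording the warning or escalating it to an error.

```python
import warnings


class BackendWarning(UserWarning):
    """Hypothetical stand-in for Pycsou's BackendWarning."""


def sparse_gpu_svdvals():
    """Toy function: warns like a sparse GPU routine might, then returns a value."""
    warnings.warn("Sparse GPU result may be inaccurate.", BackendWarning)
    return [1.0]


# Option 1: record the warning, then decide whether to trust the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", BackendWarning)
    values = sparse_gpu_svdvals()
suspect = any(issubclass(w.category, BackendWarning) for w in caught)

# Option 2: escalate the warning to an error so the run stops early.
with warnings.catch_warnings():
    warnings.simplefilter("error", BackendWarning)
    try:
        sparse_gpu_svdvals()
        stopped = False
    except BackendWarning:
        stopped = True
```

When a result is flagged as suspect, it can be recomputed through one of the mitigation paths mentioned above (dense GPU-compute or sparse CPU-compute).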