Towards Pycsou_V2 API Stabilization
The goal of this Pull Request (PR) is:
Refactor the core Pycsou API to be compatible with future enhancements.
Improve overall polish and performance of the following modules:
Provide quality-of-life improvements to the user.
Guarantee correctness of the new API via an extensive backend/precision-agnostic test suite.
Fix all implementation bugs not due to 3rd-party libraries.
The main highlights are described below.
Core API Redesign
The goal of the pycsou.abc.operator (short-hand: pyco) module is to:
Define a top-level API for users to interact with operators, along with their semantics.
Define arithmetic rules to compose operators.
A two-level operator API was conceived to simplify the implementation of arithmetic rules w.r.t. Pycsou_V1:
User-facing public API, i.e. pyco.Map and all its children; and
Developer-oriented private API, i.e. pyco.Property.[SingleValued,Apply,Differential,...] encoding operator
arithmetic rules.
After building the operator test suite however, it became apparent that the aforementioned design is not without
drawbacks:
Arithmetic rules are based around pyco.Property classes which define core methods/attributes expected in the
user-facing pyco.Map API. But these core methods/attributes are referenced as strings in
pyco.Property._property_list() without mentioning their type, i.e. method or attribute. Implementation of
arithmetic rules therefore requires the developer to hard-code exceptions into the logic due to how these properties
must be set via the Python API. This can be seen in the following extract of pyco.Property.__add__():
for prop in shared_props:
    if prop in ["lipschitz", "diff_lipschitz"]:  # (only known) attributes (at this point in time)
        setattr(out_op, "_" + prop, getattr(self, "_" + prop) + getattr(other, "_" + prop))
    else:  # methods
        setattr(out_op, prop, types.MethodType(ft.partial(composite_method, prop), out_op))
Including new attributes/methods in the arithmetic API, either for performance or correctness reasons, is
non-trivial: the current Pycsou API was not built around the idea of extending arithmetic rules to more
methods/attributes. This problem is detailed below for the pyco.LinOp sub-hierarchy.
The only arithmetic properties of the LinOp hierarchy are .[apply,adjoint](): all other helper methods rely on
.[apply,adjoint]() to compute their values, ex:
asarray()
svdvals()
eigvals() [NormalOp+]
trace() [SquareOp+]
[pinv,dagger]()
If operators are backend-agnostic, then the default implementations of these listed methods work after any operator
arithmetic -> nothing to do.
However, if the operator is domain-specific (or one of its methods makes it domain-specific), then the default
implementations may break during arithmetic composition.
Concrete Example of Broken Arithmetic
op = <some backend-agnostic DiffMap> = op_orig
J = 2 * op.jacobian(<CuPy array>) # domain-specific: J.apply/adjoint/... only accept CuPy inputs
A = J.asarray(xp=np, dtype=<any>) # ERROR
Reason: J.asarray() calls the default implementation of LinOp.asarray() based on LinOp.apply(xp.eye(...)).
But J is domain-specific: the call to J.apply() will only work if xp=cp. This violates the .asarray()
signature where any value for xp should work.
If .asarray() were an arithmetic method however, then we could force the computation down a path which works
regardless of the value of xp, since op_orig.asarray() can compute it without any problem.
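In code, the idea would look roughly as follows (a simplified sketch using the ScaleRule naming introduced later in
this PR; not the final implementation):
class ScaleRule(Rule):
    def asarray(self, **kwargs) -> pyct.NDArray:
        # Push the asarray() request down to the wrapped operand instead of going through
        # the (possibly domain-specific) default LinOp.asarray() of the composite operator.
        A = self._op.asarray(**kwargs)
        return A * self._cst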
Concrete Example of Faster (Non-Default) Implementation
op = -2 * <some domain-agnostic UnitOp> = -2 * op_orig
D = op.svdvals(k=1, which='SM') # works, but sub-optimal
op.svdvals() is suboptimal because op is just a NormalOp, hence piggy-backs on LinOp.svdvals(), an
iterative algorithm. However, we know from arithmetic rules that op.svdvals() could be computed more
efficiently by piggy-backing on UnitOp.svdvals() which can be computed in closed form.
op.svdvals(k) = op_orig.svdvals(k) * abs(-2)
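Expressed as an arithmetic rule, the update above reads roughly as follows (again a simplified sketch in the
ScaleRule style shown later):
class ScaleRule(Rule):
    def svdvals(self, **kwargs) -> pyct.NDArray:
        # Piggy-back on the operand's (closed-form) svdvals() and rescale:
        # no iterative solver needs to run on the composite operator.
        D = self._op.svdvals(**kwargs)
        return D * abs(self._cst)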
Bottom line:
some methods can be computed faster by inclusion in arithmetic rules, and
some methods can only be computed correctly by inclusion in arithmetic rules.
Arithmetic rules are scattered throughout several files. It is thus cumbersome to fix errors globally: fixes
introduced in one sub-file (because they were thought to apply only to part of the API) cannot easily be refactored
and made available for re-use elsewhere.
The arithmetic API heavily relies on isinstance() instead of Property.has(). This is dangerous since users could
create custom operators which abide by Pycsou's public API, but are not descendants of pyco.Map. (Note: the test
suite does exactly this.)
To overcome these problems, the core API has been redesigned as follows:
pyco.Property is an enumeration where each key (ex: pyco.Property.DIFFERENTIABLE) encodes a unique subset of
properties expected from an operator. These properties can be queried via .arithmetic_[methods|attributes]().
pyco.Property is built with extensibility in mind: new arithmetic attributes/methods can be added down the road as
the need arises.
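A small usage sketch (the property key is taken from the text above; the returned names are illustrative
assumptions, not an exhaustive listing):
import pycsou.abc.operator as pyco

p = pyco.Property.DIFFERENTIABLE
p.arithmetic_attributes()  # e.g. attribute names such as "_diff_lipschitz" (illustrative)
p.arithmetic_methods()     # e.g. method names such as "jacobian" (illustrative)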
pyco.Map now inherits from a single parent class pyco.Operator. This class includes all public methods from the
old pyco.Property hierarchy, including some helper functions to manipulate properties. Users should not need to
instantiate pyco.Operator explicitly: their entry point into the Pycsou operator stack is still pyco.Map and its
descendants.
Arithmetic methods are defined in pyco.Operator, but their implementation is deferred to the
pycsou.abc.arithmetic (short-hand: pyca) module. All arithmetic operations are hence implemented in a single
place to ease maintenance and future enhancements.
Arithmetic rules are implemented via pyca.Rule(**kwargs). These objects contain:
an implementation for each arithmetic method/attribute defined in
pyco.Property.arithmetic_[methods|attributes](). These are rule-specific, ex (simplified):
class ScaleRule(Rule):
    def __init__(self, op: pyct.OpT, cst: pyct.Real):
        super().__init__()
        self._op = op
        self._cst = float(cst)

    @pycrt.enforce_precision(i="arr")
    def apply(self, arr: pyct.NDArray) -> pyct.NDArray:
        out = self._op.apply(arr) * self._cst
        return out
a public method .op() which generates an operator given inputs to __init__(), ex (simplified):
class ScaleRule(Rule):
    def op(self) -> pyct.OpT:
        klass = self._infer_op_klass()
        op = klass(shape=self._op.shape)
        op._op = self._op    # embed for introspection
        op._cst = self._cst  # embed for introspection
        for p in op.properties():
            for name in p.arithmetic_attributes():
                attr = getattr(self, name)
                setattr(op, name, attr)
            for name in p.arithmetic_methods():
                func = getattr(self.__class__, name)
                setattr(op, name, types.MethodType(func, op))
        return op
a detailed docstring with the update rules for the arithmetic methods/attributes.
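As a usage sketch, operator scaling is then expected to dispatch to the rule roughly as follows (assuming ScaleRule
is exposed by pyca; the operator below is a placeholder):
import pycsou.abc.arithmetic as pyca

base_op = <some Pycsou Operator>
scaled = pyca.ScaleRule(op=base_op, cst=-2.0).op()  # roughly what `-2 * base_op` builds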
These changes induce a near-zero change in Pycsou's public API: existing users of the package can expect minimal changes
to their code, if any. The exhaustive list of public API changes is:
[temporary] pyco.Operator.squeeze() renamed to pyco.Operator._squeeze().
[temporary] pyco.Operator.asloss() has been removed.
pyco.Operator.specialize() replaced by pyco.Operator.asop(). Compared to .specialize(), .asop() also allows
specialization towards a parent class (see the sketch after this list).
The damp parameter of pyco.LinOp.[pinv|dagger]() defaults to 0 in place of None.
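A sketch of the .asop() behaviour referred to above (operator placeholders as elsewhere in this document; the call
form, passing the target class directly, is an assumption of this sketch):
import pycsou.abc.operator as pyco

u = <some UnitOp>
lin_view = u.asop(pyco.LinOp)           # new: specialization towards a parent class
unit_view = lin_view.asop(pyco.UnitOp)  # the old .specialize() direction (towards a sub-class)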
Some further considerations regarding the arithmetic API:
Arithmetic updates preserve backend/precision-agnosticity when possible. Exceptions:
Argshifted operators are backend-specific.
If operator arithmetic involves backend/precision-specific operators, then the output is also
backend/precision-specific.
Arithmetic rules involving linear operators are extended to all core operators. Example:
UnitOp() * UnitOp() => UnitOp(). (Previously the result would be LinOp().)
Operator Test Suite API Redesign
pycsou_tests.operator (shorthand: pto) defines a hierarchical test suite for pyco.Operator. To test an operator,
users must subclass one of the pto.conftest.[MapT, FuncT, ...]() classes and define a few fixtures with ground-truth
values.
User-facing fixtures provided by pto.conftest were not designed to easily test backend/precision-specific operators.
The test suite now overcomes this limitation via the spec() fixture.
Concretely, users should define the following fixtures:
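(The fixture list itself is not reproduced in this excerpt.) As a heavily-hedged sketch of the intended shape, where
everything except the spec() fixture name is an illustrative assumption:
import pytest

import pycsou_tests.operator.conftest as conftest

class TestMyOp(conftest.MapT):  # subclass the conftest class matching the operator's category
    @pytest.fixture
    def spec(self):
        # Return the operator under test together with the backend/precision combination
        # it should be exercised with; the exact contents are an assumption of this sketch.
        op = ...  # construct the (hypothetical) operator under test here
        return op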
Developer Tools: Annotations
Type annotations to be used throughout Pycsou are grouped in pycsou.util.ptype (short-hand: pyct). In particular,
it is recommended to use pyct.[Integer,Real,Shape,OpT,NDArray] wherever possible to avoid annotation inference
issues, even if a more explicit type is known.
This guideline applies in particular to the pyco.Operator hierarchy: users calling the Pycsou API rarely need to
know which category an operator belongs to, only that it abides by the Pycsou API.
Should an annotation be useful in only one module, then it is safe to define the annotation in the module.
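For example (a minimal illustration of the guideline; the function itself is hypothetical):
import pycsou.util.ptype as pyct

def rescale(op: pyct.OpT, cst: pyct.Real) -> pyct.OpT:
    # Annotate with the generic pyct.OpT/pyct.Real, even if `op` is known to be e.g. a LinOp.
    return cst * op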
Developer Tools: Installing Pycsou
Pycsou relies on 3rd-party libraries to work correctly. However, the packages end-users need to install depend on
what they want to do with Pycsou: if only interested in CPU compute, then there is no need to install GPU libraries.
On the other hand, it is advantageous for developers to have a common environment on dev/test machines: installing
all dependencies is thus encouraged in that case.
To accommodate a diverse audience, dependencies are split into categories: users can choose at install-time which
functionality subset to enable via python3 -m pip install pycsou[categories].
Since Pycsou_V2 is still in development, we only provide detailed installation instructions for a developer build.
Note: the installation instructions work as provided on Windows and Linux only. Developers on MacOS currently need
to comment out all CUDA dependencies in ./conda/requirements.txt due to unavailability of these libraries on that
platform.
Developer Tools: Dependency Management
As mentioned above, some Pycsou functionality is only available when specific 3rd-party libraries/hardware are
available, ex: GPU compute. To allow the pyco.Operator API and test suite to work in cases where dependencies are
missing, all critical dependency management is grouped in pycsou.util.deps (short-hand: pycd). Developers are
expected to access potentially-missing libraries via pycd.
The example below shows how one can leverage pycd to write a function having a different implementation on CPU/GPU.
(One could alternatively use @pycu.redirect to achieve similar functionality, as sketched after the table.)
|-------------------------------------------------------|-------------------------------------------------------|
| Before                                                | After                                                 |
|-------------------------------------------------------|-------------------------------------------------------|
| import dask.array as da                               | import warnings                                       |
| import numpy as np                                    |                                                       |
| import warnings                                       | import pycsou.util.deps as pycd                       |
|                                                       |                                                       |
| import pycsou.util as pycu                            | def eigvals(A: pyct.NDArray) -> pyct.NDArray:         |
| import pycsou.util.deps as pycd                       |     N = pycd.NDArrayInfo                              |
|                                                       |     ndi = N.from_obj(A)                               |
| def eigvals(A: pyct.NDArray) -> pyct.NDArray:         |     xpl = ndi.linalg()                                |
|     xp = pycu.get_array_module(A)                     |                                                       |
|     if xp == np:                                      |     if ndi == N.NUMPY:                                |
|         D = np.linalg.eigvals(A)                      |         D = xpl.eigvals(A)                            |
|     elif pycd.CUPY_ENABLED:                           |     elif ndi == N.CUPY:                               |
|         import cupy as cp                             |         warnings.warn("Assuming input is Hermitian.") |
|         warnings.warn("Assuming input Hermitian.")    |         D = xpl.eigvalsh(A)                           |
|         D = cp.linalg.eigvalsh(A)                     |     elif ndi == N.DASK:                               |
|     elif xp == da:                                    |         raise NotImplementedError                     |
|         raise NotImplementedError                     |     return D                                          |
|     return D                                          |                                                       |
|-------------------------------------------------------|-------------------------------------------------------|
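For completeness, a sketch of the @pycu.redirect alternative mentioned above. The decorator signature and dispatch
keys used here are assumptions of this sketch, not a reference:
import warnings

import numpy as np

import pycsou.util as pycu

def eigvals_cupy(A: pyct.NDArray) -> pyct.NDArray:
    import cupy as cp  # local import: CuPy may not be installed everywhere
    warnings.warn("Assuming input is Hermitian.")
    return cp.linalg.eigvalsh(A)

@pycu.redirect(i="A", CUPY=eigvals_cupy)  # assumed signature: dispatch on the backend of `A`
def eigvals(A: pyct.NDArray) -> pyct.NDArray:
    return np.linalg.eigvals(A)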
Performance Engineering: Read-Only Arrays
It is common for operators to perform several transforms on their input to obtain the desired output. However, the
output may be an array-view, ex:
def flip1(x: pyct.NDArray) -> pyct.NDArray:
    y = x[::-1]
    return y
Subsequently modifying y = flip1(x) via an in-place operation will also update x by side-effect. Thus applying
in-place updates to function outputs is potentially error-prone.
However, in-place updates are of importance for NUMPY/CUPY backends to avoid memory overhead. Thus to allow safe
in-place updating of arrays obtained from functions/methods, Pycsou now provides two helper functions:
pycu.read_only() forces arrays to read-only mode: subsequent in-place updates are forbidden.
pycu.copy_if_unsafe() makes it safe to do in-place updates of a function's output, at no cost if the array was
read/write from the start.
The snippet below shows how to re-write flip1() to be safe:
import numpy as np

import pycsou.util as pycu

def flip2(x: pyct.NDArray) -> pyct.NDArray:
    y = x[::-1]
    return pycu.read_only(y)

x = np.arange(5)
y = flip1(x)
y *= 2  # side-effect: x = 2 * np.arange(5)

z = flip2(x)
z *= 2  # forbidden

t = pycu.copy_if_unsafe(flip2(x))
t *= 2  # no side-effects: x = np.arange(5)
All operators included in this PR have been updated to perform safe in-place updates.
Compatibility Notes: HALF/QUAD-Precision Dropped
Until now, Pycsou operators supported 4 numerical precisions: HALF-, SINGLE-, DOUBLE- and QUAD-precision.
QUAD-precision is now dropped because:
QUAD-precision as implemented on x86 CPUs is not true QUAD-precision (~80 bits vs. 128).
QUAD-precision is not available on GPUs.
HALF-precision is now dropped because:
it is emulated on x86 CPUs, hence orders of magnitude slower than SINGLE/DOUBLE-precision.
it is too low-precision to give satisfactory outputs for many operators.
Note however that lack of official support does not mean HALF/QUAD-precision cannot be used as inputs/outputs to
pyco.Operator.[apply|adjoint|...](): users in need of these alternative precisions may use them after disabling
automatic precision enforcement via:
import numpy as np

import pycsou.runtime as pycrt

with pycrt.EnforcePrecision(False):
    x = np.arange(5, dtype=np.float16)
    op = <some Pycsou Operator>
    y = op.apply(x)  # y should be float16 if ``op`` is correctly implemented.
Miscellaneous Quality-of-Life Improvements
pyco.Operator.__repr__() is overloaded to show the operator name and shape.
class Flip(pyco.SquareOp):
    def __init__(self, dim: int):
        super().__init__(shape=(dim, dim))

op = Flip(dim=3)
op  # Flip(3, 3)
pyco.Operator.expr() shows a nested graph implemented by an arithmetic expression. This representation is useful
to examine intermediate return types and debug otherwise-complex nested operators.
To avoid raising generic UserWarnings, pycsou.util.warnings now defines specific warning classes for common issues.
API developers are encouraged to use them, in particular to obtain detailed warning logs when running the test
suite.
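For instance (a hedged sketch; the import alias is an assumption, and BackendWarning is the class referenced in the
Known Issues section below):
import warnings

import pycsou.util.warnings as pycuw  # short-hand assumed for this sketch

# Raise a specific warning class instead of a generic UserWarning:
warnings.warn("Sparse GPU compute may be inaccurate: interpret results with care.", pycuw.BackendWarning)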
Known Issues
At the time of this writing, the Pycsou test suite consists of ~590_000 tests and takes ~12 hours to run in
single-threaded mode. Only 0.02% of tests fail, all GPU-related.
Concretely, sparse LAPACK operations on GPUs are known to provide wrong results (matrix-dependent). Results of
LinOp.[lipschitz,svdvals,eigvals]() should thus be interpreted with care if the operator is GPU-only.
Mitigation strategy: obtain these values via dense GPU-compute or sparse CPU-compute if possible. Pycsou currently
raises BackendWarning to inform users of potential bugs.