probabilistic-numerics / probnum

Probabilistic Numerics in Python.
http://probnum.org
MIT License
434 stars 56 forks source link

Call-stack explosion in complicated/nested `LinearOperator` arithmetics #509

Open schmidtjonathan opened 3 years ago

schmidtjonathan commented 3 years ago

Problem

When performing consecutive linear algebra computations on probnum.linops.LinearOperators, in which the result is a LinearOperator, which is then further processed in LinearOperator-based arithmetics operations, the following problem arises: In the absence of pre-defined arithmetics for LinearOperators, which collapse certain kinds of operations to known types of LinearOperator, as for instance Kronecker @ Kronecker = Kronecker or Scaling[.inv()] {*, @, +, -} Scaling[.inv()] = Scaling, most products/sums of LinearOperator objects are handled by fallback operators, called ProductLinearOperator and SumLinearOperator. These collect the factors/summands and - instead of evaluating the operations immediately - aggregate them and only evaluate them at the time, a LinearOperator is applied to a numpy array.

While this is desirable in some cases, in others (as for example in long time-series filtering/smoothing problems) it leads to an exponential growth of the Python call stack, holding an enormous amount of LinearOperator objects in memory.

Example

import numpy as np
from probnum import linops

A = linops.Kronecker(np.random.rand(2,2), np.random.rand(2, 2))
B = linops.Identity(4)

for _  in range(20):
  A = A @ B @ A.T

This computation over 20 steps does not finish on my laptop within 2 minutes (after which I killed the process). The reason is the accumulation of LinearOperator objects in ProductLinearOperators, SumLinearOperators, and TransposedLinearOperators, each of which hold long lists of objects.

Solution

By implementing known arithmetical identities (as mentioned above, e.g. Kronecker @ Kronecker = Kronecker), we can avoid building large accumulations in ProductLinearOperators, SumLinearOperators, and TransposedLinearOperators. This also applies to other kinds of fallback operators.

In particular, in the example above, the product can always be written as a Kronecker structure, since A @ B @ A.T = Kronecker @ Identity @ Kronecker = Kronecker @ Kronecker = Kronecker. There is no need to build up a ProductLinearOperator, which makes the computation run within milliseconds.

schmidtjonathan commented 3 years ago

@JonathanWenger , @mmahsereci , @marvinpfoertner, @pnkraemer : FYI (as discussed)

pnkraemer commented 3 years ago

Btw in case the info matters: lines such as A = A @ B @ A.T and A = A @ B @ A.T + C make up roughly a third of Kalman filters/smoothers and probabilistic ODE solvers, so resolving the issue is the only way of being able to use linear operators in diffeq and filtsmooth (and probably randprocs.markov, which is the foundation for the other two).