pyxu-org / pyxu

Modular and scalable computational imaging in Python with GPU/out-of-core computing.
https://pyxu-org.github.io/
MIT License

PGD optimization for a convolution kernel #61

Closed by Hierakonpolis 6 months ago

Hierakonpolis commented 7 months ago

The problem I'm trying to solve is stated as follows, from Cury et al.: [pasted image: the optimization problem as written in the paper]

Here y is a 1-d vector of targets, one for each t in T, X is shaped (T, W, L), and alpha is shaped (W, L). For each t, the function q is the sum of the element-wise product between X[t] and alpha; vectorized over T, in numpy this would be np.sum(X * alpha, (1, 2)).
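Concretely, here is a minimal NumPy sketch of what I mean (shapes just for illustration):

import numpy as np

T, W, L = 10, 4, 5
X = np.random.randn(T, W, L)
alpha = np.random.randn(W, L)

q1 = np.sum(X * alpha, (1, 2))                      # broadcast, then sum per t
q2 = np.tensordot(X, alpha, axes=([1, 2], [0, 1]))  # equivalent contraction
assert np.allclose(q1, q2)                          # both are length-T vectors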

To vectorize the problem over T, the way I would be tempted to do it if this were, e.g., PyTorch would be to set alpha as requiring gradient and either rely on broadcasting or convolve X with alpha as the kernel.
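Roughly like this, as a hypothetical PyTorch sketch (just to illustrate what I mean; not Pyxu code):

import torch

T, W, L = 10, 4, 5
X = torch.randn(T, W, L)
alpha = torch.randn(W, L, requires_grad=True)  # alpha carries the gradient
q = (X * alpha).sum(dim=(1, 2))                # broadcasting vectorizes over T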

I am entirely new to pyxu, but to implement this my guess is that I would need something like SquaredL2Norm(dim=y.size).asloss(y.ravel()) * [X convolved with alpha], and to optimize this (plus the phi function, which is just the sum of an L21Norm and an L1Norm) for alpha. I'm not quite sure how to implement this here, or whether this is supported at all. Could anyone point me in the right direction?

joanrue commented 6 months ago

Hi @Hierakonpolis ,

Thanks for reaching out! Your problem can indeed be handled with Pyxu. Let me rewrite it to make the mapping between your cost functional and Pyxu operators clearer.

$$ \hat{\mathbf{\alpha}} = \arg\min_{\mathbf{\alpha}\in\mathbb{R}^{W\times L}} \frac{1}{2}\Vert \mathbf{y} - Q(\mathbf{\alpha})\Vert_{2}^{2} + \phi(\mathbf{\alpha})$$

where $\phi(\mathbf{\alpha}) = \Vert \mathbf{\alpha} \Vert_{21} + \Vert \mathbf{\alpha} \Vert_{1}$.
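Spelled out in plain NumPy (a sketch; I'm assuming here that the $\ell_2$ part of the 21-norm runs along the second axis of $\mathbf{\alpha}$):

import numpy as np

W, L = 4, 5
alpha = np.random.randn(W, L)

l21 = np.sum(np.sqrt(np.sum(alpha**2, axis=1)))  # l2 per row, then l1 over rows
l1 = np.sum(np.abs(alpha))
phi = l21 + l1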

In Pyxu you can define the first term as:

import numpy as np
from pyxu.operator import SquaredL2Norm
from pyxu.abc.operator import LinOp

T, W, L = 10, 4, 5

y = np.random.randn(T)          # targets, one per t
X_c = np.random.randn(T, W, L)  # design tensor

# q maps a (W, L) array to a length-T vector (the last two axes are contracted)
q = LinOp.from_array(A=X_c, dim_rank=2)

# likelihood(alpha) = ||q(alpha) - y||_2^2
likelihood = SquaredL2Norm(dim_shape=T).argshift(-y) * q

For your $\phi(\mathbf{\alpha})$ function: since the sum of two proximable functions doesn't generally admit a closed-form proximity operator, you'll need to create your own ProxFunc and define its prox(arr, tau) method using Equation 6 in the paper you shared. You can take inspiration from the existing ProxFuncs.
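For reference, the method has to implement the standard proximity operator (nothing Pyxu-specific about this definition):

$$\operatorname{prox}_{\tau\phi}(\mathbf{z}) = \arg\min_{\mathbf{\alpha}\in\mathbb{R}^{W\times L}} \frac{1}{2}\Vert \mathbf{\alpha}-\mathbf{z}\Vert_{2}^{2} + \tau\,\phi(\mathbf{\alpha})$$

A skeleton to start from: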

import pyxu.abc as pxa
import pyxu.info.ptype as pxt
import pyxu.runtime as pxrt

class PhiRegularizer(pxa.ProxFunc):

    def __init__(self, dim_shape: pxt.NDArrayShape):
        pass

    @pxrt.enforce_precision(i="arr")
    def apply(self, arr: pxt.NDArray) -> pxt.NDArray:
        pass

    @pxrt.enforce_precision(i=("arr", "tau"))
    def prox(self, arr: pxt.NDArray, tau: pxt.Real) -> pxt.NDArray:
        pass

prior = PhiRegularizer((W, L))

You can then use this in PGD, for example:

from pyxu.opt.solver import PGD

solver = PGD(f=likelihood, g=prior)

alpha0 = np.random.randn(W, L)
solver.fit(x0=alpha0)

alpha_hat = solver.solution()

I hope that helps!

Hierakonpolis commented 6 months ago

Hello @joanrue, and thank you for taking the time to respond! Using what appears to be the latest version of pyxu (1.2.0), the .from_array method of LinOp does not accept a dim_rank argument:

TypeError: LinOp.from_array() got an unexpected keyword argument 'dim_rank'

It also does not show up in the documentation.

Simply removing said argument does not work either:

AssertionError: shape: expected tuple[numbers.Integral, numbers.Integral], got (1272, 104, 10)

What am I missing here?

joanrue commented 6 months ago

You're right, sorry! I forgot to mention that we are working on an update to Pyxu (Pyxu v2). The code I shared uses that new implementation, which is not yet on PyPI (hence the error you are facing); it works on the main and ndv2_dev branches. Pyxu v2 is still in beta, but we are planning a release.

I'd recommend trying v2. You can do so by installing Pyxu locally with the "dev" extras:

git clone https://github.com/pyxu-org/pyxu.git
cd pyxu/
git checkout feature/ndv2_dev
pip install ".[dev]"

and building the documentation with

tox -e doc

(you can then find it at pyxu/doc/build/index.html).
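To double-check that the install picked up the v2 code, something like this should work (assuming the package exposes __version__):

python -c "import pyxu; print(pyxu.__version__)"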

Please let me know if you manage to get it working.

Hierakonpolis commented 6 months ago

Thank you, that seemed to get it running. I hope this is not a stupid question, but I am a bit confused about what the likelihood operator defined above is actually doing. While trying to understand why I keep converging to a matrix of zeroes, I noticed that if I apply the operators manually, i.e. take the squared L2 norm of y - q(alpha_hat), I get (as expected) the squared L2 norm of y when q(alpha_hat) is zero. But if I go through the likelihood object defined above, the result is zero instead.

sq_norm = SquaredL2Norm(dim_shape=T)
q_eval = q(alpha_hat)
print('q_eval:', np.unique(q_eval))
print('sq_norm:', sq_norm(y - q_eval))
print('sq_norm of y:', sq_norm(y))
print('likelihood', likelihood(alpha_hat))

Results in:

q_eval: [0.]
sq_norm: [2507.71515935]
sq_norm of y: [2507.71515935]
likelihood [0.]

What else is going on inside likelihood?

Just in case, I will also paste the definitions of the operators:

q = LinOp.from_array(A=X_c, dim_rank=2)
likelihood = SquaredL2Norm(dim_shape=T).argshift(-y) * q

The regularizer:

from pyxu.operator import L1Norm, L21Norm

class PhiRegularizer(pxa.ProxFunc):

    def __init__(self,
                 dim_shape: pxt.NDArrayShape,
                 lam: float,
                 rho: float = 1500,
                ):
        self.l1 = L1Norm(dim_shape)
        self.l21 = L21Norm(dim_shape, l2_axis=1)
        self.rho = rho
        self.lam = lam

    @pxrt.enforce_precision(i="arr")
    def apply(self, arr: pxt.NDArray) -> pxt.NDArray:
        return self.lam * self.l21(arr) + self.rho * self.l1(arr)

    @pxrt.enforce_precision(i=("arr", "tau"))
    def prox(self, arr: pxt.NDArray, tau: pxt.Real) -> pxt.NDArray:
        first = arr / (np.abs(arr) + 1e-12)             # sign of each entry
        second = np.maximum(np.abs(arr) - self.rho, 0)  # soft-threshold at rho

        den = np.sqrt(np.sum(np.maximum(arr - self.rho, 0) ** 2, axis=1)) + 1e-12
        third = np.maximum(1 - self.lam / den, 0)       # row-wise group shrinkage factor

        return first * second * third.reshape(-1, 1)

prior = PhiRegularizer((W, L), 100)

And the optimization:

from pyxu.opt.solver import PGD

solver = PGD(f=likelihood, g=prior)

alpha0 = np.random.randn(W, L)
Xn = X_c.ravel().reshape(-1, 1)
lipschitz = np.linalg.norm(Xn.T @ Xn)  # = ||X_c||_F**2, used here as a Lipschitz estimate
solver.fit(x0=alpha0, tau=1/lipschitz)

alpha_hat = solver.solution()

joanrue commented 6 months ago

Hi! It seems your parameters $\lambda$ and $\rho$ might be set to values that are too large. I've decreased them to 1e-3, and the solution becomes different from zero.

I've updated your code to also support GPU (CuPy) and distributed (Dask) arrays, and to compute the Lipschitz constant for you.

import numpy as np
import pyxu.operator as pxo
import pyxu.info.ptype as pxt
import pyxu.abc as pxa
import pyxu.runtime as pxrt
import pyxu.util as pxu
from pyxu.opt.solver import PGD

class PhiRegularizer(pxa.ProxFunc):

    def __init__(self, 
                 dim_shape: pxt.NDArrayShape,
                 lam: float = 1e-3,
                 rho: float = 1e-3,
                ):
        self.l1 = pxo.L1Norm(dim_shape)
        self.l21 = pxo.L21Norm(dim_shape, l2_axis=1)
        self.rho = rho
        self.lam = lam

    @pxrt.enforce_precision(i="arr")
    def apply(self, arr: pxt.NDArray) -> pxt.NDArray:
        return self.lam * self.l21(arr) + self.rho * self.l1(arr)

    @pxrt.enforce_precision(i=("arr", "tau"))
    def prox(self, arr: pxt.NDArray, tau: pxt.Real) -> pxt.NDArray:     
        xp = pxu.get_array_module(arr)  # this line adds support for CuPy and Dask arrays
        first = arr / (xp.abs(arr) + 1e-12)
        second = xp.maximum(xp.abs(arr) - self.rho, 0)
        den = xp.sqrt(xp.sum(xp.maximum(arr - self.rho, 0) ** 2, axis=1)) + 1e-12
        third = xp.maximum(1 - self.lam / den, 0)
        return first * second * third[..., xp.newaxis]

T, W, L = 10, 4, 5

y = np.random.randn(T)

X_c = np.random.randn(T, W, L)
q = pxa.LinOp.from_array(A=X_c, dim_rank=2)

likelihood = pxo.SquaredL2Norm(dim_shape=T).argshift(-y) * q

prior = PhiRegularizer((W, L))
solver = PGD(f=likelihood, g=prior)

alpha0 = np.random.randn(W, L)

# This line estimates the step size for you (PGD will access this quantity to set `tau`)
likelihood.diff_lipschitz = likelihood.estimate_diff_lipschitz()
print(f"{likelihood.diff_lipschitz=}")

solver.fit(x0=alpha0)
alpha_hat = solver.solution()
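As a quick sanity check (a sketch, using the operators defined above), you can compare the objective at the estimate against the data term at zero; the former should be much smaller if the solver made progress:

obj = likelihood(alpha_hat) + prior.apply(alpha_hat)
print("objective at alpha_hat:", obj)
print("data term at zero:", likelihood(np.zeros((W, L))))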

Hierakonpolis commented 6 months ago

Oh I see, thank you for your help!