microprediction / precise

World beating online covariance and portfolio construction.

schur complementary portfolio - difference between theoretical formulation and code #36

Open Marzio-USI opened 3 months ago

Marzio-USI commented 3 months ago

When inspecting the Schur complementary portfolio, I encountered some issues.

The following appears in the formulation from your presentation:

[Slide and screenshot from the presentation]

with $$A^{c}(\gamma)= A - \gamma B D^{-1}C.$$ Following the third step of the algorithm (Augment and allocate), starting from

precise/precise/skaters/portfoliostatic/schurportfactory.py line 122

  # 3. Augment and allocate
  Ag, Dg, info = schur_augmentation(A=A, B=B, C=C, D=D, gamma=gamma) <--------HERE

precise/precise/skaters/portfoliostatic/schurportutil.py , line 13

 def schur_augmentation(A,B,C,D, gamma):
    """
       Mess with A, D to try to incorporate some off-diag info
    """
    if gamma>0.0:
        max_gamma = _maximal_gamma(A=A, B=B, C=C, D=D)
        augA, bA = pseudo_schur_complement(A=A, B=B, C=C, D=D, gamma=gamma * max_gamma) # <-------------HERE
        augD, bD = pseudo_schur_complement(A=D, B=C, C=B, D=A, gamma=gamma * max_gamma) # <---------HERE TOO

        augmentation_fail = False
        if not is_positive_def(augA):
            try:
                Ag = nearest_pos_def(augA)
            except np.linalg.LinAlgError:
                augmentation_fail=True
        else:
            Ag = augA
        if not is_positive_def(augD):
            try:
                Dg = nearest_pos_def(augD)
            except np.linalg.LinAlgError:
                augmentation_fail=True
        else:
            Dg = augD

        if augmentation_fail:
            print('Warning: augmentation failed')
            reductionA = 1.0
            reductionD = 1.0
            reductionRatioA = 1.0
            Ag = A
            Dg = D
        else:
            reductionD = np.linalg.norm(Dg)/np.linalg.norm(D)
            reductionA = np.linalg.norm(Ag)/np.linalg.norm(A)
            reductionRatioA = reductionA/reductionD
    else:
        reductionRatioA = 1.0
        reductionA = 1.0
        reductionD = 1.0
        Ag = A
        Dg = D

    info = {'reductionA': reductionA,
                'reductionD': reductionD,
                'reductionRatioA': reductionRatioA}
    return Ag, Dg, info

We arrive at this pseudo_schur_complement function where $A^{c}(\gamma)= A - \gamma B D^{-1}C$ is computed:

precise/precise/skaters/portfoliostatic/schurportutil.py , line 57

def  pseudo_schur_complement(A, B, C, D, gamma, lmbda=None, warn=False):
    """
       Augmented cov matrix for "A" inspired by the Schur complement
    """
    if lmbda is None:
        lmbda=gamma
    try:
        Ac_raw = schur_complement(A=A, B=B, C=C, D=D, gamma=gamma)  
        nA = np.shape(A)[0]
        nD = np.shape(D)[0]
        Ac = to_symmetric(Ac_raw)
        M = symmetric_step_up_matrix(n1=nA, n2=nD)
        Mt = np.transpose(M)
        BDinv = multiply_by_inverse(B, D, throw=False)
        BDinvMt = np.dot(BDinv, Mt)
        Ra = np.eye(nA) - lmbda * BDinvMt
        Ag = inverse_multiply(Ra, Ac, throw=False, warn=False)
    except np.linalg.LinAlgError:
        if warn:
            print('Pseudo-schur failed, falling back to A')
        Ag = A
    n = np.shape(A)[0]
    b = np.ones(shape=(n,1))
    return Ag, b

However, after that, the following operations are performed:

$$ \begin{aligned} Ag &= Ra^{-1} \cdot Ac \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot Ac \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(Ac_{\text{raw}}) \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(A^{c}(\gamma)) \\ &= (I - \lambda B D^{-1}M^T)^{-1} \cdot \text{tosym}(A - \gamma B D^{-1}C) \end{aligned} $$

So if we assume that $Ac$ is already symmetric (to simplify the process):

$$ Ag = (I - \lambda B D^{-1}M^T)^{-1} \cdot (A - \gamma B D^{-1}C) $$

Which does not match what I expect from the presentation:

  1. "Before performing inter-group allocation we make a different modification. We multiply the precision of $A^c$ by $b_Ab_A^T$ element-wise (and similarly, multiply the precision of $D^c$ by $b_Db_D^T$)".

Meaning:

$$ A' = (A - \gamma B D^{-1}C)^{*b_A} $$

Specifically, I don't understand where $M$, the symmetric step-up matrix, comes from, and why $b_A$ is never computed.

And again, when we perform the sub-allocation: precise/precise/skaters/portfoliostatic/schurportfactory.py lines 132 and 138

# Sub-allocate
wA = hierarchical_schur_complementary_portfolio(cov=Ag, port=port, port_kwargs=port_kwargs,
                                               alloc=alloc, alloc_kwargs=alloc_kwargs,
                                               splitter=splitter, splitter_kwargs=splitter_kwargs,
                                               seriator=seriator, seriator_kwargs=seriator_kwargs,
                                               seriation_depth = seriation_depth-1,
                                               delta=delta, gamma=gamma)
wD = hierarchical_schur_complementary_portfolio(cov=Dg, port=port, port_kwargs=port_kwargs,
                                                alloc=alloc, alloc_kwargs=alloc_kwargs,
                                                splitter=splitter, splitter_kwargs=splitter_kwargs,
                                                seriator=seriator, seriator_kwargs=seriator_kwargs,
                                                seriation_depth=seriation_depth - 1,
                                                delta=delta, gamma=gamma)

the same augmented matrix $Ag$ used to allocate in:

precise/precise/skaters/portfoliostatic/schurportfactory.py line 122

# 3. Augment and allocate
Ag, Dg, info = schur_augmentation(A=A, B=B, C=C, D=D, gamma=gamma)
aA, aD = alloc(covs=[Ag, Dg]) #<----------HERE

is also passed to the next "iteration", while on the slides I found:

  1. The intra-group allocation pertaining to block $A$ is determined by covariance matrix $A_{/ b_A(\lambda)}^c$. In this notation the vector $b_A(\lambda)=\vec{1}-\lambda B D^{-1} \vec{1}$. The generalized Schur complement is $A^c(\gamma)=A-\gamma B D^{-1} C$. The notation $A_{/ b}^c$ denotes $A^c /\left(b b^T\right)$ with division performed element-wise.

$$ A'' = A^c(\gamma)/b_Ab_A^T $$
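For concreteness, the slide formulas above can be sketched in a few lines of numpy. This is an illustrative sketch only, not the library's implementation, and the helper name `slide_suballocation_cov` is made up here:

```python
import numpy as np

def slide_suballocation_cov(A, B, C, D, gamma, lmbda):
    """Sketch of the slide notation: A'' = A^c(gamma) / (b_A b_A^T), element-wise."""
    Dinv = np.linalg.inv(D)
    Ac = A - gamma * B @ Dinv @ C                  # generalized Schur complement A^c(gamma)
    ones_q = np.ones((D.shape[0], 1))
    bA = np.ones((A.shape[0], 1)) - lmbda * B @ Dinv @ ones_q   # b_A(lambda)
    return Ac / (bA @ bA.T)                        # element-wise division by b_A b_A^T
```

With `gamma = lmbda = 0` this reduces to `A` itself, as expected from the formulas.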

microprediction commented 3 months ago

Looking ....


Marzio-USI commented 3 months ago

My mistake; in case someone is having the same doubts, this might be helpful:

$$ \Sigma = \begin{bmatrix} A & B \\ C & D \end{bmatrix} $$

Where $\Sigma \in \mathbb{R}^{n \times n}$ and:

  1. $A \in \mathbb{R}^{p \times p}$
  2. $B \in \mathbb{R}^{p \times q}$
  3. $C \in \mathbb{R}^{q \times p}$
  4. $D \in \mathbb{R}^{q \times q}$
  5. $p + q = n$

The minimum variance portfolio:

$$ w \propto \Sigma^{-1} \vec{1} $$

We can view any expression of the form $\Sigma^{-1}\vec{1}$ in terms of the minimum variance portfolio $w(\Sigma)$ and its portfolio variance $\nu(\Sigma)$, viz:

$$ \Sigma^{-1}\vec{1} = \vec{1}^T \Sigma^{-1} \vec{1} \cdot w(\Sigma) = \dfrac{1}{\nu(\Sigma)}w(\Sigma) $$
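This identity is easy to check numerically. A minimal sketch, where the helper names `min_var_weights` and `port_variance` are just illustrations of $w(\Sigma)$ and $\nu(\Sigma)$:

```python
import numpy as np

def min_var_weights(Sigma):
    """w(Sigma): minimum variance weights, proportional to Sigma^{-1} 1."""
    x = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))
    return x / x.sum()

def port_variance(Sigma):
    """nu(Sigma) = 1 / (1^T Sigma^{-1} 1): variance of the min-var portfolio."""
    return 1.0 / np.linalg.solve(Sigma, np.ones(Sigma.shape[0])).sum()
```

For any positive definite `Sigma`, `np.linalg.solve(Sigma, np.ones(n))` then equals `min_var_weights(Sigma) / port_variance(Sigma)`, which is exactly the identity above.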

In particular if $B = 0$ then the global minimum variance allocation is proportional to:

$$ w \propto \Sigma^{-1}\vec{1} = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}^{-1} \vec{1} = \begin{bmatrix} A^{-1} \vec{1} \\ D^{-1} \vec{1} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\nu(A)}w(A) \\ \dfrac{1}{\nu(D)}w(D) \end{bmatrix} $$
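A quick numerical check of the block-diagonal case. This is a sketch; the hypothetical helper `minvar` returns the pair $(w(\Sigma), \nu(\Sigma))$:

```python
import numpy as np

def minvar(Sigma):
    x = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))
    return x / x.sum(), 1.0 / x.sum()      # (w(Sigma), nu(Sigma))

rng = np.random.default_rng(0)
A = np.cov(rng.normal(size=(200, 3)), rowvar=False)
D = np.cov(rng.normal(size=(200, 2)), rowvar=False)
Sigma = np.block([[A, np.zeros((3, 2))], [np.zeros((2, 3)), D]])  # B = C = 0

wA, nuA = minvar(A)
wD, nuD = minvar(D)
stacked = np.concatenate([wA / nuA, wD / nuD])   # = [A^{-1} 1 ; D^{-1} 1]
assert np.allclose(stacked / stacked.sum(), minvar(Sigma)[0])
```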

Schur Allocation

Instead, with Schur allocation, we use the block inversion identity:

$$ \Sigma^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & 0 \\ 0 & (D - CA^{-1}B)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} $$

Where $I_p$, $I_q$ are identity matrices $\in \mathbb{R}^{p \times p}$ and $\in \mathbb{R}^{q \times q}$ respectively.
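The block inversion identity can be verified directly on a random covariance matrix (a numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
Sigma = X.T @ X / 300                      # positive definite 5x5 covariance
p, q = 2, 3
A, B = Sigma[:p, :p], Sigma[:p, p:]
C, D = Sigma[p:, :p], Sigma[p:, p:]

Ainv, Dinv = np.linalg.inv(A), np.linalg.inv(D)
Ac_inv = np.linalg.inv(A - B @ Dinv @ C)   # (A^c)^{-1}
Dc_inv = np.linalg.inv(D - C @ Ainv @ B)   # (D^c)^{-1}

left = np.block([[Ac_inv, np.zeros((p, q))], [np.zeros((q, p)), Dc_inv]])
right = np.block([[np.eye(p), -B @ Dinv], [-C @ Ainv, np.eye(q)]])
assert np.allclose(left @ right, np.linalg.inv(Sigma))
```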

Thus, the global minimum variance allocation is proportional to:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \vec{1} \\ &= \begin{bmatrix} (A - BD^{-1}C)^{-1} & 0 \\ 0 & (D - CA^{-1}B)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} \vec{1} \end{aligned} $$

With $A^c = (A- BD^{-1}C)$ and $D^c = (D - CA^{-1}B)$ then:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} (A^c)^{-1} & 0 \\ 0 & (D^c)^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ -CA^{-1} & I_q \end{bmatrix} \vec{1} \\ &= \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^c)^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \vec{1} \end{aligned} $$

Which can be transformed as follows:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^c)^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \vec{1} &= \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^c)^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \begin{bmatrix} \vec{1}_p \\ \vec{1}_q \end{bmatrix} \\ &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - (A^c)^{-1}BD^{-1} \vec{1}_q \\ (D^c)^{-1} \vec{1}_q - (D^c)^{-1}CA^{-1} \vec{1}_p \end{bmatrix} \end{aligned} $$

We can then define matrices $M^{(1)}$ and $M^{(2)}$ (the step-up matrices) such that:

$$ M^{(1)} \vec{1}_p = \vec{1}_q, \qquad M^{(2)} \vec{1}_q = \vec{1}_p $$

To simplify it to:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^c)^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \vec{1} &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - (A^c)^{-1}BD^{-1} \vec{1}_q \\ (D^c)^{-1} \vec{1}_q - (D^c)^{-1}CA^{-1} \vec{1}_p \end{bmatrix} \\ &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - (A^c)^{-1}BD^{-1} M^{(1)}\vec{1}_p \\ (D^c)^{-1} \vec{1}_q - (D^c)^{-1}CA^{-1} M^{(2)}\vec{1}_q \end{bmatrix} \end{aligned} $$

so that we can group the terms together:

$$ \begin{aligned} \begin{bmatrix} (A^c)^{-1} & -(A^c)^{-1}BD^{-1} \\ -(D^c)^{-1}CA^{-1} & (D^c)^{-1} \end{bmatrix} \vec{1} &= \begin{bmatrix} (A^c)^{-1} \vec{1}_p - (A^c)^{-1}BD^{-1} M^{(1)}\vec{1}_p \\ (D^c)^{-1} \vec{1}_q - (D^c)^{-1}CA^{-1} M^{(2)}\vec{1}_q \end{bmatrix} \\ &= \begin{bmatrix} \Big((A^c)^{-1} - (A^c)^{-1}BD^{-1}M^{(1)}\Big)\vec{1}_p \\ \Big((D^c)^{-1} - (D^c)^{-1}CA^{-1}M^{(2)}\Big)\vec{1}_q \end{bmatrix} \\ &= \begin{bmatrix} (A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big)\vec{1}_p \\ (D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big)\vec{1}_q \end{bmatrix} \end{aligned} $$

Thus following the matrix inverse property:

$$ (A B)^{-1} = B^{-1}A^{-1} $$

We can do:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \vec{1} \\ &= \begin{bmatrix} (A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big)\vec{1}_p \\ (D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big)\vec{1}_q \end{bmatrix} \\ &= \begin{bmatrix} \dfrac{1}{\nu(Ag)}w(Ag) \\ \dfrac{1}{\nu(Dg)}w(Dg) \end{bmatrix} \end{aligned} $$

where

$$ Ag = \bigg( (A^c)^{-1}\Big( I_p - BD^{-1}M^{(1)} \Big) \bigg)^{-1} = \Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} \Big((A^c)^{-1}\Big)^{-1} = \Big( I_p - BD^{-1}M^{(1)} \Big)^{-1} A^c $$

and

$$ Dg = \bigg( (D^c)^{-1}\Big( I_q - CA^{-1}M^{(2)} \Big) \bigg)^{-1} = \Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} \Big((D^c)^{-1}\Big)^{-1} = \Big( I_q - CA^{-1}M^{(2)} \Big)^{-1} D^c $$

In conclusion:

$$ \begin{aligned} w \propto \Sigma^{-1}\vec{1} &= \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \vec{1} \\ &= \begin{bmatrix} \dfrac{1}{\nu(Ag)}w(Ag) \\ \dfrac{1}{\nu(Dg)}w(Dg) \end{bmatrix} \\ &= \begin{bmatrix} \dfrac{1}{\nu\Big(\big( I_p - BD^{-1}M^{(1)} \big)^{-1} A^c\Big)}\, w\Big(\big( I_p - BD^{-1}M^{(1)} \big)^{-1} A^c\Big) \\ \dfrac{1}{\nu\Big(\big( I_q - CA^{-1}M^{(2)} \big)^{-1} D^c\Big)}\, w\Big(\big( I_q - CA^{-1}M^{(2)} \big)^{-1} D^c\Big) \end{bmatrix} \end{aligned} $$
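The whole chain can be verified end-to-end numerically. This is an illustrative sketch, not the library's code; it takes $\gamma = \lambda = 1$ and uses a concrete averaging choice of step-up matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
Sigma = X.T @ X / 400
p, q = 2, 3
A, B = Sigma[:p, :p], Sigma[:p, p:]
C, D = Sigma[p:, :p], Sigma[p:, p:]

M1 = np.ones((q, p)) / p                  # M^(1): satisfies M1 @ 1_p = 1_q
M2 = np.ones((p, q)) / q                  # M^(2): satisfies M2 @ 1_q = 1_p

Ac = A - B @ np.linalg.inv(D) @ C         # Schur complements (gamma = 1)
Dc = D - C @ np.linalg.inv(A) @ B
Ag = np.linalg.solve(np.eye(p) - B @ np.linalg.inv(D) @ M1, Ac)   # augmented A
Dg = np.linalg.solve(np.eye(q) - C @ np.linalg.inv(A) @ M2, Dc)   # augmented D

# Stack Ag^{-1} 1_p = w(Ag)/nu(Ag) on top of Dg^{-1} 1_q = w(Dg)/nu(Dg) ...
w = np.concatenate([np.linalg.solve(Ag, np.ones(p)),
                    np.linalg.solve(Dg, np.ones(q))])
# ... and recover the global minimum variance weights.
w_direct = np.linalg.solve(Sigma, np.ones(p + q))
assert np.allclose(w / w.sum(), w_direct / w_direct.sum())
```

The key step is that $Ag^{-1} = (A^c)^{-1}(I_p - BD^{-1}M^{(1)})$, so $Ag^{-1}\vec{1}_p = (A^c)^{-1}(\vec{1}_p - BD^{-1}\vec{1}_q)$, which is exactly the top block of $\Sigma^{-1}\vec{1}$.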

One question that still remains is how to choose $M$.

From my perspective, $M$ can be defined as follows:

$$ M^{(1)} = \begin{cases} I_p & \text{if } p = q \\ \dfrac{1}{p}\, \vec{1}_q\vec{1}_p^{\,T} & \text{otherwise} \end{cases} $$

$$ M^{(2)} = \begin{cases} I_q & \text{if } p = q \\ \dfrac{1}{q}\, \vec{1}_p\vec{1}_q^{\,T} & \text{otherwise} \end{cases} $$
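These choices are easy to sanity-check. A sketch, where the helper name `step_up` is made up for illustration:

```python
import numpy as np

def step_up(n_from, n_to):
    """A matrix M with M @ ones(n_from) = ones(n_to); identity when sizes match."""
    if n_from == n_to:
        return np.eye(n_from)
    return np.ones((n_to, n_from)) / n_from   # the averaging choice above

p, q = 2, 3
M1 = step_up(p, q)   # M^(1), shape q x p
M2 = step_up(q, p)   # M^(2), shape p x q
assert np.allclose(M1 @ np.ones(p), np.ones(q))
assert np.allclose(M2 @ np.ones(q), np.ones(p))
```

Any $M$ satisfying $M\vec{1}_p = \vec{1}_q$ would do for the identity above, which is why the choice is somewhat arbitrary.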

Let me know if something is wrong or missing, and whether there is a particular motivation for defining $M$ as the 'step-up matrix'.

microprediction commented 2 months ago

That's exactly right. The choice of step-up matrix is a bit arbitrary here.