maximenc / pycop

Python library for multivariate dependence modeling with Copulas
https://pypi.org/project/pycop/
MIT License
93 stars 19 forks source link

Gaussian copula estimation #4

Closed ioannisrpt closed 1 year ago

ioannisrpt commented 1 year ago

Hello

Thank you for creating the pycop package. It is very useful to my research.

However, when I was using it, I noticed some problems with the estimation of the Gaussian copula:

1) The self.bounds_param = [(-1,1)] and self.parameters_start arguments seem to be missing from the initialization of the Gaussian copula class and as such its estimation fails when I'm trying for example to fit it in the MSCI return data. The following example replicates the issue:

import pandas as pd
import numpy as np
from pycop import gaussian, estimation

df = pd.read_csv("https://raw.githubusercontent.com/maximenc/pycop/master/data/msci.csv")
df.index = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.drop(["Date"], axis=1)

for col in df.columns.values:
    df[col] = np.log(df[col]) - np.log(df[col].shift(1))

data = df[["US","UK"]].T.values

cop = gaussian()
param, mle = estimation.fit_cmle(cop, data)

2) After I correct for problem (1) by adding the self.bounds_param and self.parameters_start, I'm always getting the np.linalg.LinAlgError('singular matrix') coming from multivariate_normal.pdf((y1,y2), mean=None, cov=[[1,rho[0]],[rho[0],1]])/(norm.pdf(y1)*norm.pdf(y2)) when I try to fit a mixture of copulas including the Gaussian. The following example replicates the error.

import pandas as pd
import numpy as np
from pycop import mixture, estimation

df = pd.read_csv("https://raw.githubusercontent.com/maximenc/pycop/master/data/msci.csv")
df.index = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.drop(["Date"], axis=1)

for col in df.columns.values:
    df[col] = np.log(df[col]) - np.log(df[col].shift(1))

data = df[["US","UK"]].T.values

cop = mixture(["clayton", "gaussian", "rclayton"])
param, mle = estimation.fit_cmle_mixt(cop, data)

Any help would be appreciated!

Thanks, Ioannis

tokamaster commented 1 year ago

Regarding issue 2, I believe it is coming from the scipy.stats.multivariate_normal function, rather than from pycop. I think you should be able to fix it by passing allow_singular=True to multivariate_normal.

ioannisrpt commented 1 year ago

@tokamaster You are right. When I pass the argument allow_singular=True in the multivariate_normal.cdf and multivariate_normal.pdf of pycop.bivariate.gaussian, I don't get the Singular Matrix error. However, now I get the message Singular matrix E in LSQ subproblem from scipy.stats.minimize which breaks something in pycop.estimation.fit_cmle_mixt and returns the TypeError: cannot unpack non-iterable NoneType object. Is there something wrong with the formulation of the minimization problem? Or even better, why do I get a singular matrix in the first place?

tokamaster commented 1 year ago

@ioannisrpt The mean argument of multivariate_normal must be an array, not None. For more information about your covariance matrix being singular: https://github.com/scipy/scipy/issues/5286

However, it is difficult to say where the origin of your problem is.

ioannisrpt commented 1 year ago

@tokamaster I was under the impression that multivariate_normal had a way of setting mean=None to a default value. Thanks for the link to the singular covariance matrix issue. I think I will re-define the gaussian class without multivariate_normal and then take it from there.

ioannisrpt commented 1 year ago

I fixed the estimation of the Gaussian copula (standalone or in a mixture of copulas). The issue lies in the scipy.stats.multivariate_normal module for reasons I cannot explain. I looked at its source code and I find no problem with how it operates. I use the Gaussian copula density formula directly in the gaussian class in the spirit of other copulas such as archimedean copulas and I find no problems with its estimation. I will create a pull request with the (minimal) changes that I made, if that is ok.