jakobrunge / tigramite

Tigramite is a python package for causal inference with a focus on time series data. The Tigramite documentation is at
https://jakobrunge.github.io/tigramite/
GNU General Public License v3.0
1.33k stars 278 forks source link

NotPSDError using GPDCtorch on smoothed data #231

Closed hd1894 closed 1 year ago

hd1894 commented 2 years ago

Hello,

first of all thank you for the great tool!

Unfortunately, I constantly get NotPSDError: Matrix not positive definite after repeatedly adding jitter up to 1.0e-04. using GPDCtorch on the data smoothed with ButterWorth filter.

Using serial GPDC doesn't produce any issues. Also, not smoothed initial data can be handlled normally by GPDCtorch .

You can reproduce the error running the code below. Commenting out for loop after # smoothing disables smoothing for testing on initial data.
Commenting out gpdc = GPDCtorch(significance='analytic') switches to serial with GPDC.

import numpy as np
import pandas as pd    
import sklearn
from scipy import signal
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import GPDC, GPDCtorch
import statsmodels.api as sm
from statsmodels.tsa.base import datetools

def butter(vals: np.array, wn: float, order: int):
    vals = vals.astype(float)
    np.place(vals, vals==None, [np.nan])
    np.place(vals, vals==np.inf, [np.nan])
    vals = pd.Series(vals)
    vals.interpolate(inplace=True)
    vals.ffill(inplace=True)
    vals.bfill(inplace=True)
    vals = np.array(vals.tolist())
    if np.isnan(vals).any(): raise ValueError('contains nan after interpolation!')
    b, a = signal.butter(order, wn, btype='lowpass', analog=False)
    return signal.filtfilt(b, a, vals, method='gust')

mdata = sm.datasets.macrodata.load_pandas().data
mdata = mdata[['year','quarter']].astype(int)
mdata = mdata[['year','quarter']].astype(str)
quarterly = mdata["year"] + "Q" + mdata["quarter"]
quarterly = datetools.dates_from_str(quarterly)
mdata = sm.datasets.macrodata.load_pandas().data
mdata = mdata[['realgdp','realcons','realinv']]
mdata.index = pd.DatetimeIndex(quarterly)

# smoothing
for ind in mdata.columns:
    mdata[ind] = butter(mdata[ind].to_numpy(), 1/7, 3)

data = np.log(mdata).diff().dropna()
chosen_indicators = ['realgdp','realinv'] # 'realcons',
data.drop(data.columns.difference(chosen_indicators), 1, inplace=True)

dataframe = pp.DataFrame(data.to_numpy(), datatime=data.index, var_names = data.columns.to_list())

gpdc = GPDC(significance='analytic', gp_params=None)
gpdc = GPDCtorch(significance='analytic')

pcmci_gpdc = PCMCI(
    dataframe=dataframe, 
    cond_ind_test=gpdc,
    verbosity=1)

results = pcmci_gpdc.run_pcmci(tau_max=2, pc_alpha=0.1, alpha_level = 0.01)

Versions of packages:

python==3.9.5 numpy==1.21.5 scipy==1.8.0 numba==0.55.1

tigramite==5.1.0.1, b59f59a sklearn==1.0.2 gpytorch==1.7.0

jakobrunge commented 2 years ago

But that error seems to be related to Python Torch, no?

hd1894 commented 2 years ago

@jakobrunge I see. You mean Gaussian Regression or its implementation can not handle that data. Could it be some violated assumptions? Or any other reasons?

jakobrunge commented 1 year ago

You would have to look into Gaussian Regression here.