jakobrunge / tigramite

Tigramite is a Python package for causal inference with a focus on time series data. The Tigramite documentation is at
https://jakobrunge.github.io/tigramite/
GNU General Public License v3.0

pcmci.get_lagged_dependencies with regressionCI data_type not passed #432

Closed: aglownia closed this issue 6 days ago

aglownia commented 1 week ago

Hi,

I am experimenting with multiple datasets (M = 100), each of the same shape (N = 14 variables, T = 300 time steps), containing both continuous and discrete variables (passed as a data_type dict to the DataFrame object). I am using RegressionCI(significance='analytic') as the independence test. When calling pcmci.get_lagged_dependencies(tau_max=20, val_only=True)['val_matrix'], I get the error _RegressionCI.get_dependence_measure() missing 1 required positional argument: 'datatype'

The data_type matrix was passed directly to the DataFrame, so I am not sure what the issue is, since no such problem occurs when running, e.g., pcmci.run_pcmciplus(tau_max=tau_max, pc_alpha=0.01).

For m in range(0, 100): data_dict[m].shape = (300, 14) and data_type_dict[m].shape = (300, 14).

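import tigramite.data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.regressionCI import RegressionCI

# data_dict, data_type_dict: {m: array of shape (300, 14)} for m in range(100)
# variables: list of the 14 variable names
dataframe = pp.DataFrame(data=data_dict,
                         data_type=data_type_dict,
                         analysis_mode='multiple',
                         var_names=variables)
ind_test = RegressionCI(significance='analytic')
pcmci = PCMCI(
    dataframe=dataframe,
    cond_ind_test=ind_test,
    verbosity=1)
# This call raised the error quoted above (run_pcmciplus did not)
correlations = pcmci.get_lagged_dependencies(tau_max=20, val_only=True)['val_matrix']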
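To try the call above without the reporter's data, a hypothetical stand-in with the same layout can be built with numpy: a dict of M = 100 arrays of shape (300, 14), plus a matching dict of data_type arrays. The encoding 0 = continuous, 1 = discrete is assumed from tigramite's DataFrame documentation (check it against your installed version); the 10/4 split of continuous vs. discrete columns and the random values are purely illustrative.

```python
import numpy as np

# Hypothetical stand-in data: M datasets, each with T time steps and N variables.
M, T, N = 100, 300, 14
rng = np.random.default_rng(42)

# Illustrative assumption: first 10 variables continuous, last 4 discrete.
n_cont = 10

data_dict, data_type_dict = {}, {}
for m in range(M):
    cont = rng.normal(size=(T, n_cont))
    disc = rng.integers(0, 3, size=(T, N - n_cont)).astype(float)
    data_dict[m] = np.hstack([cont, disc])

    # data_type mirrors the shape of data: 0 = continuous, 1 = discrete (assumed encoding).
    types = np.zeros((T, N), dtype=int)
    types[:, n_cont:] = 1
    data_type_dict[m] = types

variables = [f"X{i}" for i in range(N)]
```

These three objects can then be passed to pp.DataFrame(..., analysis_mode='multiple') exactly as in the snippet above.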

Do you have any suggestions ?

jakobrunge commented 6 days ago

This was a bug, and it's fixed now. Note, however, that the test statistic value you get with val_only=True depends on the combination of X and Y data types (continuous / discrete) in the dependence measure, and will be a deviance of different kinds of regression models (linear / logistic). See the code for more details:

    To test :math:`X \perp Y | Z`, the regressions Y|XZ vs Y|Z, or, depending
    on certain criteria, X|YZ vs X|Z are compared. For that, the notion of
    the deviance is employed. If the fits of the respective regressions do
    not differ significantly (measured using the deviance), the null
    hypothesis of conditional independence is "accepted". This approach
    assumes that X and Y are univariate, and Z can be either empty,
    univariate or multivariate. Moreover, this approach works for all
    combinations of "discrete" and "continuous" X, Y and respective columns
    of Z; depending on the case, linear regression or multinomial regression
    is employed.
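The docstring describes comparing nested regressions (Y|XZ vs Y|Z) through their deviance. The sketch below illustrates that general likelihood-ratio idea with numpy for a univariate X and Y: a linear-Gaussian fit when Y is continuous, a binary logistic fit (Newton/IRLS) when Y is 0/1, and the deviance difference referred to a chi-square distribution with one degree of freedom. It is not tigramite's RegressionCI code: it always regresses Y (never X|YZ), covers only binary rather than multinomial discrete targets, and the helper names (deviance_test, gaussian_deviance, logistic_deviance) are hypothetical.

```python
import math
import numpy as np


def _design(n, *blocks):
    # Intercept column plus the given regressors (1-D or (n, k) arrays, k may be 0).
    return np.column_stack([np.ones(n), *blocks])


def gaussian_deviance(y, X):
    # -2 * max log-likelihood of a linear-Gaussian model, dropping the
    # constants that cancel in a difference: n * log(RSS / n).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * math.log(rss / len(y))


def logistic_deviance(y, X, n_iter=50):
    # Binary logistic regression fitted by Newton-Raphson (IRLS);
    # returns the deviance, -2 * log-likelihood.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return -2.0 * float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def deviance_test(x, y, z, y_is_binary):
    # Compare the reduced model Y|Z with the full model Y|XZ.
    # x is univariate, so the deviance difference has 1 degree of freedom.
    n = len(y)
    dev = logistic_deviance if y_is_binary else gaussian_deviance
    stat = dev(y, _design(n, z)) - dev(y, _design(n, x, z))
    # Chi-square(1) survival function: P(chi2_1 > s) = erfc(sqrt(s / 2)).
    pval = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, pval
```

For an empty conditioning set, pass z = np.empty((n, 0)). Because the statistic comes from a Gaussian likelihood in one case and a Bernoulli likelihood in the other, its magnitude means different things depending on the data types of X and Y.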