statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

VAR.test_causality tests #379

Open jseabold opened 12 years ago

jseabold commented 12 years ago

It looks like VAR has still never had proper correctness tests written. There is some test data for test_causality, but I'm not sure where it came from; it looks like it may be for regression / smoke tests only. The results for test_causality are slightly different from Stata's and I'd like to figure out why.

wesm commented 12 years ago

I might have gotten the test values from Luetkepohl but it's been a while

jseabold commented 12 years ago

Somewhat relatedly, here's a recipe for doing a "test all Granger causality" table that could be added to VARResults:

import numpy as np
import pandas
import statsmodels.api as sm

dta = sm.datasets.macrodata.load_pandas().data
endog = dta[["infl", "unemp", "tbilrate"]]
index = sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3')
endog.index = pandas.Index(index) # DatetimeIndex or PeriodIndex in 0.8.0
nrows = len(endog.columns)**2

# var_mod was not defined in the original recipe; the lag order is an assumption
var_mod = sm.tsa.VAR(endog).fit(maxlags=2)

cols = ['chi2', 'df', 'prob(>chi2)']
table = pandas.DataFrame(np.zeros((nrows, 3)), columns=cols)
index = []
variables = endog.columns.tolist() # a list, not a set, keeps the column order
i = 0
for vari in variables:
    others = []
    for ex_vari in variables:
        if vari == ex_vari: # don't want to test this
            continue
        others.append(ex_vari)
        # recent statsmodels returns a results object here; older releases
        # returned a dict (res['statistic'], ...) and accepted verbose=False
        res = var_mod.test_causality(vari, ex_vari, kind='wald')
        table.loc[i, cols] = (res.test_statistic, res.df, res.pvalue)
        i += 1
        index.append((vari, ex_vari))
    # joint test: all other variables excluded from this equation
    res = var_mod.test_causality(vari, others, kind='wald')
    table.loc[i, cols] = (res.test_statistic, res.df, res.pvalue)
    index.append((vari, 'ALL'))
    i += 1
table.index = pandas.MultiIndex.from_tuples(index, names=['Equation', 'Excluded'])

josef-pkt commented 12 years ago

It looks like Stata uses df of only the equations involved in the test, while VAR counts the df of the entire system.

From what I looked at before, there is also a possible difference in the treatment of multi-variable causality.
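
A minimal sketch of the two degrees-of-freedom conventions described above, for the F form of the test. The function name `causality_f_df` and its arguments are hypothetical, and the "single_equation" branch is only one reading of what Stata does, not a confirmed description of its implementation.

```python
def causality_f_df(n_caused, n_causing, k_ar, neqs, nobs, k_trend=1):
    """Return (numerator, denominator) df of the F test under both conventions.

    nobs is the number of observations left after dropping the first k_ar rows.
    """
    # One zero restriction per (caused equation, causing variable, lag).
    num_restr = n_caused * n_causing * k_ar
    # Residual df of one equation: observations minus that equation's parameters.
    df_resid = nobs - (neqs * k_ar + k_trend)
    return {
        "system": (num_restr, neqs * df_resid),    # assumption: all equations pooled
        "single_equation": (num_restr, df_resid),  # assumption: tested equation only
    }


# Three variables, two lags, 201 usable observations (hypothetical sizes).
print(causality_f_df(n_caused=1, n_causing=1, k_ar=2, neqs=3, nobs=201))
```

The numerator df is the same under both conventions; only the denominator changes, so the F statistic can agree while the p-values differ slightly.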

jseabold commented 12 years ago

Does it need a fix? Regardless, I like to have notes in the test suite that explain discrepancies.

We still need a lot of docs in VAR. While writing the tutorial, I'm having to look at the source constantly to figure out what the options are, what the variable names mean, and what conventions are used.

josef-pkt commented 12 years ago

I had two threads on this, IIRC. I couldn't find the first; the second compares with R (var in R also counts system df).

https://groups.google.com/forum/#!searchin/pystatsmodels/df$20in$20granger$20causality/pystatsmodels/5dsJFK8EMps/krxPIhvN1zwJ

I thought the treatment of multiple variables was a bit strange, but I never looked carefully at the definitions of Granger causality for multivariate (instead of bivariate) endog.
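
For reference, one common single-equation reading of a joint ("ALL") test can be sketched with plain NumPy: fit one equation by OLS and test that every lag of every excluded variable has a zero coefficient. The helper `joint_wald`, the simulated data, and the regressor ordering are all assumptions made for illustration; this is not statsmodels' system-wide implementation.

```python
import numpy as np


def joint_wald(data, caused, causing, p):
    """Wald statistic for H0: no lag of any `causing` column enters the `caused` equation."""
    data = np.asarray(data, dtype=float)
    nobs, k = data.shape
    # Regressors: constant, then lag 1 of all k variables, lag 2 of all k variables, ...
    lags = [data[p - lag:nobs - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((nobs - p, 1))] + lags)
    y = data[p:, caused]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # Column positions of the restricted coefficients (offset 1 skips the constant).
    idx = [1 + (lag - 1) * k + j for lag in range(1, p + 1) for j in causing]
    b = beta[idx]
    stat = b @ np.linalg.solve(cov[np.ix_(idx, idx)], b)
    return stat, len(idx)  # statistic and chi2 degrees of freedom


# Simulated data: column 1 feeds into column 0, column 2 is unrelated.
rng = np.random.default_rng(0)
n = 300
y = np.zeros((n, 3))
for t in range(1, n):
    e = rng.standard_normal(3)
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.4 * y[t - 1, 1] + e[0]
    y[t, 1] = 0.5 * y[t - 1, 1] + e[1]
    y[t, 2] = 0.5 * y[t - 1, 2] + e[2]

stat, df = joint_wald(y, caused=0, causing=[1, 2], p=2)
print(stat, df)
```

Here the joint test has `len(causing) * p` restrictions, all within a single equation; a system-wide definition would also have to decide how to handle several caused variables at once.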

josef-pkt commented 12 years ago

In the middle of this thread is the explanation by Wes and the references for df of causality https://groups.google.com/d/topic/pystatsmodels/mcXRZGmThe mCCzM/discussion

Shorter, official link for the thread in the previous comment: https://groups.google.com/d/topic/pystatsmodels/5dsJFK8EMps/discussion