sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0
1.19k stars 422 forks

ADF test p-values #67

Open SimonOuellette35 opened 9 years ago

SimonOuellette35 commented 9 years ago

I'm not sure I understand how adftest calculates its p-value from the ADF test statistic, and in fact the result looks erroneous to me.

I would expect that if the ADF test statistic is smaller than the critical value (around -2.86 according to the MacKinnon tables, at the 95% level in the constant-but-no-trend scenario that I'm testing with), the p-value returned would be less than 0.05. That is, the null hypothesis is that the series has a unit root, and the smaller (more negative) the test statistic, the more evidence we have to reject the null hypothesis, and thus the smaller the p-value.
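To spell out that expectation, here is the standard textbook form of the test for the constant, zero-lag case. The symbols below are my own notation, not anything taken from the library's code:

```latex
% Dickey-Fuller regression with a constant and no lagged differences
\Delta y_t = \alpha + \gamma\, y_{t-1} + \varepsilon_t,
\qquad H_0 : \gamma = 0 \ (\text{unit root}),
\qquad H_1 : \gamma < 0

% Test statistic: the usual t-ratio on \gamma
\tau = \frac{\hat{\gamma}}{\operatorname{SE}(\hat{\gamma})}

% The test is left-tailed, so the p-value is the null CDF evaluated at \tau
p = F_{\mathrm{DF}}(\tau),
\qquad \tau < c_{0.05} \approx -2.86 \;\Longrightarrow\; p < 0.05
```

Because the p-value is a CDF evaluated at the statistic, it can only decrease as the statistic becomes more negative.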

However, when I create an artificially mean-reverting data vector and call adftest (with the "c" regression type and 0 lags), I get adfstat = -3.73 but p-value = 1.0. That doesn't sound right to me.
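For anyone who wants to check the statistic independently of the library, here is a minimal standard-library Python sketch of the same setup. It is not the spark-timeseries implementation: `simulate_ar1` and `df_stat_constant` are hypothetical helper names, and the AR(1) coefficient, sample size, and seed are arbitrary choices made for illustration.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal noise.

    With |phi| < 1 the series is mean reverting (no unit root).
    """
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def df_stat_constant(y):
    """t-statistic on gamma in dy_t = alpha + gamma * y_{t-1} + e_t (0 lags)."""
    x = y[:-1]                                    # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences
    n = len(x)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    gamma = sxy / sxx                             # OLS slope
    alpha = dy_mean - gamma * x_mean              # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = (rss / (n - 2) / sxx) ** 0.5       # standard error of the slope
    return gamma / se_gamma


series = simulate_ar1(500, phi=0.5, seed=42)
stat = df_stat_constant(series)
print(stat)
```

For a series this strongly mean reverting, the statistic should land far below the 5% critical value, so any p-value near 1.0 would contradict it.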

Can someone confirm?

sryza commented 9 years ago

@NablaAnalytics that definitely sounds fishy / wrong to me as well. I'll look into it, unless you're interested in doing so.

SimonOuellette35 commented 9 years ago

For now this is low priority for me, because I worked around it by using the test statistic directly in my application and comparing it against a hard-coded critical value. Of course that's a temporary solution, and eventually we should fix this.
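Roughly, the workaround looks like the sketch below. It is written in Python for illustration rather than copied from my application: `rejects_unit_root` is a hypothetical helper, and the critical values are the commonly quoted asymptotic MacKinnon figures for the constant, no-trend case, rounded to two decimals (finite-sample values differ slightly).

```python
# Approximate asymptotic critical values, constant / no trend case.
# Assumption: rounded textbook figures, not values read from the library.
CRITICAL_VALUES_C = {"1%": -3.43, "5%": -2.86, "10%": -2.57}


def rejects_unit_root(adf_stat, level="5%"):
    """Return True when the statistic is below the critical value.

    The test is left-tailed, so more negative means stronger evidence
    against the unit-root null hypothesis.
    """
    return adf_stat < CRITICAL_VALUES_C[level]


print(rejects_unit_root(-3.73))        # statistic reported in this issue
print(rejects_unit_root(-3.73, "1%"))
```

Both calls return True for -3.73, which is why a p-value of 1.0 for that statistic cannot be right.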

SimonOuellette35 commented 9 years ago

I suppose I'll leave this issue open as a reminder that the p-value problem is still unresolved.