sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0

Regression with Auto Regressive Residuals #97

Open mbaddar2 opened 8 years ago

mbaddar2 commented 8 years ago

Based on https://www.otexts.org/fpp/9/1, this issue will implement the model

Y_t = A + sum_i B_i * X_{i,t} + n_t

where the residuals n_t are assumed to follow an autoregressive process of a given order q, AR(q). The steps are:

1. Estimate the OLS regression model for the given regressors X_t.
2. Estimate the parameters of the AR(q) model, then update the model coefficients from step 1.
3. Iterate between steps 1 and 2 until convergence.
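For illustration, the iteration can be sketched in plain Python for the simplest case: a single regressor and AR(1) residuals (the Cochrane-Orcutt procedure). This is only a sketch under those assumptions, not the spark-timeseries implementation; the names `ols_simple`, `estimate_ar1`, and `cochrane_orcutt` are hypothetical, and the synthetic data at the bottom exists only to exercise the code.

```python
import random


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def estimate_ar1(resid):
    """Estimate rho in n_t = rho * n_{t-1} + e_t (regression through the origin)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Alternate between OLS and AR(1) estimation until rho stops changing.

    Assumes |rho| < 1; rho == 1 would make the intercept recovery divide by zero.
    """
    a, b = ols_simple(x, y)  # step 1: plain OLS
    rho = 0.0
    for _ in range(max_iter):
        # Step 2: fit AR(1) to the residuals of the current regression.
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_rho = estimate_ar1(resid)
        # Update the regression on quasi-differenced data:
        # y_t - rho*y_{t-1} = A*(1 - rho) + B*(x_t - rho*x_{t-1}) + e_t
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols_simple(xs, ys)
        a = a_star / (1.0 - new_rho)
        # Step 3: stop once rho has converged.
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho


# Synthetic example (assumed values): Y_t = 2 + 3*X_t + n_t, n_t = 0.6*n_{t-1} + e_t
rng = random.Random(0)
n_obs = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
noise = [0.0] * n_obs
for t in range(1, n_obs):
    noise[t] = 0.6 * noise[t - 1] + rng.gauss(0.0, 0.1)
y = [2.0 + 3.0 * x[t] + noise[t] for t in range(n_obs)]

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
```

With several regressors or a higher AR order, the same loop applies, but each step becomes a multivariate least-squares fit.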

@sryza comments ?

sryza commented 8 years ago

@mbaddar2 sorry for the delay here, but this looks like a good strategy to me.

mbaddar1 commented 8 years ago

A note about the Durbin-Watson test for implementing Cochrane-Orcutt: the current implementation, com.cloudera.sparkts.stats.TimeSeriesStatisticalTests#dwtest, just returns the value of the statistic without computing the critical values d_L_alpha and d_U_alpha, as described in

https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

To calculate the critical values we have two options:

1. Use a table of precomputed values, which can be taken from https://www3.nd.edu/~wevans1/econ30331/Durbin_Watson_tables.pdf
2. Compute the p-value for the DW test algorithmically, as described in the Details section of the dwtest function in the R lmtest package: http://www.inside-r.org/packages/cran/lmtest/docs/dwtest

I think an alternative implementation of dwtest will need a separate issue. To keep things simple, I will use the current implementation with the following heuristic:

- dw -> 0: positive correlation
- dw -> 4: negative correlation
- dw -> 2: no correlation
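As a rough sketch of that heuristic: the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares, so it lies between 0 and 4. The function names below and the `band` cutoff of 0.5 around 2 are assumptions made for illustration only; they are not taken from the sparkts code and are not a substitute for proper critical values or p-values.

```python
def durbin_watson(resid):
    """DW statistic: sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, in [0, 4]."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den


def interpret_dw(dw, band=0.5):
    """Crude reading of the statistic; `band` is an arbitrary illustrative cutoff."""
    if dw < 2.0 - band:
        return "positive"  # dw near 0: positive autocorrelation
    if dw > 2.0 + band:
        return "negative"  # dw near 4: negative autocorrelation
    return "none"          # dw near 2: no evidence of autocorrelation


# Toy residual series chosen so the statistic is easy to verify by hand.
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # no sign changes
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # sign flips every step
dw_mixed = durbin_watson([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
```

The fixed cutoff is the weak point of this approach, which is why critical values or a p-value are preferable once they are available.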

@sryza , comments ?

sryza commented 8 years ago

I agree that it would definitely be useful to report p values for Durbin Watson.

Regarding the options, my preference would be to compute the dwtest p-value algorithmically, given that the tables of precomputed values are huge. If this is too difficult, though, I could be open to using the tables.

mbaddar1 commented 8 years ago

@sryza check #117

mbaddar1 commented 8 years ago

@sryza I will now work on extending the implemented Cochrane-Orcutt (#117) to the ARMA(p,q) case. I will start by trying the method outlined in the Brockwell book (http://www.springer.com/gb/book/9780387953519), Ch. 6, Section 6.

Note that I will start with the case where p and q are known; after that we can automate the estimation of p and q. Any further suggestions?

sryza commented 8 years ago

Awesome. How does the method outlined in Brockwell compare with the current ARIMA implementation in https://github.com/sryza/spark-timeseries/blob/master/src/main/scala/com/cloudera/sparkts/models/ARIMA.scala? I'm not super familiar with the best way to fit ARIMA + regression models. I'm guessing it's not as easy as just training the regression and ARIMA parts separately? Does MLE with conjugate gradient descent or BOBYQA work?

mbaddar1 commented 8 years ago

I will do more reading on both methods to get a better understanding, then I will detail the differences so we can discuss them.