sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0

Seasonal ARIMA model for time series analysis #15

Open cjnolet opened 9 years ago

cjnolet commented 9 years ago

It would be massively useful to have a seasonal ARIMA model for time series analysis.

sryza commented 9 years ago

I agree, this would be a great addition. Any chance this is something you'd be interested in contributing @cjnolet ?

cjnolet commented 9 years ago

Honestly, I wouldn't mind helping out and my company wouldn't have a problem letting me spend a few hours on it. I was talking to one of the Spark guys about time series data today. If we tie this API to dataframes, we may be able to design a completely generic solution and eventually contribute it directly to Spark. Is there any interest in that?

sryza commented 9 years ago

Awesome!

If it seems like it's a good fit, I'm not opposed to working on and possibly contributing back tight dataframe integration. My opinion is that the more stats-y time series modeling aspects of this project should probably stay outside of Spark, as it's already pretty bloated.

dsdinter commented 9 years ago

I am really interested in implementing ARIMA so I can try to help here where possible.

sryza commented 9 years ago

Hey @dsdinter, glad to hear about the interest! I know that @josepablocam is currently working on this. Perhaps you can find a way to split up tasks?

josepablocam commented 9 years ago

@dsdinter I jotted out a draft for a non-seasonal ARIMA, but nothing complete enough to be worth sharing yet, so I'm more than happy to coordinate something jointly. Were you thinking about seasonal or non-seasonal?

dsdinter commented 9 years ago

Hi @josepablocam & @sryza, I am actually looking at the non-seasonal one, similar to the MADlib ARIMA implementation (i.e. to forecast time series values): http://doc.madlib.net/master/group__grp__arima.html

josepablocam commented 9 years ago

@dsdinter sorry about the delayed reply. I'm planning on cleaning up the sketch I currently have for the non-seasonal arima tomorrow and will share with you and see how best to go forward.

dsdinter commented 8 years ago

Hi @josepablocam, no worries. I'm looking forward to looking at your sketch and seeing where and how I can help. We can maybe each focus on one of the sections of ARIMA, i.e. AR vs MA terms (divide and conquer).

dsdinter commented 8 years ago

Actually there is already an AR module in the current package, maybe one should focus on the Differencing section and the other on the MA then.
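For reference, the three pieces being divided up here (AR, differencing, MA) correspond to the three factors in the usual textbook statement of an ARIMA(p, d, q) model. This is the standard definition written with the backshift operator $B$ (where $B X_t = X_{t-1}$), not something taken from this project's code, and the sign convention on the MA terms is an assumption that varies between references:

```latex
\phi(B)\,(1 - B)^d X_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i,
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^j
```

Here $\phi(B)$ is the AR part, $(1 - B)^d$ is the differencing part, and $\theta(B)$ is the MA part, with $\varepsilon_t$ the white-noise innovations and $c$ the intercept.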

josepablocam commented 8 years ago

@dsdinter I'll share what I have by EOD. I'm trying to get the parameters fitted with CSS to match up with the ones from R's stats::arima. I'll post regardless of success, though. I think they might currently be off because a) I'm using a different optimization method (Commons Math3 BOBYQAOptimizer), and b) I'm using a different initial guess for the parameters.

josepablocam commented 8 years ago

@dsdinter I've pushed what I currently have for ARIMA. Current parameter fitting is done using conditional sum of squares, with the math3 BOBYQA optimizer (so no derivative provided). I think a lot of this needs to be reworked but wanted to avoid delaying sharing. You can see what's there so far on the arima branch of my fork. I quickly compared to what results from this stackoverflow question. Seems differences stem from initialization of parameters (along with the optimization method). I'll probably work on cleaning this up and then adding exogenous variables at some point this week.
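To make the fitting approach described above concrete, here is a minimal Python sketch of conditional sum of squares (CSS) estimation with a derivative-free optimizer. It is not the code on the branch: the function names (`css`, `compass_search`, `simulate_arma11`) are hypothetical, and a simple compass (pattern) search stands in for BOBYQA purely to keep the example dependency-free. The model form assumed is y_t = c + sum(phi_i * y_{t-i}) + sum(theta_j * e_{t-j}) + e_t, with pre-sample residuals set to zero.

```python
import random


def css(params, y, p, q):
    """Conditional sum of squares for an ARMA(p, q) model with intercept.

    params is [c, phi_1..phi_p, theta_1..theta_q]. The first p observations
    are conditioned on, and residuals before that point are taken as zero.
    """
    c = params[0]
    phi = params[1:1 + p]
    theta = params[1 + p:1 + p + q]
    e = [0.0] * len(y)
    total = 0.0
    for t in range(p, len(y)):
        pred = c
        pred += sum(phi[i] * y[t - 1 - i] for i in range(p))
        pred += sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        e[t] = y[t] - pred
        total += e[t] * e[t]
    return total


def compass_search(f, x0, step=0.5, tol=1e-4, max_iter=10000):
    """Derivative-free minimizer (a stand-in for BOBYQA, not BOBYQA itself).

    Tries +/- step along each coordinate, keeps any improvement, and halves
    the step whenever a full sweep finds none.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0
    return x, fx


def simulate_arma11(n, c, phi, theta, seed=0, burn_in=200):
    """Draw n points from an ARMA(1, 1) process with unit-variance noise."""
    rng = random.Random(seed)
    y_prev, e_prev = c / (1.0 - phi), 0.0
    out = []
    for _ in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)
        y = c + phi * y_prev + theta * e_prev + e
        out.append(y)
        y_prev, e_prev = y, e
    return out[burn_in:]


# Simulate a series with known parameters, then fit starting from all zeros.
series = simulate_arma11(1000, c=1.0, phi=0.6, theta=0.3)
fitted, objective = compass_search(lambda x: css(x, series, 1, 1), [0.0, 0.0, 0.0])
print("c, phi, theta =", fitted)
```

Because the optimizer only ever sees objective values, both the search method and the starting point can shift where it ends up, which is consistent with the discrepancies against R mentioned above.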

josepablocam commented 8 years ago

@dsdinter @sryza I reworked what I currently have for the non-seasonal ARIMA. Thinking about it a bit more, I'm not entirely sure exogenous variables should be added to this implementation, since it doesn't seem in keeping with the rest of the models in there so far (which are all functions of endogenous variables).

On another note, I've been comparing the parameter estimates vs R's arima, and results seem fine (as long as R's call uses "CSS" as well). The largest deviations tend to be in the intercept term. I'm going to take a closer look to see how R is initializing that.

dsdinter commented 8 years ago

@josepablocam apologies for the delay, I will be looking at this over the weekend as I have been quite busy at work.

Thanks for sharing!

josepablocam commented 8 years ago

@dsdinter no worries, I've been changing a lot of it, so actually probably best that you haven't taken a look yet

dsdinter commented 8 years ago

Not sure if you had the chance but I have been looking at how ARIMA got implemented in Madlib: https://github.com/madlib/madlib https://github.com/madlib/madlib/blob/master/src/modules/tsa/arima.cpp

It's c++ but the mathematical approach is there anyway.

josepablocam commented 8 years ago

@dsdinter @sryza I've pushed what I have so far to my repo at https://github.com/josepablocam/spark-timeseries/tree/arima I also added some tests. I've left removeTimeDependentEffects commented out, since what I was doing doesn't seem right to me, but wasn't clear what the right approach was. I've left the commented out code so you can see what I was doing though.

Tests included:

1. Fitting a time series generated by R's stats::arima.sim should result in parameters relatively close to those used to generate the series.
2. Sampling from a model and then fitting the sampled series should result in a similar model.
3. Fitting an ARIMA(p, d, q) to a series X should be equivalent to fitting an ARMA(p, q) to X after it has been differenced at order d.
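The equivalence in the third test rests on differencing being a simple, invertible transformation of the series. A minimal Python sketch of order-d differencing and its inverse follows; the helper names `difference` and `undifference` are hypothetical and are not this library's API.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass maps x to x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffed, initial_values):
    """Invert difference().

    initial_values[k] must be the first element of the series after k rounds
    of differencing, for k = 0..d-1.
    """
    out = list(diffed)
    for start in reversed(initial_values):
        restored = [start]
        for delta in out:
            restored.append(restored[-1] + delta)
        out = restored
    return out


x = [1, 4, 9, 16, 25]
once = difference(x, 1)    # [3, 5, 7, 9]
twice = difference(x, 2)   # [2, 2, 2]
restored = undifference(twice, [x[0], once[0]])
print(restored)            # [1, 4, 9, 16, 25]
```

Under this view, an ARIMA(p, d, q) fit amounts to calling something like `difference(x, d)` and then fitting an ARMA(p, q) to the result, which is what the test checks.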

Any more test suggestions are of course welcomed.

I will go through the madlib code this week. I haven't gone through it yet.

sryza commented 8 years ago

Mind submitting a pull request for the branch? It's fine if it's not in a final state, it will just be easier to comment on.

josepablocam commented 8 years ago

doneso. Labeled as WIP https://github.com/cloudera/spark-timeseries/pull/40

SupunS commented 8 years ago

Is there any implementation going on for "Seasonal" ARIMA?

sryza commented 8 years ago

Hi @SupunS, there isn't currently anyone working on seasonal ARIMA.

anshulemc commented 8 years ago

Is the ARIMA model available for Java?

Vamshi26 commented 8 years ago

Hi @sryza, I am looking for a way to use a DataFrame with ARIMA. Any suggestions?

nancylin10 commented 7 years ago

@SupunS, do you know if we will still be implementing s-ARIMA?

elexira commented 7 years ago

Please add ARIMAX to the Spark libraries, please please!