sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0

are there any plans to build kalman filtering and system identification into this framework? #19

Open jovo opened 9 years ago

sryza commented 9 years ago

Kalman filtering is definitely something I'd like to add, though it's probably not near the top of the list at this moment. I'm not very familiar with system identification. Curious to hear what you'd be interested in using these for and what kind of data sizes you're dealing with?

jovo commented 9 years ago

functional MRI data. ~ 100,000 observation dimensions, ~ 1,000 time steps. i have a new manuscript that we will be submitting shortly describing an algorithm that we have implemented in matlab to deal with such data. so, i am now looking for the right way to scale it up so that it takes minutes rather than hours to run. i can send you arxiv link when we post it (hopefully this week)....

sryza commented 9 years ago

Awesome, those data dimensions seem like a good fit. Happy to discuss what this would look like on Spark in more depth if it would be helpful.

jovo commented 9 years ago

likely :) are you familiar at all with https://github.com/thunder-project/thunder? i don't quite understand the spark landscape yet...

sryza commented 9 years ago

I am. Thunder has fairly similar goals to this project. The full set of reasons for starting something new vs. extending that are probably too long for a github-issue discussion, but the main differences are that Thunder is in Python and directed more toward neuroscience, while this is in Scala and directed more toward finance.

jovo commented 9 years ago

i see. i'm looking for a distributed kalman filter and system identification implementation so that my students can modify something existing, rather than start from scratch. seems like neither of you guys have such a thing yet. i'm guessing you don't know of anybody else that does? if not, my guess is we will do it from scratch, likely in python; because of that, and because it is primarily a neuroscience application, likely in Thunder. but please do let me know if you have a better idea :)

debasish83 commented 6 years ago

I am interested in adding KalmanFilter and more state space algorithms like RNN for time series... Let me know if you are still looking.

jovo commented 6 years ago

would be cool, but probably not so useful for me anymore.

schlichtanders commented 6 years ago

There is definitely a lot of interest in these models! Is there any progress on implementing them?

cjnolet commented 6 years ago

State space & Kalman filter 100%. This should definitely be added to a timeseries package. I've achieved so much more with this algorithm for customers in terms of practical solutions recently than I have with anything else.