pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: Enhancement, new functionality request: Irregular time exponential smoothing #10579

Open azuric opened 9 years ago

azuric commented 9 years ago

Not sure if it is OK to make the request here, but here you go.

Can a feature be added to exponential smoothing where

alpha = time decay = (time_now - time_previous)/time_scale;

where time_scale and the time difference are given in the same unit, e.g. milli/micro/nanoseconds.

This would really speed up analysis of temporally irregular time series. I understand there are many ways to derive alpha for an irregular time series, but I hope this one is a reasonably generic, neat feature.

jreback commented 9 years ago

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ewma.html?highlight=ewma#pandas.ewma

So you actually want to change alpha each iteration? You should show an example of what you are expecting (input and output)

azuric commented 9 years ago

> So you actually want to change alpha each iteration?

Yes, that is exactly correct. ewma can be thought of as a special case where time is uniform.

Thinking about it a little more, a better solution might be a weighted moving average where you can supply the alpha from another column. That way the weighted moving average generalises to any form. My time-decay alpha would be one example, but realistically it could be anything.

Functionally, what I was originally asking about is what I do in real time with a fast on-line implementation in C++. In Python the equivalent would be:

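def ema(x, prev_ema, time_now, time_previous, time_scale):
    # The weight on the new observation grows with the time elapsed since the last one.
    time_decay = (time_now - time_previous) / time_scale
    # Move the previous EMA towards x by that fraction.
    return prev_ema + time_decay * (x - prev_ema)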
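To make the request concrete, here is a minimal sketch of the generalised version: a recursion that takes one alpha per observation, plus a helper that derives those alphas from timestamps using the time-decay rule above. The function names, the sample data, and the choice to seed the average with the first value are assumptions made for illustration; none of this is existing pandas API.

```python
def variable_alpha_ema(values, alphas):
    """Run ema += alpha * (x - ema) with a separate alpha for each observation.

    Seeding with the first value is an assumption of this sketch.
    """
    ema = values[0]
    out = [ema]
    for x, alpha in zip(values[1:], alphas[1:]):
        ema = ema + alpha * (x - ema)
        out.append(ema)
    return out


def time_decay_alphas(times, time_scale):
    """alpha_i = (t_i - t_{i-1}) / time_scale; the first entry is a placeholder."""
    return [0.0] + [(t1 - t0) / time_scale for t0, t1 in zip(times, times[1:])]


# Hypothetical data: irregular timestamps in milliseconds, time_scale = 1000 ms.
times = [0, 100, 150, 400]
values = [10.0, 12.0, 11.0, 15.0]

alphas = time_decay_alphas(times, time_scale=1000)
smoothed = variable_alpha_ema(values, alphas)
```

Note that with this linear rule alpha exceeds 1 whenever a gap is longer than time_scale, so the caller would need to pick time_scale accordingly or clip alpha.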

The reason I am posting here is that I noticed the pandas.ewma function is very quick/optimised (possibly vectorised or cythonised), so I was hoping the same could be done here.

jreback commented 9 years ago

Please show an actual example, e.g. input data and expected output.

max-sixty commented 9 years ago

We've thought about this before - I think it's a good idea. There's lots of room for growth in this area (moving averages / smoothing / changes) generally.

@azuric it looks like you've thought about this already - I would write up the code and get people using & testing it, either as a Gist or a PR...

azuric commented 9 years ago

That sounds like a great idea. I will build up an example of some of the structures I tend to use like @jreback has requested and also a couple of ideas I have.

jreback commented 9 years ago

http://oroboro.com/irregular-ema/ looks like a reference for this

azuric commented 9 years ago

Hi,

That equation doesn't look correct to me. It looks like he is interpolating or something, when surely it should be:

weight = Exp(deltaTime/alpha)

ema += weight*(signal - ema)

Correct me if I am wrong.

Regards


Akhil Patel

azuric commented 9 years ago

edit: weight = Exp(-deltaTime/alpha)


evanpw commented 9 years ago

I think that should be ema += (1 - weight) * (signal - ema)

Or equivalently: ema = weight * ema + (1 - weight) * signal

With your formula, if deltaTime = 0, then ema = signal, which is not what you want.

Another way to think about this is that if you interpolate an irregular time series so that it becomes regular (e.g., take the grid size to be the gcd of the time steps), then this irregular EMA becomes the usual kind.
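A small sketch of both points, in plain Python. It implements the recursion ema = weight * ema + (1 - weight) * signal with weight = exp(-deltaTime / tau), then checks it against the usual fixed-alpha EMA on a unit grid. The comment above does not say how to interpolate; this sketch assumes each gap is filled with the value that arrives at the end of it, which is the fill that matches the `signal` form. The names `irregular_ema`, `regular_ema` and `tau`, the sample data, and seeding with the first value are all assumptions.

```python
import math


def irregular_ema(times, values, tau):
    """ema = weight * ema + (1 - weight) * signal, weight = exp(-dt / tau)."""
    ema = values[0]  # seeding with the first value is an assumption
    out = [ema]
    for i in range(1, len(values)):
        weight = math.exp(-(times[i] - times[i - 1]) / tau)
        ema = weight * ema + (1.0 - weight) * values[i]
        out.append(ema)
    return out


def regular_ema(values, alpha):
    """Usual fixed-alpha recursion: ema += alpha * (x - ema)."""
    ema = values[0]
    out = [ema]
    for x in values[1:]:
        ema += alpha * (x - ema)
        out.append(ema)
    return out


# Hypothetical irregular series with integer timestamps; the gcd of the steps is 1.
times = [0, 2, 3, 7]
values = [1.0, 4.0, 2.0, 6.0]
tau = 5.0

# Fill each gap on the unit grid with the value that arrives at its end (assumption).
grid_values = [values[0]]
for i in range(1, len(times)):
    grid_values += [values[i]] * (times[i] - times[i - 1])

direct = irregular_ema(times, values, tau)
on_grid = regular_ema(grid_values, alpha=1.0 - math.exp(-1.0 / tau))

# The irregular EMA agrees with the regular one at every original timestamp.
matches = all(math.isclose(direct[i], on_grid[t]) for i, t in enumerate(times))

# With deltaTime = 0 the weight is 1, so the new signal does not move the EMA.
unchanged = irregular_ema([0.0, 0.0], [1.0, 100.0], tau)[-1]
```

On a regular grid with step dt the same recursion is just a fixed-alpha EMA with alpha = 1 - exp(-dt / tau), which is why the two results agree.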

azuric commented 9 years ago

Yes, you are correct, my mistake. Using 1 - weight works correctly.

On Sun, 16 Aug 2015 at 14:15 Evan Wright notifications@github.com wrote:

I also use this kind of thing all the time, though I usually find the version that replaces signal with prevSignal more useful. For example, if you're computing a time-weighted average price of something from a stream of quotes, at the exact time of a new quote its contribution to the average should be zero.


azuric commented 9 years ago

Hi Evan,

Can you elaborate on why you think prevSignal is more useful? Is that not introducing lag into the system?

Regards


evanpw commented 9 years ago

Suppose that you have a signal which has value x_1 for some time interval [0, T], and then changes to a value x_2 at time T. If you use signal in the EMA formula instead of prevSignal, then the period T determines the weight that you place on x_2, even though T is the length of time that the signal stayed at x_1.

For an even better example, if the signal then changed to a value x_3 at time T + epsilon, then x_2 would get a much higher weight than x_3, even though the signal stayed at that value only for an infinitesimal length of time.
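The x_1, x_2, x_3 scenario can be run through both variants to see the difference. This is a sketch under assumptions: the `use_prev` flag, the value of `tau`, and the concrete numbers (x_1 = 0, x_2 = 100, x_3 = 0) are invented for illustration, and the average is seeded with the first value.

```python
import math


def irregular_ema(times, values, tau, use_prev=False):
    """Irregular EMA; use_prev=True feeds the previous signal instead of the new one."""
    ema = values[0]  # seeding with the first value is an assumption
    out = [ema]
    for i in range(1, len(values)):
        weight = math.exp(-(times[i] - times[i - 1]) / tau)
        signal = values[i - 1] if use_prev else values[i]
        ema = weight * ema + (1.0 - weight) * signal
        out.append(ema)
    return out


# x_1 holds on [0, T], x_2 appears at T, x_3 replaces it at T + eps.
T, eps, tau = 10.0, 1e-6, 5.0
times = [0.0, T, T + eps]
values = [0.0, 100.0, 0.0]

with_signal = irregular_ema(times, values, tau)
with_prev = irregular_ema(times, values, tau, use_prev=True)
```

With `signal`, the final average sits close to x_2 even though x_2 was in force only for eps, because its weight was set by the long interval T. With `prevSignal`, x_2 only contributes for the eps it actually lasted, so the final average stays close to x_1.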

azuric commented 9 years ago

That makes perfect sense, very well thought out point!

Thanks
