matthewwardrop / formulaic

A high-performance implementation of Wilkinson formulas for Python.
MIT License

When working with time series data, is there a way to use lagged variables? #190

Closed: rgriva closed this issue 1 week ago

rgriva commented 3 months ago

Let's say we have time series data with two variables in a pandas DataFrame, and we want to regress a target variable $y_t$ on $x_t$ and $y_{t-1}$. Is there a way to do something like the following code?

```python
LHS, RHS = model_matrix("y ~ x + lag(y, 1)", df)
```

I looked through the documentation and couldn't find anything on this.
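Written out, the model being asked about is a regression with one lagged dependent variable. The coefficients $\beta_0, \beta_1, \beta_2$ and the error term $\varepsilon_t$ are notation introduced here for illustration only:

$$
y_t = \beta_0 + \beta_1 x_t + \beta_2 y_{t-1} + \varepsilon_t
$$

Because $y_{t-1}$ is undefined for the first observation, that row has no value for the lagged term and cannot enter the fit.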

matthewwardrop commented 1 month ago

Hi @rgriva! We don't have built-in support for this, but maybe we should. In the meantime, you can define the transform yourself:

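```python
from formulaic import model_matrix

def lag(x, shift):
    # Shift the pandas Series down by `shift` rows: row t gets the value from row t - shift
    return x.shift(shift)

# formulaic picks up `lag` from the calling scope; `df` is your time-series DataFrame
LHS, RHS = model_matrix("y ~ x + lag(y, 1)", df)
```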

Apologies for the delay in my response!
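For anyone who wants to see what the `lag(y, 1)` term computes without formulaic, here is a minimal sketch using only pandas and NumPy (not formulaic itself). The helper `fit_lagged_ols` and the simulated, noise-free data are hypothetical and exist only for illustration: the sketch builds the lagged column with `Series.shift`, drops the first row (which has no previous value and so comes out as NaN), and fits ordinary least squares with `numpy.linalg.lstsq`.

```python
import numpy as np
import pandas as pd


def lag(x: pd.Series, shift: int) -> pd.Series:
    """Return `x` shifted down by `shift` rows, so row t holds x[t - shift]."""
    return x.shift(shift)


def fit_lagged_ols(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: fit y_t = b0 + b1 * x_t + b2 * y_{t-1} by OLS.

    Rows where any design column is missing (here, the first row, whose
    lagged value is NaN) are dropped before fitting.
    """
    design = pd.DataFrame(
        {
            "Intercept": 1.0,  # scalar is broadcast to every row
            "x": df["x"],
            "lag(y, 1)": lag(df["y"], 1),
        }
    )
    mask = design.notna().all(axis=1)  # False only for the first row
    X = design[mask].to_numpy(dtype=float)
    y = df.loc[mask, "y"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated, noise-free series generated from y_t = 1 + 2 * x_t + 0.5 * y_{t-1}
x = np.arange(10, dtype=float)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 1.0 + 2.0 * x[t] + 0.5 * y[t - 1]
df = pd.DataFrame({"x": x, "y": y})

coef = fit_lagged_ols(df)  # expected to be close to [1.0, 2.0, 0.5]
```

Since the data contain no noise, least squares should recover the generating coefficients up to floating-point error. With the formulaic version above, the NaN produced by the shift in the first row should likewise be dropped under formulaic's default `na_action="drop"` missing-value handling.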

matthewwardrop commented 1 month ago

I think I'll go ahead and add this to the standard set of transforms for the next release, since this is a pretty common use-case.

matthewwardrop commented 1 week ago

This has now been added and will ship in the next release of Formulaic (1.1.0), which should be out soon. Thanks for reaching out, @rgriva!