twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

How to get the latest two events on a specific window using Flint? #85

Open kant777 opened 4 years ago

kant777 commented 4 years ago

How can I get the latest two events using this library?

Say I have the following data:

time  | price
------+------
1000L | 40
2000L | 20
3000L | 80
4000L | 10
5000L | 60
6000L | 30

I want to do operations like:

  1. Get the last 2 (or last 4) events in the past 6 hours.
  2. Compute the difference in price between the last two events in the past 6 hours.

How can I do this using Flint? I could write my own UDF to solve this, but I am wondering whether there is a built-in function already available.

dgrnbrg commented 3 years ago

I've done this by leftJoining the data frame to itself, shifted forward, with a tolerance, but that's hideously slow. I think you could instead use addPastWindows with a 6-hour window and then apply a UDF to each window.
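
For reference, here is a minimal sketch of the logic such a UDF might implement once each row has its past window attached. It is plain Python and does not call Flint at all. The helper names (`last_n_in_window`, `last_two_diff`) are hypothetical, and the window length is assumed to be in the same units as the `time` column, so 3000 is used here in place of 6 hours to fit the sample data.

```python
from collections import deque


def last_n_in_window(events, n, window):
    """For each event, return up to n most recent events (current one included)
    whose time lies in [t - window, t]. `events` must be sorted by time."""
    buf = deque()
    out = []
    for t, price in events:
        buf.append((t, price))
        # Drop events that have fallen out of the look-back window.
        while buf[0][0] < t - window:
            buf.popleft()
        out.append(list(buf)[-n:])
    return out


def last_two_diff(rows):
    """Price of the latest row minus the one before it, or None if fewer than two rows."""
    if len(rows) < 2:
        return None
    return rows[-1][1] - rows[-2][1]


# Sample data from the question, as (time, price) pairs.
events = [(1000, 40), (2000, 20), (3000, 80), (4000, 10), (5000, 60), (6000, 30)]

windows = last_n_in_window(events, n=2, window=3000)
diffs = [last_two_diff(w) for w in windows]
```

In a real job, the per-window part (`last_two_diff`, or slicing the last n rows) is what would go inside the UDF, while the windowing itself would be left to the library.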