rwijtvliet / portfolyo

Handling timeseries for power and gas retail portfolios.
BSD 3-Clause "New" or "Revised" License

[enhancement] More flexible intersect functions #58

Closed rwijtvliet closed 4 months ago

rwijtvliet commented 9 months ago

Currently, the functions in pf.tools.intersect can only work on series/dataframes/indices which have the same frequency and the same start-of-day. This is not always practical.

Suggested improvement: add boolean keyword arguments (default False) ignore_freq, ignore_start_of_day, and ignore_tz.

If ignore_freq == True: say a has quarterly values from (incl) 2022-04-01 until (excl) 2024-07-01, and b has yearly values from (incl) 2021-01-01 until (excl) 2024-01-01. The intersection should return the same time period for both: (incl) 2023-01-01 until (excl) 2024-01-01 - but for a as quarters, and for b as years. (2023 is the only calendar year that both inputs cover completely.)
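
A minimal sketch of the ignore_freq behaviour in this example, written in plain pandas and not taken from portfolyo's implementation. The name `intersect_ignore_freq` is hypothetical, and the sketch assumes both inputs are non-empty, gapless `DatetimeIndex` objects with `freq` set and left-bound timestamps. It repeatedly shrinks the common window until every remaining period of both inputs fits inside it:

```python
import pandas as pd


def intersect_ignore_freq(a: pd.DatetimeIndex, b: pd.DatetimeIndex):
    """Trim a and b to the same time span; each keeps its own frequency.

    Hypothetical helper: assumes left-bound timestamps and a set ``freq``.
    """
    lefts = [a, b]
    # Exclusive right bound of every period (needs freq, so do it before masking).
    rights = [a.shift(1), b.shift(1)]
    start = max(left[0] for left in lefts)
    end = min(right[-1] for right in rights)
    while True:
        # Keep only periods that lie completely inside [start, end).
        masks = [(l >= start) & (r <= end) for l, r in zip(lefts, rights)]
        if not all(m.any() for m in masks):
            return a[:0], b[:0]  # no common period
        new_start = max(l[m][0] for l, m in zip(lefts, masks))
        new_end = min(r[m][-1] for r, m in zip(rights, masks))
        if (new_start, new_end) == (start, end):
            return a[masks[0]], b[masks[1]]
        start, end = new_start, new_end  # window shrank; trim again


# Quarters 2022-04-01 (incl) to 2024-07-01 (excl); years 2021-01-01 to 2024-01-01.
a = pd.date_range("2022-04-01", periods=9, freq="QS")
b = pd.date_range("2021-01-01", periods=3, freq="YS")
a2, b2 = intersect_ignore_freq(a, b)
# a2 holds the four quarters of 2023; b2 holds the single year 2023.
```

The loop is needed because trimming to the coarser frequency can move the window's start or end, which may in turn cut off further periods of the other input.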

If ignore_start_of_day == True: say a and b both have daily frequency, but a has timestamps (incl) 2022-04-21 00:00 until (excl) 2022-05-10 00:00, and b has (incl) 2022-04-25 06:00 until (excl) 2022-05-15 06:00. The intersection should return (incl) 2022-04-25 00:00 until (excl) 2022-05-10 00:00 for a, and (incl) 2022-04-25 06:00 until (excl) 2022-05-10 06:00 for b. Note that, after the intersection, there is still a part of a that is not included in b (namely the first 6 hours of a) and a part of b that is not included in a (namely the final 6 hours of b).
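
For daily values, one way to get this result is to compare calendar dates only. The sketch below uses plain pandas; `intersect_ignore_start_of_day` is a hypothetical name, not portfolyo's API, and it assumes timezone-naive daily indices in which each timestamp is labelled by the date on which its day starts:

```python
import pandas as pd


def intersect_ignore_start_of_day(a: pd.DatetimeIndex, b: pd.DatetimeIndex):
    """Keep the days present in both daily indices, ignoring the time-of-day.

    Hypothetical helper; assumes daily frequency for both inputs.
    """
    days_a = a.normalize()  # drop the time-of-day, keep the date
    days_b = b.normalize()
    common = days_a.intersection(days_b)
    return a[days_a.isin(common)], b[days_b.isin(common)]


# The final timestamp is dropped with [:-1] to make the end exclusive.
a = pd.date_range("2022-04-21 00:00", "2022-05-10 00:00", freq="D")[:-1]
b = pd.date_range("2022-04-25 06:00", "2022-05-15 06:00", freq="D")[:-1]
a2, b2 = intersect_ignore_start_of_day(a, b)
# a2 runs from 2022-04-25 00:00 through 2022-05-09 00:00 (15 days);
# b2 runs from 2022-04-25 06:00 through 2022-05-09 06:00 (15 days).
```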

NB: how should we handle the case where the frequency of a and/or b is shorter than daily? In that case, a "strict" intersection could change the start-of-day of one of the input objects. E.g., in the case above, if both have hourly frequency, the strict intersection would return the same index for both: (incl) 2022-04-25 06:00 until (excl) 2022-05-10 00:00. This should not happen! Both returned objects should keep the start-of-day of their original inputs, and both should span an integer number of days. So, in this case, the desired result is the same as in the previous paragraph, but with hourly frequency.
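
A possible approach for the shorter-than-daily case is to label every timestamp with the day it belongs to, measured from that index's own start-of-day, and then intersect on those labels. This is only a sketch in plain pandas: `day_labels` and `intersect_whole_days` are hypothetical names, the start-of-day is inferred from the first timestamp (an assumption that only holds if the index starts on a day boundary), and timezone-naive data is assumed:

```python
import pandas as pd


def day_labels(idx: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Date of the day each timestamp belongs to, given idx's own start-of-day."""
    # Assumption: the first timestamp sits exactly on a day boundary.
    start_of_day = idx[0] - idx[0].normalize()
    return (idx - start_of_day).normalize()


def intersect_whole_days(a: pd.DatetimeIndex, b: pd.DatetimeIndex):
    """Keep whole days present in both inputs; each keeps its own start-of-day."""
    labels_a, labels_b = day_labels(a), day_labels(b)
    common = labels_a.unique().intersection(labels_b.unique())
    return a[labels_a.isin(common)], b[labels_b.isin(common)]


# Hourly versions of the daily example; [:-1] makes the end exclusive.
a = pd.date_range("2022-04-21 00:00", "2022-05-10 00:00", freq="h")[:-1]
b = pd.date_range("2022-04-25 06:00", "2022-05-15 06:00", freq="h")[:-1]
a2, b2 = intersect_whole_days(a, b)
# a2 starts at 2022-04-25 00:00 and its last hour starts at 2022-05-09 23:00;
# b2 starts at 2022-04-25 06:00 and its last hour starts at 2022-05-10 05:00.
# Both span 15 whole days (360 hours).
```

Because the comparison is done on day labels, no day is ever cut in half, so both results keep their original start-of-day.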

If ignore_tz == True: if we have a and b, both with daily frequency, but a has the Europe/Berlin timezone and b has no timezone or the Asia/Kolkata timezone, the intersection is done using "wall time", i.e., the local clock time of each timestamp, disregarding its UTC offset.
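
A wall-time comparison can be sketched in plain pandas by stripping the timezone before intersecting and then selecting from the original, still timezone-aware, indices. `wall_time` and `intersect_ignore_tz` are hypothetical names, not portfolyo's API, and the date ranges are made up for illustration:

```python
import pandas as pd


def wall_time(idx: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Local clock times of idx, without timezone information."""
    return idx if idx.tz is None else idx.tz_localize(None)


def intersect_ignore_tz(a: pd.DatetimeIndex, b: pd.DatetimeIndex):
    """Keep timestamps whose wall time occurs in both; timezones are retained."""
    wall_a, wall_b = wall_time(a), wall_time(b)
    common = wall_a.intersection(wall_b)
    return a[wall_a.isin(common)], b[wall_b.isin(common)]


# Illustrative daily indices in two timezones; [:-1] makes the end exclusive.
a = pd.date_range("2022-04-21", "2022-05-10", freq="D", tz="Europe/Berlin")[:-1]
b = pd.date_range("2022-04-25", "2022-05-15", freq="D", tz="Asia/Kolkata")[:-1]
a2, b2 = intersect_ignore_tz(a, b)
# a2 and b2 both cover 2022-04-25 through 2022-05-09 in local time,
# and each keeps its original timezone.
```

Note that the two results then describe different absolute moments in time; they only agree on the clock.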


(don't forget to include tests for these cases, including combinations where >1 may be set to True)