openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

parameter change in `select_time_slice` #165

Closed dfulu closed 1 year ago

dfulu commented 1 year ago

Replace or supplement the history_duration and forecast_duration parameters with interval_start and interval_end

Detailed Description

While writing some pipelines for PVNet that include dropout and assume data latency, I've come across some use cases that the current parameters make slightly unclear.

Examples

  1. In production we can expect a delay of at least 15 minutes in satellite data, so I'd like to build a pipeline with satellite data which is between 60 and 15 minutes before t0.

    • Currently this can be done by setting
      • history_duration=timedelta(minutes=60)
      • forecast_duration=timedelta(minutes=-15).
    • The alternative could be
      • interval_start=timedelta(minutes=-60)
      • interval_end=timedelta(minutes=-15)
  2. For the dropout pipelines, I'd like to slice the data into future and historical sections so that I can apply dropout on the historical section as inputs but keep the future data clean. I want to select the future GSP data from one step ahead of t0 (i.e. 30 minutes) until 180 minutes into the future.

    • Currently this can be done with
      • history_duration=timedelta(minutes=-30)
      • forecast_duration=timedelta(minutes=180)
    • The alternative could be
      • interval_start=timedelta(minutes=30)
      • interval_end=timedelta(minutes=180)
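
To make the two conventions concrete, here is a minimal sketch of how the proposed parameters could map onto a time window. It is not the `select_time_slice` implementation: the helpers `durations_to_interval` and `interval_to_window` are hypothetical names, and the sketch assumes the current parameters select from `t0 - history_duration` to `t0 + forecast_duration`, which is what the second example above implies.

```python
from datetime import datetime, timedelta


def durations_to_interval(history_duration, forecast_duration):
    """Hypothetical helper: express *_duration parameters as offsets from t0.

    Assumes the existing slice runs from t0 - history_duration
    to t0 + forecast_duration.
    """
    return -history_duration, forecast_duration


def interval_to_window(t0, interval_start, interval_end):
    """Hypothetical helper: turn offsets relative to t0 into absolute times."""
    return t0 + interval_start, t0 + interval_end


t0 = datetime(2023, 6, 1, 12, 0)  # arbitrary example t0

# Example 1: satellite data between 60 and 15 minutes before t0
sat_start, sat_end = interval_to_window(
    t0, timedelta(minutes=-60), timedelta(minutes=-15)
)

# Example 2: future GSP data from 30 to 180 minutes after t0
gsp_start, gsp_end = interval_to_window(
    t0, timedelta(minutes=30), timedelta(minutes=180)
)

# The duration form of example 2 maps onto the same offsets
offsets = durations_to_interval(timedelta(minutes=-30), timedelta(minutes=180))
```

With the interval form, the sign of each offset directly says whether the bound lies before or after t0, so no negative "duration" is needed.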

Possible Implementation

Either replace the history_duration and forecast_duration parameters with interval_start and interval_end, or add interval_start and interval_end as optional parameters and allow either the *_duration pair or the interval_* pair to be set.
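
For the second option, one possible way to accept either pair is sketched below. This is only an illustration under assumptions: `resolve_interval` is a hypothetical helper rather than part of ocf_datapipes, and the rule that the two pairs are mutually exclusive is one design choice among several.

```python
from datetime import timedelta
from typing import Optional, Tuple


def resolve_interval(
    history_duration: Optional[timedelta] = None,
    forecast_duration: Optional[timedelta] = None,
    interval_start: Optional[timedelta] = None,
    interval_end: Optional[timedelta] = None,
) -> Tuple[timedelta, timedelta]:
    """Hypothetical helper: return (start, end) offsets relative to t0.

    Exactly one complete pair must be given, either *_duration or interval_*.
    """
    used_duration = history_duration is not None or forecast_duration is not None
    used_interval = interval_start is not None or interval_end is not None

    if used_duration and used_interval:
        raise ValueError("Set either the *_duration pair or the interval_* pair, not both")

    if used_interval:
        if interval_start is None or interval_end is None:
            raise ValueError("interval_start and interval_end must be set together")
        start, end = interval_start, interval_end
    elif used_duration:
        if history_duration is None or forecast_duration is None:
            raise ValueError("history_duration and forecast_duration must be set together")
        # Assumed convention: history extends backwards from t0
        start, end = -history_duration, forecast_duration
    else:
        raise ValueError("No time interval parameters were set")

    if start > end:
        raise ValueError("The interval start must not be after the interval end")
    return start, end
```

Both calling styles then resolve to the same offsets, for example `resolve_interval(interval_start=timedelta(minutes=30), interval_end=timedelta(minutes=180))` and `resolve_interval(history_duration=timedelta(minutes=-30), forecast_duration=timedelta(minutes=180))`.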