openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting data
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License

When the training target is half-hourly GSP-level PV, tell the model how many minutes into the future each target is #146

Open · JackKelly opened this issue 2 years ago

JackKelly commented 2 years ago

Detailed Description

Up until the recent introduction of GSP-level PV into nowcasting_dataset, we've been dealing entirely with 5-minutely datasets.

GSP-level PV is (for now, at least) half-hourly.

However, if I've understood correctly, the code still produces examples whose t0 can fall on any 5-minute increment past the hour (i.e. at {0, 5, ..., 55} minutes past the hour). Because the GSP data is at half-hour increments, the gap between the most recent satellite image (the image at t0) and the first target (GSP-level PV) could be 5, 10, 15, 20, 25 or 30 minutes.
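To make the arithmetic concrete, here is a minimal sketch that maps each possible 5-minute t0 to the gap before the first half-hourly GSP target. It assumes, as described above, that GSP targets fall on :00 and :30 and that the first target is the first half-hour boundary strictly after t0. The function name `minutes_to_first_gsp_target` is hypothetical and is not part of nowcasting_dataset.

```python
from datetime import datetime

# Assumption: GSP-level PV targets fall on :00 and :30 past the hour,
# and the first target is the first half-hour boundary strictly after t0.
GSP_PERIOD_MINUTES = 30


def minutes_to_first_gsp_target(t0: datetime) -> int:
    """Hypothetical helper: gap in minutes between t0 and the first GSP target after it."""
    if t0.second or t0.microsecond or t0.minute % 5:
        raise ValueError(f"t0 must lie on a 5-minute boundary, got {t0}")
    return GSP_PERIOD_MINUTES - (t0.minute % GSP_PERIOD_MINUTES)


# Print the gap for every 5-minute t0 within one hour.
for minute in range(0, 60, 5):
    t0 = datetime(2021, 6, 1, 12, minute)
    print(f"t0 = 12:{minute:02d} -> first GSP target is {minutes_to_first_gsp_target(t0)} min ahead")
```

A t0 exactly on :00 or :30 gives a 30-minute gap, and a t0 at :25 or :55 gives a 5-minute gap, which is why the gap varies between examples.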

This is fine for fully attention-based models, because they can infer the gap from the datetime encodings attached to each input row and to each query.

CNN models, however, don't receive any datetime information for the target. So we should probably tell the model explicitly whether the gap between t0 and the first GSP-level PV target is 5, 10, 15, 20, 25, or 30 minutes.
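One possible way to hand this to a CNN is a one-hot vector over the six possible gaps, which could be concatenated with the other non-image inputs. This is a sketch under that assumption only; the encoding and the name `one_hot_gap` are hypothetical, and a scalar feature (e.g. gap / 30) would be an equally valid choice.

```python
import numpy as np

# Hypothetical encoding: one slot per possible gap (5, 10, ..., 30 minutes).
POSSIBLE_GAPS_MINUTES = np.arange(5, 35, 5)


def one_hot_gap(gaps_minutes) -> np.ndarray:
    """Map a batch of gaps (shape [batch]) to one-hot vectors (shape [batch, 6])."""
    gaps_minutes = np.asarray(gaps_minutes)
    # Compare every gap against every allowed value via broadcasting.
    one_hot = (gaps_minutes[:, None] == POSSIBLE_GAPS_MINUTES[None, :]).astype(np.float32)
    if not np.all(one_hot.sum(axis=1) == 1):
        raise ValueError("every gap must be one of 5, 10, 15, 20, 25 or 30 minutes")
    return one_hot


print(one_hot_gap([30, 5, 15]))
```

A one-hot encoding avoids imposing an ordering the network has to learn around, at the cost of six inputs instead of one.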

Possible Implementation

This information should be very quick to compute, and isn't necessary for all models, so maybe this should be implemented in the 'thin wrapper' which loads the batches from disk, rather than writing this information to disk?
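As a rough illustration of computing the feature at load time rather than storing it on disk, the sketch below wraps any iterable of batches and adds the gap to each one as it is read. It assumes each batch is a dict holding a `t0_datetime` array; that key, the `gsp_gap_minutes` output key, and both function names are hypothetical and do not correspond to the real nowcasting_dataset batch format or loader.

```python
from typing import Dict, Iterable, Iterator

import numpy as np


def add_gsp_gap_feature(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Return a copy of the batch with a hypothetical 'gsp_gap_minutes' entry added."""
    t0 = np.asarray(batch["t0_datetime"], dtype="datetime64[m]")
    # Minutes past the hour: subtract t0 floored to the hour.
    minutes_past_hour = (t0 - t0.astype("datetime64[h]")).astype(np.int64)
    # Assumes half-hourly GSP targets strictly after t0 (gap is 5..30 minutes).
    gap = 30 - (minutes_past_hour % 30)
    return {**batch, "gsp_gap_minutes": gap}


def with_gap_feature(batches: Iterable[Dict[str, np.ndarray]]) -> Iterator[Dict[str, np.ndarray]]:
    """Hypothetical 'thin wrapper': add the feature on the fly as batches are loaded."""
    for batch in batches:
        yield add_gsp_gap_feature(batch)


batch = {"t0_datetime": np.array(["2021-06-01T12:00", "2021-06-01T12:25"], dtype="datetime64[m]")}
print(next(with_gap_feature([batch]))["gsp_gap_minutes"])
```

Because the computation is a couple of vectorised NumPy operations per batch, its cost should be negligible next to reading the batch from disk.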

peterdudfield commented 2 years ago

For reference, see https://github.com/openclimatefix/nowcasting_dataset/blob/main/nowcasting_dataset/dataset/datamodule.py#L394

But I think https://github.com/openclimatefix/nowcasting_dataset/blob/main/nowcasting_dataset/dataset/datamodule.py#L436 gets rid of any datetimes that aren't at 00 or 30 minutes past the hour.
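For illustration only, a filter of that kind might look like the pandas sketch below. This is not the code at the linked line; `keep_half_hourly` is a hypothetical name, and the sketch simply keeps datetimes that sit exactly on :00 or :30.

```python
import pandas as pd


def keep_half_hourly(datetimes: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical filter: keep only datetimes exactly on :00 or :30 past the hour."""
    on_half_hour = (datetimes.minute % 30 == 0) & (datetimes.second == 0)
    return datetimes[on_half_hour]


# Twelve candidate t0s at 5-minute spacing, 12:00 to 12:55.
candidate_t0s = pd.date_range("2021-06-01 12:00", periods=12, freq="5min")
print(keep_half_hourly(candidate_t0s))
```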

JackKelly commented 2 years ago

Cool, if we're sure that the code only lets through datetimes at 00 or 30 minutes past the hour, then let's close this issue. Nice!

peterdudfield commented 2 years ago

Perhaps it's worth a quick check on one batch.

JackKelly commented 2 years ago

Sounds good! Do you mean a quick check to make sure that computing datetime features on the fly is fast enough?

peterdudfield commented 2 years ago

Sorry, I meant: let's load up a batch and just check that the t0_dt values are all at 0 or 30 minutes past the hour.

Perhaps I've got confused here.
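A quick check along those lines could look like the sketch below. It assumes the batch's t0_dt values have already been pulled out into an array-like of datetimes; how the batch is loaded from disk is not shown, and `t0_on_half_hour` is a hypothetical helper, not part of nowcasting_dataset.

```python
import numpy as np


def t0_on_half_hour(t0_datetimes) -> bool:
    """Hypothetical check: True if every t0 is exactly on :00 or :30 past the hour."""
    t0 = np.asarray(t0_datetimes, dtype="datetime64[s]")
    # Seconds past the hour: subtract t0 floored to the hour.
    seconds_past_hour = (t0 - t0.astype("datetime64[h]")).astype(np.int64)
    return bool(np.all(seconds_past_hour % 1800 == 0))


print(t0_on_half_hour(["2021-06-01T12:00", "2021-06-01T12:30"]))
print(t0_on_half_hour(["2021-06-01T12:00", "2021-06-01T12:05"]))
```

Returning a single boolean makes it easy to drop into an `assert` while iterating over a few batches.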

JackKelly commented 2 years ago

Oops, sorry, I'm an idiot! I was getting confused with another issue!

Sounds good!