openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting data
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License

Run validation script at the end of `prepare_ml_data.py`? #317

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

Detailed Description

Maybe we should always validate the on-disk batches?

(Let's wait for PR #300 to be merged before working on this)

peterdudfield commented 2 years ago

The validation script is 'a bit' / 'a lot' out of date. It'll need some work to update. The good thing is that the Batch validates each data source as we go.

Perhaps an easy change to make would be to validate that the t0_datetimes are in separate groups for the train, validation and test sets.

JackKelly commented 2 years ago

> validate the t0_datetimes are in separate groups for the train, validation and test

I completely agree! I think I implemented this here: https://github.com/openclimatefix/nowcasting_dataset/blob/main/nowcasting_dataset/dataset/split/split.py#L189
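
For illustration, here is a minimal sketch of the kind of check discussed above: confirming that no t0_datetime is shared between the train, validation and test splits. It assumes each split's t0_datetimes can be loaded as a plain list of `datetime` objects. The function names and the dict layout are hypothetical, and this is not the code at the linked `split.py` line.

```python
import itertools
from datetime import datetime, timedelta


def find_overlapping_t0_datetimes(t0_datetimes_per_split):
    """Return {(split_a, split_b): sorted shared datetimes} for each overlapping pair.

    `t0_datetimes_per_split` is a hypothetical dict mapping a split name
    (e.g. "train") to an iterable of t0 datetimes for that split.
    """
    overlaps = {}
    # Compare every pair of splits exactly once.
    for (name_a, t0_a), (name_b, t0_b) in itertools.combinations(
        t0_datetimes_per_split.items(), 2
    ):
        shared = set(t0_a) & set(t0_b)
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


def make_t0_datetimes(start, n):
    """Build n made-up t0 datetimes at 30-minute spacing (example data only)."""
    return [start + timedelta(minutes=30 * i) for i in range(n)]


# Made-up example data: three splits drawn from three different days.
example_splits = {
    "train": make_t0_datetimes(datetime(2020, 1, 1), 4),
    "validation": make_t0_datetimes(datetime(2020, 1, 2), 4),
    "test": make_t0_datetimes(datetime(2020, 1, 3), 4),
}

# An empty result means the splits are disjoint.
assert not find_overlapping_t0_datetimes(example_splits)
```

Returning the overlapping datetimes, rather than just a boolean, would make it easier to see which split pair leaked if the check ever fails at the end of `prepare_ml_data.py`.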