openclimatefix / PVNet

PVnet main repo
MIT License

Save batches script saves examples rather than number of batches for sites #174

Open Sukh-P opened 2 months ago

Sukh-P commented 2 months ago

Describe the bug

Currently batch_size is not taken into account when creating batches for sites in the save_batches script, so each netcdf file contains a single example. This has to be accounted for when setting the number of batches to create: the value is actually a number of examples, so with a batch size of 8 you need to request 8x as many.

Note that this also affects batch creation for WindNet.
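As a rough illustration of the workaround above, the arithmetic is just a multiplication. The helper name and its arguments are hypothetical and are not part of the save_batches script or its config:

```python
def examples_to_request(num_batches: int, batch_size: int) -> int:
    """Return how many single-example netcdf files are needed to cover
    `num_batches` batches of `batch_size` examples each.

    Hypothetical helper for illustration only; not a PVNet function.
    """
    if num_batches < 0:
        raise ValueError("num_batches must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return num_batches * batch_size


# With a batch size of 8, asking for 100 batches' worth of data means
# requesting 800 single-example files.
print(examples_to_request(num_batches=100, batch_size=8))  # 800
```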

To Reproduce

Run the save_batches script with the renewable variable in the config set to a value such as "pv_india" that uses the pvnet_site_datapipe function.

Expected behavior

Each netcdf file output to the train and val folders should contain batch_size examples, rather than one example per netcdf file.

Additional context

This can be changed in this repo, or by editing the pvnet_site_datapipe function in the pvnet_site.py file in the ocf_datapipes repo so that it takes batch_size as a parameter. However, it would need testing to confirm that loading batches for model training through this function does not break now that each batch contains multiple examples.
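A minimal sketch of the grouping step this would involve, written in plain Python rather than against the ocf_datapipes API. The function name, the drop_last flag, and the use of integers as stand-in examples are all assumptions for illustration; the real change would need to combine the actual example objects before saving:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_into_batches(
    examples: Iterable[T], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Yield lists of up to `batch_size` consecutive examples.

    Hypothetical helper, not an ocf_datapipes function. If `drop_last` is
    True, a final short batch is discarded instead of being yielded.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        if drop_last and len(batch) < batch_size:
            return
        yield batch


# Stand-in examples: integers instead of real site examples.
for i, batch in enumerate(group_into_batches(range(5), batch_size=2)):
    print(i, batch)
```

Each yielded group would then be written to one netcdf file, so that the file count matches the requested number of batches.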

peterdudfield commented 2 months ago

Was the conclusion of this that it's easier to save examples rather than batches, because the examples are .netcdf files and the batches are numpy files? Do we just need to rename things from batches to examples to make this clear?

Sukh-P commented 2 months ago

I don't think this is urgent, but I wanted there to be some visibility of this deviation from previous behaviour. It only impacts batch creation (the number of examples saved) rather than training, since batches are recreated from samples using the batch size correctly, and if you're aware that it's the number of samples rather than the number of batches it's fine. Renaming would be a little tricky because this only applies to the site/India PVNet side, but we could add some comments somewhere to make this clearer.

But IMO, to keep things consistent with UK PVNet, we should make the changes so that each saved batch contains the number of examples specified by the batch size. It will just take a bit of work to make sure that doesn't break the conversion into numpy batches that happens when training.