openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License
13 stars 11 forks source link

Add NumpyBatch utils from PVNet #287

Closed markus-kreft closed 3 months ago

markus-kreft commented 4 months ago

Pull Request

Description

Transfer functions from PVNet specific to NumpyBatches here. Type hints are very minimal because recursive type support is still quite tricky. I have tried to use a nested dict type NestedDict = dict[Any, Union[T, "NestedDict[T]"]] but have trouble getting mypy to recognize NumpyBatches as such. I would be very interested if someone can teach me how to do this properly.

Fixes https://github.com/openclimatefix/PVNet/issues/111

How Has This Been Tested?

Logic comes for another repository. I have added a simple test in tests/batch/test_utils.py

Checklist:

markus-kreft commented 4 months ago

Hi @dfulu, thank you for the hint. Making the functions Batch-specific was also my first idea, but getting the types right also turns out to be quit difficult with the recursion (in the second call it will not be a NumpyBatch anymore), and allowing multiple types makes trouble when accessing the batch keys.

To enforce the desired types in the public facing API I have two ideas:

  1. Wrap the existing generic function (here _batch_to_tensor is the current function):
    def batch_to_tensor(batch: NumpyBatch) -> TensorBatch:
    return _batch_to_tensor(batch)
  2. Replace the recursion by loops, because we know that a NumpyBatch is at most 3 levels deep. This will lead to a lot of duplicated code.

From this SO post I think turning Batches into TypedDict subclasses could help (though not sure?), but this would mean redoing the batch module. Or maybe someone sees another way?

markus-kreft commented 3 months ago

I have implemented option 1 for now.

dfulu commented 3 months ago

@all-contributors please add @markus-kreft for code

allcontributors[bot] commented 3 months ago

@dfulu

I've put up a pull request to add @markus-kreft! :tada:

markus-kreft commented 3 months ago

Hi James, no worries, I can imagine you have a lot going on at OCF right now. I am happy my approach to typing worked out :-) I will integrate the functions in PVNet.