openclimatefix / ocf-data-sampler

A test repo to experiment with refactoring ocf_datapipes
MIT License

enforce dataArray as a type in nwp providers? #32

Closed AUdaltsova closed 2 weeks ago

AUdaltsova commented 1 month ago

Detailed Description

Responding to this TODO in ecmwf provider:

# TODO: should we control the dtype of the DataArray?

I think having a "channel" dimension basically does that for you? All providers are guaranteed to have it, and it guarantees that the variables are flattened into a DataArray.
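A minimal sketch of what a "channel" dimension does and does not guarantee. It assumes, for illustration only, that a provider stacks its variables with `xr.concat`; the real providers may build the channel dimension differently, and the variable names and values here are made up. Stacking forces everything into one DataArray with a single dtype, but that dtype is whatever numpy type promotion picks, not one we chose.

```python
import numpy as np
import xarray as xr

# Hypothetical NWP variables with mixed dtypes (names and values are illustrative only)
ds = xr.Dataset(
    {
        "t2m": ("step", np.array([280.0, 281.5], dtype=np.float32)),
        "vis": ("step", np.array([20000.0, 70000.0], dtype=np.float64)),
    }
)

# Flatten the variables into one DataArray along a new "channel" dimension
channels = list(ds.data_vars)
da = xr.concat([ds[name] for name in channels], dim="channel")
da = da.assign_coords(channel=channels)

# The container type and dims are fixed, but the dtype comes from promotion
# (float32 + float64 -> float64), so it is not controlled by the provider
print(type(da).__name__, da.dims, da.dtype)  # DataArray ('channel', 'step') float64
```

So the "channel" dimension controls the type of the container, while the question in the TODO about the dtype of the values stays open.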

dfulu commented 1 month ago

By this comment I actually meant: should we lock the values in the DataArray to a specific dtype, such as np.float32 or np.float64?

Right now I'm leaning towards not doing so, although controlling the dtype has bitten us before. For example, the UKV training data is float16, which means that visibility values above the float16 maximum (65504, roughly 65,000) were overflowing to infinity. Visibility is measured in metres and exceeds 65 km often enough. We didn't realise this when preparing the training data. It's the reason for this awful hack in PVNet.
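A small numpy sketch of the overflow described above, using made-up visibility values rather than real UKV data. The largest finite float16 value is 65504, so a 70 km visibility cast to float16 becomes `inf`, while values inside the range survive.

```python
import numpy as np

# Largest finite float16 value (converted to a Python float for an exact printout)
print(float(np.finfo(np.float16).max))  # 65504.0

# Visibility in metres (illustrative values only, not real UKV data)
visibility_m = np.array([1500.0, 40000.0, 70000.0], dtype=np.float64)

# Casting to float16 turns anything above the float16 range into inf;
# recent numpy versions may emit a RuntimeWarning here instead of raising
vis16 = visibility_m.astype(np.float16)

print(int(np.isinf(vis16).sum()))  # 1
```

The cast does not fail, which is why this kind of problem can go unnoticed while preparing training data.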

In production we had the UKV data in a different format which wasn't overflowing, so we had to hack the app as well to resave the data as float16 so that it matched the training data.
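If a dtype lock were ever added, one option would be to cast with an explicit overflow check so that the mismatch surfaces as an error instead of silent `inf` values. The helper below, `cast_with_overflow_check`, is hypothetical and not part of ocf-data-sampler or PVNet; it is a sketch under the assumption that the data fits in memory as a numpy-backed DataArray. It is conservative: it rejects any finite value above the target type's largest finite value, even the few that would round down rather than overflow.

```python
import numpy as np
import xarray as xr


def cast_with_overflow_check(da: xr.DataArray, dtype) -> xr.DataArray:
    """Cast ``da`` to ``dtype``, raising if any finite value lies outside
    the finite range of a floating-point target dtype.

    Hypothetical helper for illustration; not part of ocf-data-sampler.
    """
    target = np.dtype(dtype)
    if np.issubdtype(target, np.floating):
        limit = np.finfo(target).max
        values = da.values
        # NaN and existing inf values are ignored; only finite values are checked
        too_big = np.isfinite(values) & (np.abs(values) > limit)
        if too_big.any():
            raise OverflowError(
                f"{int(too_big.sum())} value(s) exceed the {target} max of {float(limit)}"
            )
    return da.astype(target)


# Usage: a 70 km visibility is fine as float32 but rejected as float16
vis = xr.DataArray(np.array([1500.0, 70000.0]), dims="step")
print(cast_with_overflow_check(vis, np.float32).dtype)  # float32
try:
    cast_with_overflow_check(vis, np.float16)
except OverflowError as err:
    print(err)  # 1 value(s) exceed the float16 max of 65504.0
```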

dfulu commented 2 weeks ago

change to not planned