openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

Order nwp channels #326

Open peterdudfield opened 3 months ago

peterdudfield commented 3 months ago

Detailed Description

Should we add a pipeline that orders the NWP channels alphabetically?

Context

Possible Implementation

AUdaltsova commented 3 months ago

I looked at where channel selection happens, and I think this can be achieved with a one-liner in ocf_datapipes/select/filter_channels.py: sort the channel list before performing the selection. The selection should then return the channels in the right order, including reordering the coords.
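A minimal sketch of what that one-liner could look like, assuming the data is an xarray DataArray with a "variable" dimension. The function name `select_channels_sorted` and the toy data are hypothetical and are not taken from filter_channels.py:

```python
import numpy as np
import xarray as xr


def select_channels_sorted(
    data: xr.DataArray, channels: list[str], dim: str = "variable"
) -> xr.DataArray:
    """Select channels in alphabetical order, whatever order the config lists them in.

    Hypothetical helper for illustration; not the actual ocf_datapipes code.
    """
    # Sorting the requested list first makes the output order independent
    # of the order in which the channels were written in the config.
    return data.sel({dim: sorted(channels)})


# Toy data: three channels stored in a non-alphabetical order
data = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=("variable", "x"),
    coords={"variable": ["mcc", "hcc", "lcc"]},
)

a = select_channels_sorted(data, ["mcc", "lcc"])
b = select_channels_sorted(data, ["lcc", "mcc"])
```

With this, `a` and `b` should hold the same channels in the same order (lcc, then mcc), regardless of how the list was written.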

peterdudfield commented 3 months ago

That's a good idea

peterdudfield commented 3 months ago

It would be interesting to know whether sel just selects the channels, or selects them and also orders them: https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/select/filter_channels.py#L49

By trying d.sel({"variable": ["lcc", "mcc"]}) and d.sel({"variable": ["mcc", "lcc"]}), we do seem to get different results

AUdaltsova commented 3 months ago

Yes, that's what I was basing the one-liner suggestion on: sel seems to reorder coordinates, so the order of channels depends on the order in which somebody adds them into the config, and hence is very prone to inconsistencies.

I am trying to see if I can find a solid description of the reordering somewhere in the docs.

dfulu commented 3 months ago

I'm not sure I follow what the issue is here.

If we use the filter channels function (which we use, for example, here), then we will have the same channel ordering in training and production, even if the dataset we are selecting from has them in a different order on disk, so long as the input data config remains the same at training and production. I think this is the desired behaviour?

dfulu commented 3 months ago

> It would be interesting to know whether sel just selects the channels, or selects them and also orders them: https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/select/filter_channels.py#L49
>
> By trying d.sel({"variable": ["lcc", "mcc"]}) and d.sel({"variable": ["mcc", "lcc"]}), we do seem to get different results

Yes, this is how the selection works. It reorders the channels to match the order of the list passed in. The first call returns the channels in the order lcc then mcc. The second returns them in the order mcc then lcc.
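A small sketch that reproduces this, assuming a toy DataArray `d` with a "variable" dimension (the array and its values are made up for illustration):

```python
import numpy as np
import xarray as xr

# Toy array with two channels along the "variable" dimension
d = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    dims=("variable", "x"),
    coords={"variable": ["lcc", "mcc"]},
)

first = d.sel({"variable": ["lcc", "mcc"]})
second = d.sel({"variable": ["mcc", "lcc"]})

# The coordinate order follows the list passed to sel
print(first.coords["variable"].values.tolist())   # ['lcc', 'mcc']
print(second.coords["variable"].values.tolist())  # ['mcc', 'lcc']

# The data rows are reordered together with the coordinate
print(first.values.tolist())   # [[1.0, 2.0], [3.0, 4.0]]
print(second.values.tolist())  # [[3.0, 4.0], [1.0, 2.0]]
```

So the two selections contain the same data, but in a different channel order.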

AUdaltsova commented 3 months ago

Yeah, as long as the same config file is used it should be completely fine. I wasn't sure if that's what's happening.