openradar / xradar

A tool to work with weather radar data in xarray
https://docs.openradarscience.org/projects/xradar
MIT License

open_{engine}_mfdatatree functionality #79

Status: open. kmuehlbauer opened this issue 1 year ago

kmuehlbauer commented 1 year ago

Description

We can already use xarray.open_mfdataset to concatenate multiple files into a single xarray.Dataset along the time dimension. See also #69.
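For reference, here is a minimal in-memory sketch of that concatenation step. Synthetic datasets stand in for the files, and the helper make_sweep and the variable name DBZH are illustrative assumptions. xr.combine_nested is, as far as I know, the combine step open_mfdataset(paths, combine="nested", concat_dim="time") applies after opening each file.

```python
import numpy as np
import xarray as xr


def make_sweep(time, seed):
    """Hypothetical stand-in for one file's content: a single sweep of DBZH."""
    rng = np.random.default_rng(seed)
    ds = xr.Dataset(
        {"DBZH": (("azimuth", "range"), rng.normal(size=(4, 5)))},
        coords={"azimuth": [0.0, 90.0, 180.0, 270.0], "range": np.arange(5) * 250.0},
    )
    # Give each "file" a length-1 time dimension so the pieces can be stacked.
    return ds.expand_dims(time=[np.datetime64(time, "ns")])


pieces = [make_sweep("2023-05-01T12:00:00", 0), make_sweep("2023-05-01T12:05:00", 1)]
# Stack the per-file datasets along "time", as open_mfdataset does with combine="nested".
combined = xr.combine_nested(pieces, concat_dim="time")
```

This works for flat datasets only; the question in this issue is how to do the same for a whole tree of groups.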

For reading single (volume) files we have xradar's open_{engine}_datatree functions. It would be great to have an equivalent open_{engine}_mfdatatree function.

Since this would go beyond the CfRadial2/FM301 standard (which only covers single volumes/sweeps), we need to think about how to design it in code.

mgrover1 commented 1 year ago

I agree - this would be helpful

kmuehlbauer commented 1 year ago

Here is a very minimal solution for concatenating DataTree objects:

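import xarray as xr

def concat_radar_datatree(objs, dim="volume_time"):
    # Concatenate each group (root and every sweep) across all trees along a
    # new dimension `dim`, then rebuild the tree from the combined groups.
    # Assumes all trees share the group layout of objs[0].
    # Uses xarray.DataTree (xarray >= 2024.10; previously the separate datatree package).
    groups = {}
    for path in objs[0].groups:  # "/", "/sweep_0", "/sweep_1", ...
        # inherit=False keeps only the variables stored on this node itself
        dsets = [obj[path].to_dataset(inherit=False) for obj in objs]
        groups[path] = xr.concat(dsets, dim=dim)
    return xr.DataTree.from_dict(groups)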

There may be better-fitting solutions, but this is as far as I've got. With this, we could read the individual files and concatenate the trees afterwards.

Another solution would be to concatenate the matching groups across files using open_mfdataset and build the tree afterwards.

And of course, we can then conveniently select individual time steps:

ctree = concat_radar_datatree([dtree1, dtree2])
dtree3 = ctree.isel(volume_time=0)

I'm not sure whether datatree already has some of this built in.

egouden commented 1 year ago

I envision a generic xradar.open that takes input files of unknown organization.

I think a DataTree with a root group and sweep groups is the natural implementation of our data model, and therefore of reading and writing FM301 files. It can hold one sweep, one volume, or multiple volumes.

As the volumes might not be consistent with each other, the default return value would be a list of volumes (one DataTree each). The volumes can easily be constructed from the mandatory FM301 metadata; otherwise, the user would have to provide the volume cycle.

There would be options to get the data "sweep by sweep" or as multiple volumes stacked along an extra dimension.

By default, all observed variables would be regrouped, if this is not already the case.