openradar / xradar

A tool to work with weather radar data in xarray
https://docs.openradarscience.org/projects/xradar
MIT License
Migrating mpl reader from ACT to xradar #159

Open zssherman opened 3 months ago

zssherman commented 3 months ago

On the ACT dev call, we discussed how moving the MPL reader to xradar would be a better fit: https://github.com/ARM-DOE/ACT/issues/806

I can give this a shot. I will just need to learn how xarray backends work first.

kmuehlbauer commented 3 months ago

@zssherman Great initiative! It looks like MPL is also a binary format with neat header structures.

The sigmet/iris reader makes heavy use of this kind of structured decoding. This is also what #158 is trying to achieve for nexrad level2.

Maybe we can discuss at the next Open Radar meeting which steps are necessary to get a prototype reader ready.

zssherman commented 3 months ago

@kmuehlbauer That sounds good to me!

kmuehlbauer commented 3 months ago

@zssherman Since there wasn't much time yesterday I'll follow up with some ideas/pointers here.

I'm not really sure how to handle the sidecar files, but we might just search for/recognize them and directly read/decode them as binary blobs (if they have no header).

For the main file the idea would be to use np.memmap for easy reading of large data. See

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L137-L151
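
As a rough illustration of the memory-mapping idea (this is not the linked xradar code, and the file layout below is invented for the example), a file can be mapped as raw bytes and sliced on demand:

```python
import os
import tempfile

import numpy as np

# Write a small dummy binary file: a 16-byte "header" followed by payload.
# This layout is an assumption for illustration, not the MPL format.
header = np.arange(16, dtype=np.uint8)
payload = np.arange(100, dtype="<u2")
path = os.path.join(tempfile.mkdtemp(), "dummy.bin")
with open(path, "wb") as f:
    f.write(header.tobytes())
    f.write(payload.tobytes())

# Map the file read-only as raw bytes; the contents are paged in lazily.
fh = np.memmap(path, dtype=np.uint8, mode="r")

# Slices are views on the mapping, so only the touched parts are read.
raw_header = fh[:16]

# Reinterpret the bytes after the header as little-endian uint16 values.
values = fh[16:].view("<u2")
```

Mapping as `uint8` keeps byte offsets and array indices identical, which makes it easy to apply offsets taken from a header.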

Then the header could be directly extracted using the machinery from the iris/sigmet reader:

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L187-L190

For this, the header structure needs a special layout in which the decoding information is attached to each entry of the OrderedDict:

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L725-L733
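
A minimal sketch of that kind of layout, using only the standard library. The field names, the `fmt`/`func` keys, and the `unpack_header` helper are hypothetical and chosen for illustration; they are not xradar's actual definitions or the real MPL header:

```python
import struct
from collections import OrderedDict

# Hypothetical header layout: each field carries its struct format code and,
# optionally, a function that converts the raw value after unpacking.
EXAMPLE_HEADER = OrderedDict(
    [
        ("unit_number", {"fmt": "H"}),
        ("year", {"fmt": "H"}),
        ("n_bins", {"fmt": "I"}),
        # stored in centimetres, decoded to metres
        ("range_resolution", {"fmt": "H", "func": lambda x: x / 100}),
    ]
)


def header_size(layout, byteorder="<"):
    """Number of bytes occupied by the header described by `layout`."""
    return struct.calcsize(byteorder + "".join(v["fmt"] for v in layout.values()))


def unpack_header(buffer, layout, byteorder="<"):
    """Decode `buffer` into an OrderedDict following `layout`."""
    fmt = byteorder + "".join(v["fmt"] for v in layout.values())
    values = struct.unpack_from(fmt, buffer)
    out = OrderedDict()
    for (name, spec), value in zip(layout.items(), values):
        func = spec.get("func")
        out[name] = func(value) if func else value
    return out


# Usage: pack a fake header and decode it again.
raw = struct.pack("<HHIH", 7, 2024, 2000, 1500)
decoded = unpack_header(raw, EXAMPLE_HEADER)
```

Because the layout is ordered, the same dictionary describes both the field names and the byte order in which they appear in the file.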

The actual data might be read with dedicated functions (e.g. named get_data or similar), which use header information about file offset, size and dtype. See the following for a (not so nice) example:

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L576-L608
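
To make the offset/size/dtype idea concrete, here is a small sketch. The `get_data` signature and the header keys (`data_offset`, `n_bins`, `dtype`) are assumptions for this example and differ from the linked implementation:

```python
import numpy as np


def get_data(buffer, offset, count, dtype):
    """Return `count` values of `dtype` starting at byte `offset` of `buffer`.

    `buffer` can be any object exposing the buffer protocol,
    for example bytes or a np.memmap of the file.
    """
    return np.frombuffer(buffer, dtype=dtype, count=count, offset=offset)


# Fake file content: 10 header bytes followed by four float32 values.
blob = bytes(10) + np.array([1, 2, 3, 4], dtype="<f4").tobytes()

# Hypothetical decoded header describing where the data lives.
header = {"data_offset": 10, "n_bins": 4, "dtype": "<f4"}

profile = get_data(blob, header["data_offset"], header["n_bins"], header["dtype"])
```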

This get_data function is used in the ArrayWrapper to retrieve the data in a lazy manner, whereas header data is used to provide the information to create the DataArrays/Dataset.

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L1227
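
The lazy part can be sketched with xarray's `BackendArray` and `LazilyIndexedArray`, following the pattern from the xarray backend documentation. The class name `LazyProfileArray`, the `loader` callable, and the variable/dimension names are hypothetical stand-ins, not what xradar uses:

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendArray
from xarray.core import indexing


class LazyProfileArray(BackendArray):
    """Hypothetical wrapper that defers reading until the data is indexed."""

    def __init__(self, loader, shape, dtype):
        self.loader = loader  # callable returning the full numpy array
        self.shape = shape  # known from the header, no data access needed
        self.dtype = np.dtype(dtype)
        self.n_reads = 0  # counts how often the data was actually read

    def _raw_getitem(self, key):
        # A real reader would call something like get_data(...) here.
        self.n_reads += 1
        return self.loader()[key]

    def __getitem__(self, key):
        # Let xarray translate its indexers into basic numpy indexing.
        return indexing.explicit_indexing_adapter(
            key, self.shape, indexing.IndexingSupport.BASIC, self._raw_getitem
        )


def loader():
    # Stand-in for reading from the memory-mapped file.
    return np.arange(12, dtype="float32").reshape(3, 4)


backend_array = LazyProfileArray(loader, shape=(3, 4), dtype="float32")

# Shape and dtype come from the wrapper; the data itself is not read here.
var = xr.Variable(("time", "range"), indexing.LazilyIndexedArray(backend_array))
ds = xr.Dataset({"backscatter": var})

# Accessing values triggers the actual read through __getitem__.
second_profile = ds["backscatter"].isel(time=1).values
```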

This is then used in the XarrayStore to provide the Variables/Coordinates:

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L1310

https://github.com/openradar/xradar/blob/56a9ca1f42ca23074dc0b1d2a86d51e1bd4eafa2/xradar/io/backends/nexrad_level2.py#L1328

I hope this makes at least some sense and that you can give it a try.