openradar / xradar

A tool to work in weather radar data in xarray
https://docs.openradarscience.org/projects/xradar
MIT License
85 stars 17 forks source link

Rainbow reader: _FillValue and _Undetect #103

Open egouden opened 1 year ago

egouden commented 1 year ago

Description

In the Rainbow reader, _FillValue is set to 0 and _Undetect is not set. This is causing a loss of information.

Normally _Fillvalue should be set to 255 (or the max value for the byte encoding) and _Undetect to 0.

WMO CF Extensions:

mgrover1 commented 1 year ago

Thanks for letting us know! Contributions are always welcome :)

kmuehlbauer commented 1 year ago

Sorry for the late response, that went somehow under the radar.

Rainbow5 (at least Release 5.57.0) does only have a notion of "no data" which is best translated to _FillValue (it's zero for all radar moments on disk). There is no _Undetect comparable value in Rainbow5.

For instance a one byte reflectivity looks like this:

0x00 (0) no data 0x01 (1) -31.5 0x02 (2) -31.0 0x40 (64) 0.0 0x80 (128) 32.0 0xC0 (192) 64.0 0xFE (254) 95.0 0xFF (255) 95.5

We can't do much about the source format.

kmuehlbauer commented 1 year ago

And I'm not considering that a bug, as specifying the _FillValue is sufficient for the CfRadial2 standard. ODIM_H5 has nodata as well as undetect mentioned, but if I interpret the documentation correctly both are not mandatory values.

mgrover1 commented 1 year ago

Ahh okay! Sorry for the mislabelling here @kmuehlbauer , I appreciate the insight!

kmuehlbauer commented 1 year ago

No worries @mgrover1, we should invent a label which can be put there instead. How about data-format-issue?

mgrover1 commented 1 year ago

how about data-standards?

egouden commented 1 year ago

This is a known tricky issue.

1) _Undetect

It is an optional attribute in FM301. It is defined as follow: "Indicates an area (range bin) that has been radiated but has not produced a valid echo".

It is a mandatory metadata in ODIM. It is defined as "Raw value used to denote areas below the measurement detection threshold (radiated but nothing detected)".

By convention, this special kind of value is often coded as zero.

In the documentation of Rainbow it is described as "no data". But this is misleading. It actually corresponds to _Undetect. It should not be considered as "missing data" using _FillValue.

2) _FillValue

This is defined in CFRADIAL as "Indicates data are missing at this range bin" This has apparently been dropped in FM301 in favor of flag_values.

In ODIM 2.4 you have the similar mandatory (if applicable) NODATA, defined as "raw value used to denote areas void of data (never radiated)". In OPERA, it is also used when bad data have been removed.

There is actually no "missing value" in the Rainbow format. I was wrong here. So _FillValue should simply not be set.

I hope this clarifies my point of view.

kmuehlbauer commented 1 year ago

@egouden Great explanation. It's definitely confusing. The major pain point here is the netcdf standard which only knows _FillValue but no nodata or undetect whatsoever. It has a notion of missing_value, though.

So as default, when creating a netcdf variable you can specify a fill_value. If you leave it out, netcdf will take it's default value for that datatype as fill_value but it will not be attached to the variable.

It looks like we have to make sure we use apply the correct metadata etc on writing the different standards.

egouden commented 1 year ago

For UBYTE, the default _FillValue is 255. If this is a valid maximum value for a given format (e.g. Rainbow), one could replace 255 by 254 while setting 255 as _FillValue for clarity.

Note that a 255 valid value is most probably not meteorological. And it might be reset to 255 later by post processing.

egouden commented 1 year ago

This is actually a more general issue on the definition of special values in our data model. Is missing value "undetect" or "no valid data" or both? I think this is not clear in FM301. I will ask the OPERA ODIM expert for clarification.

kmuehlbauer commented 1 year ago

We will need to be careful here at implementation time, as xarray maps everything to np.nan (_FillValue, missing_value) but it is not able to tell that apart when writing.

I've also added links to the relevant WMO CF Extensions in the OP above.

kmuehlbauer commented 1 year ago

WMO_EXT_FillValue

This is an excerpt from the WMO CF Extensions. The only mandatory attribute in our context is valid_range, providing smallest and largest valiud values.

_FillValue is denoting missing or undefined data and is not mandatory (but it might be good practice to use the netcdf default fill_value here, if no other value is provided).

syedhamidali commented 8 months ago

Based on the this Open Radar Discourse thread it seems that we need to add more metadata to the rainbow reader, something pretty similar to what is in this function:

def add_metadata_to_rainbow(dtree):
    data = np.array('axis_z', dtype='|S32')
    attributes = {
        'standard_name': 'primary_axis_of_rotation',
        'options': 'axis_z, axis_y, axis_x'
    }
    # Create the xarray DataArray
    dtree['primary_axis'] = xr.DataArray(data, attrs=attributes, dims=())

    fixed_angles = []
    for grp in dtree.groups:
        if "sweep" in grp:
            fixed_angle = dtree[grp]['sweep_fixed_angle'].values
            fixed_angles.append(fixed_angle)
            dtree[grp]['sweep_mode'] = xr.DataArray(
                np.array(dtree[grp]['sweep_mode'], dtype='|S32'),
                attrs={'standard_name': 'scan_mode_for_sweep',
     'options': 'sector, coplane, rhi, vertical_pointing, idle, azimuth_surveillance, \
     elevation_surveillance, sunscan, pointing, calibration, manual_ppi, manual_rhi'})

    fixed_angles = np.array(fixed_angles, dtype=np.float32)
    sweep_groups = np.array([f'sweep_{i}' for i in range(fixed_angles.size)])
    dtree['sweep_fixed_angle'] = xr.DataArray(fixed_angles, dims=('sweep'))
    dtree['sweep_group_name'] = xr.DataArray(sweep_groups, dims=('sweep'))
    dtree['volume_number'] = xr.DataArray(np.array(0, dtype=np.int16),
                                      attrs={'standard_name':'data_volume_index_number'})
    dtree['instrument_type'] = xr.DataArray(np.array('radar', dtype='|S32'),
                                      attrs={'standard_name': 'type_of_instrument',
                                             'options': 'radar, lidar',
                                             'meta_group': 'instrument_parameters'}
                                       )
    dtree['platform_type'] = xr.DataArray(
        np.array('fixed', dtype='|S32'),
        attrs={'standard_name': 'platform_type',
    'options':'fixed, vehicle, ship, aircraft_fore, aircraft_aft, aircraft_tail, aircraft_belly,\
    aircraft_roof, aircraft_nose, satellite_orbit, satellite_geostat'})

    return dtree