index_netcdf uses var_range() in nchelpers to determine the range of a variable.
Sometimes a process outputs a netCDF file in which the _FillValue attribute of a variable is not the Official Fill Attribute. This error is unfortunately common, but surprisingly hard to detect. For example, you can look at an affected file with ncdump, view it in ncview, or even open it in Python, and all looks reasonable.
However, if you get the variable range using var_range, the value of the _FillValue attribute will be included in the range:
>>> from nchelpers import CFDataset
>>> data = CFDataset("pr_RP5_annual_maximum_BCCAQv2+ANUSPLIN300_CanESM2_historical+rcp85_r1i1p1_1961-1990.nc")
>>> data.var_range("rp5pr")
(7.149994, 1e+20)
So when this unfortunately-reasonable-looking file is indexed, the maximum variable value will be 1e+20, which was likely intended to be a fill value, judging from its presence in the _FillValue attribute.
This type of file error is quite hard to detect in advance, since none of the common netCDF-checking tools flag it. It would be wonderful if index_netcdf would print a warning when both of the following are true:
a variable has a _FillValue attribute, and
the range of the variable, as returned by var_range, includes the value of the _FillValue attribute as either its minimum or its maximum.
That is a Bad Data Smell and whoever is indexing probably wants to know! Certainly would save me some headaches.
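To make the request concrete, here is a rough sketch of what such a check might look like. It is only an illustration, not how index_netcdf or nchelpers actually work: the function names fill_value_in_range and warn_if_fill_in_range are made up, and the tolerance-based comparison is an assumption (a fill value stored as a 32-bit float may not compare exactly equal to 1e+20).

```python
import math
import warnings


def fill_value_in_range(value_range, fill_value, rel_tol=1e-6):
    """Return True if fill_value matches the minimum or maximum of value_range.

    value_range is a (min, max) tuple such as the one var_range returns.
    fill_value is the variable's _FillValue, or None if it has none.
    The relative tolerance is an assumption, meant to absorb float32 rounding.
    """
    if fill_value is None:
        return False
    fill = float(fill_value)
    if math.isnan(fill):
        # NaN never compares equal, so it needs a separate test.
        return any(math.isnan(float(v)) for v in value_range)
    return any(math.isclose(float(v), fill, rel_tol=rel_tol) for v in value_range)


def warn_if_fill_in_range(var_name, value_range, fill_value):
    """Emit a warning when the range endpoints include the fill value."""
    if fill_value_in_range(value_range, fill_value):
        warnings.warn(
            f"Range {value_range} of variable '{var_name}' includes its "
            f"_FillValue ({fill_value}); the fill value may not be masked."
        )
        return True
    return False
```

With the file above, the call would presumably be something like warn_if_fill_in_range("rp5pr", data.var_range("rp5pr"), fill), where fill is read from the variable's _FillValue attribute. How index_netcdf would actually fetch that attribute is left open here.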