Closed. spencerahill closed this issue 4 years ago.
I bring this up because our tests failed under xarray 0.14.1 with that option included, but with it omitted everything passes.
Huh. That's surprising to me. Out of curiosity, do you have an example test failure or traceback?
@spencerkclark do you happen to remember why we did this?
If I recall correctly, our reason for using format='NETCDF3_64BIT' was that ncview was outdated on the GFDL PP/AN cluster and could not read netCDF files with more modern formats (#69). The SciPy netCDF I/O backend only supports NETCDF3 files and so uses that format automatically; in the case of the netcdf4 backend, we need to specify it with a kwarg.
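For reference, a minimal sketch of that backend difference (the filenames here are hypothetical):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(3, dtype="int32"))})

# The scipy backend can only write netCDF-3, so it picks that format
# automatically and no format kwarg is needed:
ds.to_netcdf("example_scipy.nc", engine="scipy")

# With the netcdf4 backend, the classic 64-bit-offset format must be
# requested explicitly (requires the netCDF4 library installed):
# ds.to_netcdf("example_nc4.nc", engine="netcdf4", format="NETCDF3_64BIT")
```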
That's right. Thanks! Not a compelling reason to keep using this old format (right?).
do you have an example test failure or traceback?
Sure:
(aospy-dev) [xarray-0.14.1 !]shill@shillbook:~/Dropbox/py/aospy/aospy/test$ pytest test_calc_basic.py
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.7.2, pytest-5.0.1, py-1.8.0, pluggy-0.13.0
rootdir: /Users/shill/Dropbox/py/aospy, inifile: setup.cfg
plugins: cov-2.6.1
collected 170 items
test_calc_basic.py FFFFF^CFFFFF^C
========================================================================== FAILURES ==========================================================================
_________________________________________________________ test_annual_mean[datetime-basic-None-None] _________________________________________________________
test_params = {'date_range': (datetime.datetime(4, 1, 1, 0, 0), datetime.datetime(6, 12, 31, 0, 0)), 'dtype_in_time': 'ts', 'dtype_in_vert': None, 'dtype_out_vert': None, ...}
def test_annual_mean(test_params):
calc = Calc(intvl_out='ann', dtype_out_time='av', **test_params)
> calc.compute()
test_calc_basic.py:110:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../calc.py:449: in compute
save_files=True, write_to_tar=write_to_tar)
../calc.py:528: in save
self._save_files(data, dtype_out_time)
../calc.py:468: in _save_files
data_out.to_netcdf(path, engine='netcdf4', format='NETCDF3_64BIT')
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/core/dataset.py:1547: in to_netcdf
invalid_netcdf=invalid_netcdf,
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/api.py:1073: in to_netcdf
dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/api.py:1119: in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/common.py:293: in store
variables, attributes = self.encode(variables, attributes)
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/common.py:383: in encode
variables = {k: self.encode_variable(v) for k, v in variables.items()}
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/common.py:383: in <dictcomp>
variables = {k: self.encode_variable(v) for k, v in variables.items()}
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netCDF4_.py:436: in encode_variable
variable = encode_nc3_variable(variable)
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netcdf3.py:82: in encode_nc3_variable
attrs = encode_nc3_attrs(var.attrs)
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netcdf3.py:72: in encode_nc3_attrs
return {k: encode_nc3_attr_value(v) for k, v in attrs.items()}
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netcdf3.py:72: in <dictcomp>
return {k: encode_nc3_attr_value(v) for k, v in attrs.items()}
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netcdf3.py:65: in encode_nc3_attr_value
value = coerce_nc3_dtype(np.atleast_1d(value))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
arr = array([-9223372036854775808])
def coerce_nc3_dtype(arr):
"""Coerce an array to a data type that can be stored in a netCDF-3 file
This function performs the following dtype conversions:
int64 -> int32
bool -> int8
Data is checked for equality, or equivalence (non-NaN values) with
`np.allclose` with the default keyword arguments.
"""
dtype = str(arr.dtype)
if dtype in _nc3_dtype_coercions:
new_dtype = _nc3_dtype_coercions[dtype]
# TODO: raise a warning whenever casting the data-type instead?
cast_arr = arr.astype(new_dtype)
if not (cast_arr == arr).all():
raise ValueError(
> f"could not safely cast array from dtype {dtype} to {new_dtype}"
)
E ValueError: could not safely cast array from dtype int64 to int32
/Users/shill/miniconda3/envs/aospy-dev/lib/python3.7/site-packages/xarray/backends/netcdf3.py:53: ValueError
-------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------
INFO:root:Getting input data: Var instance "condensation_rain" (Tue Dec 10 16:48:01 2019)
WARNING:root:Datapoints were stored using the np.float32 datatype.For accurate reduction operations using bottleneck, datapoints are being cast to the np.float64 datatype. For more information see: https://github.com/pydata/xarray/issues/1346
INFO:root:Computing timeseries for 0004-01-01 00:00:00 -- 0006-12-31 00:00:00.
INFO:root:Applying desired time-reduction methods. (Tue Dec 10 16:48:02 2019)
INFO:root:Writing desired gridded outputs to disk.
--------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------
INFO root:calc.py:299 Getting input data: Var instance "condensation_rain" (Tue Dec 10 16:48:01 2019)
WARNING root:data_loader.py:135 Datapoints were stored using the np.float32 datatype.For accurate reduction operations using bottleneck, datapoints are being cast to the np.float64 datatype. For more information see: https://github.com/pydata/xarray/issues/1346
INFO root:calc.py:439 Computing timeseries for 0004-01-01 00:00:00 -- 0006-12-31 00:00:00.
INFO root:calc.py:423 Applying desired time-reduction methods. (Tue Dec 10 16:48:02 2019)
INFO root:calc.py:443 Writing desired gridded outputs to disk.
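For what it's worth, the offending value in the traceback, -9223372036854775808, is np.iinfo(np.int64).min (presumably a NaT/fill sentinel in one of the attributes), and it cannot survive the int64 -> int32 cast that the netCDF-3 coercion attempts:

```python
import numpy as np

# The failing array from the traceback above holds np.iinfo(np.int64).min.
arr = np.array([-9223372036854775808])
assert arr[0] == np.iinfo(np.int64).min

# coerce_nc3_dtype casts int64 -> int32; int32 cannot represent this value,
# so the round-trip equality check fails and the ValueError is raised.
cast = arr.astype(np.int32)
print((cast == arr).all())  # False
```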
Thanks! I agree this is not something to get too hung up on in aospy. I think it's totally reasonable to start writing files out in a modern format. I do wonder though if this would be considered a bug in xarray.
A git bisect session points to the xarray commit below as the culprit. Let me see if I can come up with a minimal working example.
$ git bisect good
eece07932d5498a8abef6a8fbd30d00066931b18 is the first bad commit
commit eece07932d5498a8abef6a8fbd30d00066931b18
Author: Anderson Banihirwe <axbanihirwe@ualr.edu>
Date: Wed Nov 13 18:22:50 2019 -0700
Harmonize `FillValue` and `missing_value` during encoding and decoding steps (#3502)
* Replace `equivalent()` with `allclose_or_equiv()`
* Ensure _FillValue & missing_value are cast to same dtype as data's
* Use Numpy scalar during type casting
* Update ValueError message
* Formatting only
* Update whats-new.rst
doc/whats-new.rst | 2 ++
xarray/coding/variables.py | 14 ++++++++++----
xarray/tests/test_coding.py | 17 +++++++++++++++++
3 files changed, 29 insertions(+), 4 deletions(-)
I think I found a minimal example; this gives a different error, but I think it's a symptom of the same problem (in fact you don't even need to specify the old file format in this case):
In [1]: import numpy as np; import pandas as pd; import xarray as xr
In [2]: times = pd.date_range('2000', periods=3)
In [3]: da = xr.DataArray(times, dims=['a'], coords=[[1, 2, 3]], name='foo')
In [4]: da.encoding['_FillValue'] = 1.0e20
In [5]: da.encoding['dtype'] = np.dtype('float64')
In [6]: da.to_dataset().to_netcdf('test.nc', format='NETCDF3_64BIT')
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-6-736fea18f4cd> in <module>
----> 1 da.to_dataset().to_netcdf('test.nc', format='NETCDF3_64BIT')
~/Software/xarray/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1548 unlimited_dims=unlimited_dims,
1549 compute=compute,
-> 1550 invalid_netcdf=invalid_netcdf,
1551 )
1552
~/Software/xarray/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1071 # to be parallelized with dask
1072 dump_to_store(
-> 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1074 )
1075 if autoclose:
~/Software/xarray/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1117 variables, attrs = encoder(variables, attrs)
1118
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1120
1121
~/Software/xarray/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
291 writer = ArrayWriter()
292
--> 293 variables, attributes = self.encode(variables, attributes)
294
295 self.set_attributes(attributes)
~/Software/xarray/xarray/backends/common.py in encode(self, variables, attributes)
380 # All NetCDF files get CF encoded by default, without this attempting
381 # to write times, for example, would fail.
--> 382 variables, attributes = cf_encoder(variables, attributes)
383 variables = {k: self.encode_variable(v) for k, v in variables.items()}
384 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
~/Software/xarray/xarray/conventions.py in cf_encoder(variables, attributes)
758 _update_bounds_encoding(variables)
759
--> 760 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
761
762 # Remove attrs from bounds variables (issue #2921)
~/Software/xarray/xarray/conventions.py in <dictcomp>(.0)
758 _update_bounds_encoding(variables)
759
--> 760 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
761
762 # Remove attrs from bounds variables (issue #2921)
~/Software/xarray/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
248 variables.UnsignedIntegerCoder(),
249 ]:
--> 250 var = coder.encode(var, name=name)
251
252 # TODO(shoyer): convert all of these to use coders, too:
~/Software/xarray/xarray/coding/variables.py in encode(self, variable, name)
163 if fv is not None:
164 # Ensure _FillValue is cast to same dtype as data's
--> 165 encoding["_FillValue"] = data.dtype.type(fv)
166 fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
167 if not pd.isnull(fill_value):
OverflowError: Python int too large to convert to C long
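The proximate cause appears to be the new cast at variables.py line 165 in the traceback: the times are encoded as int64, so data.dtype.type(fv) amounts to np.int64(1.0e20), and 1e20 lies outside the int64 range:

```python
import numpy as np

# int64 tops out around 9.22e18, so coercing the 1.0e20 fill value to the
# data's dtype overflows, matching the traceback above:
print(np.iinfo(np.int64).max)  # 9223372036854775807
try:
    np.int64(1.0e20)
except OverflowError as err:
    print("OverflowError:", err)
```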
I'll put together an xarray issue. I think I have a decent idea of what the solution needs to be.
Cool, thanks @spencerkclark for digging into that!
OK, going ahead with the merge then. The remaining failure is from the now-obsolete Python 3.5 environment; all other CI jobs are green.
In #235, we switched from scipy to netcdf4 as the engine for our to_netcdf calls, and we included the format='NETCDF3_64BIT' kwarg. I can't recall why we did that, and neither the PR nor the code seems to say anything about it.
@spencerkclark do you happen to remember why we did this? And any other comments on the PR would be welcome. Note that the failures are from the now-obsolete Python 3.5 CI jobs; all the others pass.