pangeo-data / pangeo-datastore

Pangeo Cloud Datastore
https://catalog.pangeo.io

Add floats dataset #106

Closed cspencerjones closed 4 years ago

cspencerjones commented 4 years ago

Add a dataset that shows float output from the channel run (and the corresponding 3-day-averaged model variables).

cspencerjones commented 4 years ago

@charlesbluca is this not working because of #102 (it seems as though there is some sort of credentials problem)? Or is there something obvious I have missed? Thanks!

rabernat commented 4 years ago

I just tested these new entries, and they don't seem to work. Opening the catalog and listing its entries succeeds:

from intake import open_catalog
url = "https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/1449d8fc28b786be3f940740a5fbfd43bbd7456c/intake-catalogs/ocean/channel.yaml"
cat = open_catalog(url)
list(cat)

That's fine, but I can't load:

cat.channel_ridge_05km_float_run.to_dask()

raises

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

/srv/conda/envs/notebook/lib/python3.7/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
   1355     else:
-> 1356         return DEFAULTPARSER.parse(timestr, **kwargs)
   1357 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    647         if res is None:
--> 648             raise ValueError("Unknown string format:", timestr)
    649 

ValueError: ('Unknown string format:', 'beginning of run')

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    110     try:
--> 111         ref_date = pd.Timestamp(ref_date)
    112     except ValueError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

ValueError: could not convert string to Timestamp

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    156         try:
--> 157             dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    158         except (OutOfBoundsDatetime, OverflowError):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    114         # strings, in which case we fall back to using cftime
--> 115         raise OutOfBoundsDatetime
    116 

OutOfBoundsDatetime: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     76     try:
---> 77         result = decode_cf_datetime(example_value, units, calendar, use_cftime)
     78     except Exception:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    159             dates = _decode_datetime_with_cftime(
--> 160                 flat_num_dates.astype(np.float), units, calendar
    161             )

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_cftime(num_dates, units, calendar)
     96     return np.asarray(
---> 97         cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
     98     )

cftime/_cftime.pyx in cftime._cftime.num2date()

cftime/_cftime.pyx in cftime._cftime._dateparse()

cftime/_cftime.pyx in cftime._cftime._parse_date()

ValueError: Unable to parse date string 'beginning of run'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-5-f0a942240b6c> in <module>
----> 1 cat.channel_ridge_05km_float_run.to_dask()

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/xzarr.py in _open_dataset(self)
     29 
     30         self._mapper = get_mapper(self.urlpath, **self.storage_options)
---> 31         self._ds = xr.open_zarr(self._mapper, **self.kwargs)
     32 
     33     def close(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, **kwargs)
    597         consolidated=consolidated,
    598     )
--> 599     ds = maybe_decode_store(zarr_store)
    600 
    601     # auto chunking needs to be here and not in ZarrStore because variable

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/zarr.py in maybe_decode_store(store, lock)
    580             concat_characters=concat_characters,
    581             decode_coords=decode_coords,
--> 582             drop_variables=drop_variables,
    583         )
    584 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    583         decode_coords,
    584         drop_variables=drop_variables,
--> 585         use_cftime=use_cftime,
    586     )
    587     ds = Dataset(vars, attrs=attrs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    492             decode_times=decode_times,
    493             stack_char_dim=stack_char_dim,
--> 494             use_cftime=use_cftime,
    495         )
    496         if decode_coords:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime)
    334             times.CFDatetimeCoder(use_cftime=use_cftime),
    335         ]:
--> 336             var = coder.decode(var, name=name)
    337 
    338     dimensions, data, attributes, encoding = variables.unpack_for_decoding(var)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode(self, variable, name)
    424             units = pop_to(attrs, encoding, "units")
    425             calendar = pop_to(attrs, encoding, "calendar")
--> 426             dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
    427             transform = partial(
    428                 decode_cf_datetime,

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     84             "opening your dataset with decode_times=False." % (units, calendar_msg)
     85         )
---> 86         raise ValueError(msg)
     87     else:
     88         dtype = getattr(result, "dtype", np.dtype("object"))

ValueError: unable to decode time units 'days since beginning of run' with the default calendar. Try opening your dataset with decode_times=False.

and

cat.channel_ridge_05km_floats.to_dask()

raises the same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

/srv/conda/envs/notebook/lib/python3.7/site-packages/dateutil/parser/_parser.py in parse(timestr, parserinfo, **kwargs)
   1355     else:
-> 1356         return DEFAULTPARSER.parse(timestr, **kwargs)
   1357 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dateutil/parser/_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    647         if res is None:
--> 648             raise ValueError("Unknown string format:", timestr)
    649 

ValueError: ('Unknown string format:', 'beginning of run')

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    110     try:
--> 111         ref_date = pd.Timestamp(ref_date)
    112     except ValueError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

ValueError: could not convert string to Timestamp

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    156         try:
--> 157             dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    158         except (OutOfBoundsDatetime, OverflowError):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    114         # strings, in which case we fall back to using cftime
--> 115         raise OutOfBoundsDatetime
    116 

OutOfBoundsDatetime: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     76     try:
---> 77         result = decode_cf_datetime(example_value, units, calendar, use_cftime)
     78     except Exception:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    159             dates = _decode_datetime_with_cftime(
--> 160                 flat_num_dates.astype(np.float), units, calendar
    161             )

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_datetime_with_cftime(num_dates, units, calendar)
     96     return np.asarray(
---> 97         cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
     98     )

cftime/_cftime.pyx in cftime._cftime.num2date()

cftime/_cftime.pyx in cftime._cftime._dateparse()

cftime/_cftime.pyx in cftime._cftime._parse_date()

ValueError: Unable to parse date string 'beginning of run'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-6-b3ce750782d4> in <module>
----> 1 cat.channel_ridge_05km_floats.to_dask()

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake_xarray/xzarr.py in _open_dataset(self)
     29 
     30         self._mapper = get_mapper(self.urlpath, **self.storage_options)
---> 31         self._ds = xr.open_zarr(self._mapper, **self.kwargs)
     32 
     33     def close(self):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, **kwargs)
    597         consolidated=consolidated,
    598     )
--> 599     ds = maybe_decode_store(zarr_store)
    600 
    601     # auto chunking needs to be here and not in ZarrStore because variable

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/zarr.py in maybe_decode_store(store, lock)
    580             concat_characters=concat_characters,
    581             decode_coords=decode_coords,
--> 582             drop_variables=drop_variables,
    583         )
    584 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    583         decode_coords,
    584         drop_variables=drop_variables,
--> 585         use_cftime=use_cftime,
    586     )
    587     ds = Dataset(vars, attrs=attrs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    492             decode_times=decode_times,
    493             stack_char_dim=stack_char_dim,
--> 494             use_cftime=use_cftime,
    495         )
    496         if decode_coords:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime)
    334             times.CFDatetimeCoder(use_cftime=use_cftime),
    335         ]:
--> 336             var = coder.decode(var, name=name)
    337 
    338     dimensions, data, attributes, encoding = variables.unpack_for_decoding(var)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in decode(self, variable, name)
    424             units = pop_to(attrs, encoding, "units")
    425             calendar = pop_to(attrs, encoding, "calendar")
--> 426             dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
    427             transform = partial(
    428                 decode_cf_datetime,

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     84             "opening your dataset with decode_times=False." % (units, calendar_msg)
     85         )
---> 86         raise ValueError(msg)
     87     else:
     88         dtype = getattr(result, "dtype", np.dtype("object"))

ValueError: unable to decode time units 'days since beginning of run' with the default calendar. Try opening your dataset with decode_times=False.

rabernat commented 4 years ago

You can open the dataset without time decoding as follows:

cat.channel_ridge_05km_float_run(decode_times=False).to_dask()

Examining the time variable, the units are 'days since beginning of run' and there is no calendar attribute. This is not a valid CF time encoding, because 'beginning of run' cannot be parsed as a reference date.
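To make the problem concrete: CF time units have the shape "<unit> since <reference date>", and the part after "since" must be a parseable date. Below is a simplified, standard-library-only sketch of that check. It is an approximation for illustration, not the parser that cftime or xarray actually use, and the name `looks_like_cf_time_units` is hypothetical.

```python
import re

# Simplified pattern for CF-style time units:
#   "<unit> since <YYYY-MM-DD>[ hh:mm[:ss]]"
# Illustrative approximation only; cftime accepts more variants than this.
_CF_UNITS = re.compile(
    r"^(seconds|minutes|hours|days)\s+since\s+"
    r"(\d{1,4})-(\d{1,2})-(\d{1,2})"
    r"([ T]\d{1,2}:\d{2}(:\d{2}(\.\d+)?)?)?$"
)


def looks_like_cf_time_units(units: str) -> bool:
    """Return True if `units` has the '<unit> since <date>' shape."""
    return _CF_UNITS.match(units.strip()) is not None


# The units currently stored in the dataset: no reference date, so no match.
print(looks_like_cf_time_units("days since beginning of run"))     # False
# Units with an explicit reference date do match.
print(looks_like_cf_time_units("days since 0000-01-01 00:00:00"))  # True
```

A check like this could run before a dataset is added to the catalog, so that undecodable units are caught before anyone calls to_dask().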

We have two choices. Either we skip time encoding entirely and store time as a plain number with units of seconds, or we give it a valid reference date and calendar:

{'units': 'days since 0000-01-01 00:00:00',
 'calendar': '360_day'}

This might have to be tweaked a bit. You can experiment with something like:

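import xarray as xr

# Open without decoding so the invalid time units do not raise
ds = cat.channel_ridge_05km_float_run(decode_times=False).to_dask()
# Overwrite the time attributes with a valid CF reference date and calendar
ds.time.attrs.update({'units': 'days since 0000-01-01 00:00:00',
                      'calendar': '360_day'})

# decode_cf returns a new dataset; keep the result to inspect the decoded times
ds_decoded = xr.decode_cf(ds)
ds_decoded.time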
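
For intuition about what the '360_day' calendar does to the decoded values: every year has 12 months of exactly 30 days, so an offset in days maps to a date by plain integer division. A rough standard-library sketch follows. The helper name `decode_360_day` is made up for illustration, and real decoding should still go through xarray and cftime as above.

```python
def decode_360_day(days_since_origin, origin_year=0):
    """Map 'days since <origin_year>-01-01' to a 360_day calendar date.

    Returns (year, month, day, seconds_into_day). Every month is assumed
    to have exactly 30 days, which is what the 360_day calendar means.
    """
    whole_days = int(days_since_origin // 1)
    # Fractional part of the day, expressed in seconds
    seconds = round((days_since_origin - whole_days) * 86400)
    years, day_of_year = divmod(whole_days, 360)
    month0, day0 = divmod(day_of_year, 30)
    # Months and days are 1-based in the returned date
    return origin_year + years, month0 + 1, day0 + 1, seconds


print(decode_360_day(0))      # (0, 1, 1, 0)
print(decode_360_day(3))      # (0, 1, 4, 0)
print(decode_360_day(390.5))  # (1, 2, 1, 43200)
```

With this calendar, a time value of 3 (for example, one 3-day averaging interval) lands on day 4 of month 1 of year 0, which is the kind of result to expect from the decoded time coordinate.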

cspencerjones commented 4 years ago

This should be fixed. Is it possible to rerun the checks?

charlesbluca commented 4 years ago

We are still reworking the CI for the repo, so the Travis checks would likely fail anyway. However, I can verify that the datasets can now be opened without specifying decode_times=False, so this should be fine to merge.