intake / intake-xarray

Intake plugin for xarray
https://intake-xarray.readthedocs.io/
BSD 2-Clause "Simplified" License

FutureWarning from intake after using intake_xarray.netcdf.NetCDFSource() #54

Closed christine-e-smit closed 5 years ago

christine-e-smit commented 5 years ago

When I run:

import intake_xarray

xnc = intake_xarray.netcdf.NetCDFSource(
    '{}/nc/*.nc'.format(base_dir), concat_dim='time')
dnc = xnc.to_dask()

I get the warning:

FutureWarning: In xarray version 0.14 the default behaviour of `open_mfdataset`
will change. To retain the existing behavior, pass
combine='nested'. To use future default behavior, pass
combine='by_coords'. See
http://xarray.pydata.org/en/stable/combining.html#combining-multi

  self._ds = _open_dataset(url, chunks=self.chunks, **kwargs)
/Users/csmit/Software/python3/envs/xarray/lib/python3.7/site-packages/xarray/backends/api.py:934: FutureWarning: Also `open_mfdataset` will no longer accept a `concat_dim` argument.
To get equivalent behaviour from now on please use the new
`combine_nested` function instead (or the `combine='nested'` option to
`open_mfdataset`).The datasets supplied have global dimension coordinates. You may want
to use the new `combine_by_coords` function (or the
`combine='by_coords'` option to `open_mfdataset`) to order the datasets
before concatenation. Alternatively, to continue concatenating based
on the order the datasets are supplied in future, please use the new
`combine_nested` function (or the `combine='nested'` option to
open_mfdataset).
  from_openmfds=True,

Is there some way I can get around this warning?

martindurant commented 5 years ago

Yep, xarray 0.13 was released yesterday. To get rid of the warning, ideally the code should be updated and its requirement set to the latest xarray. Would you be willing to implement this? Alternatively, you can use Python's warnings module to silence the specific warnings that are annoying you.
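
As a rough illustration of the second suggestion, the sketch below uses only the standard-library warnings module to ignore FutureWarning messages that mention open_mfdataset. The function noisy_open is a hypothetical stand-in for a call such as xnc.to_dask(); it is not part of intake or xarray, and the message pattern is an assumption based on the warning text quoted above.

```python
import warnings


def noisy_open():
    """Hypothetical stand-in for a call such as xnc.to_dask()."""
    warnings.warn(
        "In xarray version 0.14 the default behaviour of `open_mfdataset` "
        "will change.",
        FutureWarning,
    )
    return "dataset"


# record=True is used here only so the sketch can check that nothing leaked
# through; in normal use a plain warnings.catch_warnings() is enough.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The message argument is a regex matched against the start of the
    # warning text, so the leading ".*" lets it match anywhere on that line.
    warnings.filterwarnings(
        "ignore", message=r".*open_mfdataset", category=FutureWarning
    )
    result = noisy_open()
```

Scoping the filter with catch_warnings keeps other FutureWarnings visible elsewhere in the program, which seems preferable to a global filter while the underlying call is still being updated.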

christine-e-smit commented 5 years ago

In theory, yes, I think I could make the change. However, I'm having trouble running your unit tests on the master branch. I'm getting two errors:

$ nosetests test
EE...............................S.....................
======================================================================
ERROR: test.support.testresult.get_test_runner_class
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
TypeError: get_test_runner_class() missing 1 required positional argument: 'verbosity'

======================================================================
ERROR: test.support.testresult.get_test_runner
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
TypeError: get_test_runner() missing 2 required positional arguments: 'stream' and 'verbosity'

----------------------------------------------------------------------
Ran 55 tests in 5.553s

FAILED (SKIP=1, errors=2)

martindurant commented 5 years ago

I don't know what that means, but you should use pytest instead of nosetests.

christine-e-smit commented 5 years ago

Okay. I can use pytest directly. I tend to use nose because we have a mix of pytest and unittest-based unit tests. Unfortunately, I'm still struggling to get the tests to pass. Help me out here.

I believe I have all the dependencies installed at versions that make setup.py happy:

In [1]: import intake

In [2]: intake.__version__
Out[2]: '0.5.3'

In [3]: import xarray

In [4]: xarray.__version__
Out[4]: '0.13.0'

In [5]: import dask

In [6]: dask.__version__
Out[6]: '2.4.0'

In [7]: import zarr

In [8]: zarr.__version__
Out[8]: '2.3.2'

In [9]: import netCDF4

In [10]: netCDF4.__version__
Out[10]: '1.5.1.2'

But I'm still seeing errors. Here is my output:

$ pytest intake_xarray/tests/
============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.7.3, pytest-5.1.2, py-1.8.0, pluggy-0.13.0
rootdir: /Users/csmit/Code/intake-xarray
collected 48 items                                                                                                                                                                                               

intake_xarray/tests/test_catalog.py .F.                                                                                                                                                                    [  6%]
intake_xarray/tests/test_discovery.py F                                                                                                                                                                    [  8%]
intake_xarray/tests/test_image.py ............sss                                                                                                                                                          [ 39%]
intake_xarray/tests/test_intake_xarray.py ...........ssssssssssssssss                                                                                                                                      [ 95%]
intake_xarray/tests/test_remote.py Fs                                                                                                                                                                      [100%]

==================================================================================================== FAILURES ====================================================================================================
__________________________________________________________________________________________________ test_persist __________________________________________________________________________________________________

catalog1 = <Intake catalog: data>

    def test_persist(catalog1):
        from intake_xarray import ZarrSource
        source = catalog1['blank']
>       s2 = source.persist()

intake_xarray/tests/test_catalog.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <intake_xarray.xzarr.ZarrSource object at 0x115cf8c18>, ttl = None, kwargs = {}
container_map = {'catalog': <class 'intake.catalog.base.RemoteCatalog'>, 'dataframe': <class 'intake.container.dataframe.RemoteDataFra...'intake.container.ndarray.RemoteArray'>, 'python': <class 'intake.container.semistructured.RemoteSequenceSource'>, ...}
PersistStore = <class 'intake.container.persist.PersistStore'>, time = <module 'time' (built-in)>

    def persist(self, ttl=None, **kwargs):
        """Save data from this source to local persistent storage"""
        from ..container import container_map
        from ..container.persist import PersistStore
        import time
        if 'original_tok' in self.metadata:
>           raise ValueError('Cannot persist a source taken from the persist '
                             'store')
E           ValueError: Cannot persist a source taken from the persist store

../../Software/python3/envs/intake/lib/python3.7/site-packages/intake/source/base.py:267: ValueError
_________________________________________________________________________________________________ test_discovery _________________________________________________________________________________________________

    def test_discovery():
        with pytest.warns(None) as record:
            registry = intake.autodiscover()
        # For awhile we expect a PendingDeprecationWarning due to
        # do_pacakge_scan=True. But we should *not* get a FutureWarning.
        for record in record.list:
>           assert not isinstance(record.message, FutureWarning)
E           assert not True
E            +  where True = isinstance(FutureWarning("The drivers ['xarray_image', 'netcdf', 'opendap', 'rasterio', 'remote-xarray', 'zarr'] do not specify e...ere only discovered via a package scan. This may break in a future release of intake. The packages should be updated."), FutureWarning)
E            +    where FutureWarning("The drivers ['xarray_image', 'netcdf', 'opendap', 'rasterio', 'remote-xarray', 'zarr'] do not specify e...ere only discovered via a package scan. This may break in a future release of intake. The packages should be updated.") = <warnings.WarningMessage object at 0x115cebb00>.message

intake_xarray/tests/test_discovery.py:11: AssertionError
_______________________________________________________________________________________________ test_remote_netcdf _______________________________________________________________________________________________

intake_server = 'intake://localhost:8425'

    def test_remote_netcdf(intake_server):
        cat_local = intake.open_catalog(cat_file)
        cat = intake.open_catalog(intake_server)
        assert 'xarray_source' in cat
>       source = cat.xarray_source()

intake_xarray/tests/test_remote.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../Software/python3/envs/intake/lib/python3.7/site-packages/intake/catalog/entry.py:78: in __call__
    s = self.get(**kwargs)
../../Software/python3/envs/intake/lib/python3.7/site-packages/intake/catalog/remote.py:83: in get
    getshell=self.getshell)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

url = 'http://localhost:8425/', entry = 'xarray_source', container = None, user_parameters = {}, description = 'example xarray source plugin', http_args = {'headers': {}}, page_size = None
auth = <intake.auth.base.BaseClientAuth object at 0x116ad8f60>, getenv = True, getshell = True

    def open_remote(url, entry, container, user_parameters, description, http_args,
                    page_size=None, auth=None, getenv=None, getshell=None):
        """Create either local direct data source or remote streamed source"""
        from intake.container import container_map
        import msgpack
        import requests
        from requests.compat import urljoin

        if url.startswith('intake://'):
            url = url[len('intake://'):]
        payload = dict(action='open',
                       name=entry,
                       parameters=user_parameters,
                       available_plugins=list(plugin_registry.keys()))
        req = requests.post(urljoin(url, '/v1/source'),
                            data=msgpack.packb(payload, **pack_kwargs),
                            **http_args)
        if req.ok:
            response = msgpack.unpackb(req.content, **unpack_kwargs)

            if 'plugin' in response:
                pl = response['plugin']
                pl = [pl] if isinstance(pl, str) else pl
                # Direct access
                for p in pl:
                    if p in plugin_registry:
                        source = plugin_registry[p](**response['args'])
                        proxy = False
                        break
                else:
                    proxy = True
            else:
                proxy = True
            if proxy:
                response.pop('container')
                response.update({'name': entry, 'parameters': user_parameters})
                if container == 'catalog':
                    response.update({'auth': auth,
                                     'getenv': getenv,
                                     'getshell': getshell,
                                     'page_size': page_size
                                     # TODO ttl?
                                     # TODO storage_options?
                                     })
                source = container_map[container](url, http_args, **response)
            source.description = description
            return source

        else:
>           raise Exception('Server error: %d, %s' % (req.status_code, req.reason))
E           Exception: Server error: 500, Internal Server Error

../../Software/python3/envs/intake/lib/python3.7/site-packages/intake/catalog/remote.py:135: Exception
--------------------------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------------------------
2019-09-20 10:07:54,309 - intake - INFO - __main__.py:main:L48 - Creating catalog from:
2019-09-20 10:07:54,309 - intake - INFO - __main__.py:main:L50 -   - /Users/csmit/Code/intake-xarray/intake_xarray/tests/data/catalog.yaml
2019-09-20 10:07:54,322 - intake - INFO - __main__.py:main:L55 - catalog_args: /Users/csmit/Code/intake-xarray/intake_xarray/tests/data/catalog.yaml
2019-09-20 10:07:54,322 - intake - INFO - __main__.py:main:L63 - Listening on port 8425
WARNING:tornado.access:404 GET / (::1) 0.70ms
---------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------
ERROR:tornado.application:Uncaught exception POST /v1/source (::1)
HTTPServerRequest(protocol='http', host='localhost:8425', method='POST', uri='/v1/source', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/tornado/web.py", line 1699, in _execute
    result = await result
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/tornado/gen.py", line 191, in wrapper
    result = func(*args, **kwargs)
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/intake/cli/server/server.py", line 303, in post
    source = entry.get(**user_parameters)
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/intake/catalog/local.py", line 282, in get
    plugin, open_args = self._create_open_args(user_parameters)
  File "/Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/intake/catalog/local.py", line 263, in _create_open_args
    % self._driver)
ValueError: No plugins loaded for this entry: netcdf
A listing of installable plugins can be found at https://intake.readthedocs.io/en/latest/plugin-directory.html .
ERROR:tornado.access:500 POST /v1/source (::1) 4.39ms
================================================================================================ warnings summary ================================================================================================
intake_xarray/tests/test_intake_xarray.py::test_read_list_of_netcdf_files
intake_xarray/tests/test_intake_xarray.py::test_read_glob_pattern_of_netcdf_files
  /Users/csmit/Code/intake-xarray/intake_xarray/netcdf.py:58: FutureWarning: In xarray version 0.14 the default behaviour of `open_mfdataset`
  will change. To retain the existing behavior, pass
  combine='nested'. To use future default behavior, pass
  combine='by_coords'. See
  http://xarray.pydata.org/en/stable/combining.html#combining-multi

    self._ds = _open_dataset(url, chunks=self.chunks, **kwargs)

intake_xarray/tests/test_intake_xarray.py::test_read_list_of_netcdf_files
intake_xarray/tests/test_intake_xarray.py::test_read_glob_pattern_of_netcdf_files
  /Users/csmit/Software/python3/envs/intake/lib/python3.7/site-packages/xarray/backends/api.py:934: FutureWarning: Also `open_mfdataset` will no longer accept a `concat_dim` argument.
  To get equivalent behaviour from now on please use the new
  `combine_nested` function instead (or the `combine='nested'` option to
  `open_mfdataset`).The datasets supplied do not have global dimension coordinates. In
  future, to continue concatenating without supplying dimension
  coordinates, please use the new `combine_nested` function (or the
  `combine='nested'` option to open_mfdataset.
    from_openmfds=True,

-- Docs: https://docs.pytest.org/en/latest/warnings.html
============================================================================== 3 failed, 25 passed, 20 skipped, 4 warnings in 1.07s ==============================================================================

christine-e-smit commented 5 years ago

If there's a docker container somewhere that I can run the tests from, I'd be happy to go that route to solve dependency and version issues.

martindurant commented 5 years ago

The first error may be caused by a lingering .intake directory (which is poor design on our part). For the second, it looks like the netcdf driver didn't load; perhaps there is a problem with the environment?

christine-e-smit commented 5 years ago

I thought that maybe my issue was that I was running on a Mac, so I tried running the unit tests in a Docker container, but I still ran into failures.

I can run the main unit test file, intake_xarray/tests/test_intake_xarray.py, without errors. The pull request I've put in fixes the warning from this unit test.

Just for reference, here's the Dockerfile I used to try to run the unit tests:

FROM continuumio/anaconda3

ENV PATH=/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN conda install -c conda-forge -y \
    rasterio

RUN conda install -y \
    pylint \
    intake=0.5.3 \
    xarray=0.13.0 \
    zarr \
    dask=2.4 \
    netcdf4

jsignell commented 5 years ago

closed by #55