kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

adlfs fsspec version incompatibility with kedro_viz #864

Closed ElCuboNegro closed 2 years ago

ElCuboNegro commented 2 years ago

## Description

When I ran the `%reload_kedro` magic, the following exception appeared:

```
Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage.
Failed to instantiate DataSet 'customer_master_table@pandas' of type `kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet`.
Traceback (most recent call last):
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/fsspec/registry.py", line 211, in get_filesystem_class
    register_implementation(protocol, _import_class(bit["class"]))
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/fsspec/registry.py", line 226, in _import_class
    mod = importlib.import_module(mod)
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/adlfs/__init__.py", line 1, in <module>
    from .spec import AzureDatalakeFileSystem
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/adlfs/spec.py", line 28, in <module>
    from fsspec.asyn import (
ImportError: cannot import name 'get_running_loop' from 'fsspec.asyn' (/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/fsspec/asyn.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/io/core.py", line 177, in from_config
    data_set = class_obj(**config)  # type: ignore
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/extras/datasets/pandas/parquet_dataset.py", line 132, in __init__
    self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/fsspec/registry.py", line 243, in filesystem
    cls = get_filesystem_class(protocol)
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/fsspec/registry.py", line 213, in get_filesystem_class
    raise ImportError(bit["err"]) from e
ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/guests/juan.alban.adm/Aldebaran/.ipython/profile_default/startup/00-kedro-init.py", line 65, in reload_kedro
    catalog = context.catalog
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/framework/context/context.py", line 330, in catalog
    return self._get_catalog()
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/framework/context/context.py", line 380, in _get_catalog
    journal=journal,
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/callers.py", line 80, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/home/guests/juan.alban.adm/Aldebaran/src/decameron_kronos/hooks.py", line 54, in register_catalog
    catalog, credentials, load_versions, save_version, journal
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 329, in from_config
    ds_name, ds_config, load_versions.get(ds_name), save_version
  File "/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/io/core.py", line 187, in from_config
    ) from err
kedro.io.core.DataSetError: 
Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage.
Failed to instantiate DataSet 'customer_master_table@pandas' of type `kedro.extras.datasets.pandas.parquet_dataset.ParquetDataSet`.
```
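
The root `ImportError` in the traceback above says that the installed `fsspec.asyn` has no `get_running_loop` name, which suggests the installed fsspec does not match what the installed adlfs expects. A minimal sketch for checking this kind of mismatch directly, assuming a hypothetical helper named `module_exposes` (it is not part of Kedro, fsspec, or adlfs):

```python
import importlib


def module_exposes(module_name, attribute):
    """Return True if `module_name` imports cleanly and defines `attribute`.

    Hypothetical diagnostic helper; returns False when the module itself
    cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# Mirrors the import that adlfs/spec.py attempts in the traceback.
# The result depends on which fsspec version is installed.
print(module_exposes("fsspec.asyn", "get_running_loop"))
```

If this prints `False` in an environment where adlfs is installed, the two packages are probably out of step and one of them needs a different pin.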

When I installed adlfs, it installed `fsspec-2021.7.0` and the following exception appeared:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
decameron-kronos 0.1 requires fsspec==0.8.7, but you have fsspec 2021.7.0 which is incompatible.
kedro 0.17.3 requires fsspec<0.9,>=0.5.1, but you have fsspec 2021.7.0 which is incompatible.
```
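
The two specifiers quoted by pip can be checked by hand. The sketch below is a simplified, stdlib-only comparison that handles only purely numeric versions and the operators `==`, `>=`, `<=`, `>`, `<`. It is not a full PEP 440 implementation, and the helper names are hypothetical. It shows why fsspec 2021.7.0 falls outside the `fsspec<0.9,>=0.5.1` range required by Kedro 0.17.3, while 0.8.7 fits:

```python
import operator
import re

# Supported comparison operators (a small subset of PEP 440).
OPERATORS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn a numeric dotted version such as '0.8.7' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(left, right):
    """Right-pad the shorter tuple with zeros so that (0, 9) == (0, 9, 0)."""
    width = max(len(left), len(right))
    return (
        left + (0,) * (width - len(left)),
        right + (0,) * (width - len(right)),
    )


def satisfies(version, specifier):
    """Check `version` against a comma-separated specifier string."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(==|>=|<=|>|<)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        compare = OPERATORS[match.group(1)]
        left, right = pad(installed, parse_version(match.group(2)))
        if not compare(left, right):
            return False
    return True


print(satisfies("2021.7.0", "<0.9,>=0.5.1"))  # False
print(satisfies("0.8.7", "<0.9,>=0.5.1"))     # True
print(satisfies("2021.7.0", "==0.8.7"))       # False
```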

After that, `%reload_kedro` raises a `ContextualVersionConflict`:

```
---------------------------------------------------------------------------
ContextualVersionConflict                 Traceback (most recent call last)
~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/framework/cli/utils.py in load_entry_points(name)
    379         try:
--> 380             entry_point_commands.append(entry_point.load())
    381         except Exception as exc:

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pkg_resources/__init__.py in load(self, require, *args, **kwargs)
   2448         if require:
-> 2449             self.require(*args, **kwargs)
   2450         return self.resolve()

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pkg_resources/__init__.py in require(self, env, installer)
   2471         reqs = self.dist.requires(self.extras)
-> 2472         items = working_set.resolve(reqs, env, installer, extras=self.extras)
   2473         list(map(working_set.add, items))

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    776                 dependent_req = required_by[req]
--> 777                 raise VersionConflict(dist, req).with_context(dependent_req)
    778 

ContextualVersionConflict: (fsspec 2021.7.0 (/home/guests/juan.alban.adm/anaconda3/envs/Aldebaran/lib/python3.7/site-packages), Requirement.parse('fsspec<0.9,>=0.5.1'), {'kedro'})

The above exception was the direct cause of the following exception:

KedroCliError                             Traceback (most recent call last)
<ipython-input-1-abfcd54c960b> in <module>
----> 1 get_ipython().run_line_magic('reload_kedro', '')

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2305                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2306             with self.builtin_trap:
-> 2307                 result = fn(*args, **kwargs)
   2308             return result
   2309 

~/Aldebaran/.ipython/profile_default/startup/00-kedro-init.py in reload_kedro(path, line, env, extra_params)
     76             "Kedro's ipython session startup script failed:\n%s", str(err)
     77         )
---> 78         raise err
     79 
     80 

~/Aldebaran/.ipython/profile_default/startup/00-kedro-init.py in reload_kedro(path, line, env, extra_params)
     68         logging.info("Defined global variable `context`, `session` and `catalog`")
     69 
---> 70         for line_magic in collect_line_magic():
     71             register_line_magic(needs_local_scope(line_magic))
     72             logging.info("Registered line magic `%s`", line_magic.__name__)

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/framework/cli/jupyter.py in collect_line_magic()
     73     """Interface function for collecting line magic functions from plugin entry points.
     74     """
---> 75     return load_entry_points("line_magic")
     76 
     77 

~/anaconda3/envs/Aldebaran/lib/python3.7/site-packages/kedro/framework/cli/utils.py in load_entry_points(name)
    380             entry_point_commands.append(entry_point.load())
    381         except Exception as exc:
--> 382             raise KedroCliError(f"Loading {name} commands from {entry_point}") from exc
    383     return entry_point_commands
    384 

KedroCliError: Loading line_magic commands from line_magic = kedro_viz.launchers.jupyter:run_viz
```
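
When a `ContextualVersionConflict` like the one above appears, it can help to list what is actually installed before changing any pins. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (on the Python 3.7 reported in this issue, the `importlib_metadata` backport would be needed instead); the helper names are hypothetical:

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # The packages named in the tracebacks of this issue.
    for name, version in report(["kedro", "kedro-viz", "fsspec", "adlfs"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output with the requirement quoted in the exception (`fsspec<0.9,>=0.5.1`, required by `kedro`) shows which package has to move.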

## Context
I was trying to access data stored in Azure Blob Storage.


## Your Environment
Include as many relevant details about the environment in which you experienced the bug:

* Kedro version used (`pip show kedro` or `kedro -V`): Version: 0.17.3
* Python version used (`python -V`): Python 3.7.10
* Operating system and version: CentOS 7

datajoely commented 2 years ago

Hi @ElCuboNegro, did you use `pip install` or `kedro install` / `kedro build-reqs`?

ElCuboNegro commented 2 years ago

image

datajoely commented 2 years ago

Ah okay - so this isn't the recommended way of installing things. The right way is to edit your `requirements.in` file and follow this guide.

I would also say - if you ever need to install things quickly, this article by Jake VanderPlas is useful reading.

ElCuboNegro commented 2 years ago

Wow, thanks! By the way, I solved it by adding `adlfs==0.7.0` and building with `kedro build-reqs` and `kedro install` ;)