petl-developers / petl

Python Extract Transform and Load Tables of Data
MIT License

Petl is incompatible with custom fsspec file systems #646

mlemainque closed this issue 10 months ago

mlemainque commented 11 months ago

Summary

Importing petl fails whenever a third-party filesystem has previously been registered in fsspec.registry.

Steps to reproduce

For example, with the datasets library, which is known to (rightfully) register some custom file systems:

$ python -V
Python 3.10.12
$ pip install petl datasets
Successfully installed datasets-2.14.3 fsspec-2023.6.0 petl-1.7.12 ...
import datasets
import petl

This script will raise the following error:

Traceback (most recent call last):
  File ".../site-packages/petl/io/remotes.py", line 134, in _try_register_filesystems
    _register_filesystems()
  File ".../site-packages/petl/io/remotes.py", line 103, in _register_filesystems
    _register_filesystems_from(registry, only_available)
  File ".../site-packages/petl/io/remotes.py", line 109, in _register_filesystems_from
    missing_deps = "err" in spec
TypeError: argument of type '_Cached' is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/petl/__init__.py", line 7, in <module>
    from petl import util
  File ".../site-packages/petl/util/__init__.py", line 13, in <module>
    from petl.util.vis import look, lookall, lookstr, lookallstr, see
  File ".../site-packages/petl/util/vis.py", line 12, in <module>
    from petl.io.sources import MemorySource
  File ".../site-packages/petl/io/__init__.py", line 43, in <module>
    from petl.io.remotes import RemoteSource
  File ".../site-packages/petl/io/remotes.py", line 140, in <module>
    _try_register_filesystems()
  File ".../site-packages/petl/io/remotes.py", line 136, in _try_register_filesystems
    raise ImportError("# ERROR: failed to register fsspec filesystems", ex)
ImportError: ('# ERROR: failed to register fsspec filesystems', TypeError("argument of type '_Cached' is not iterable"))

Root cause analysis

I think the code to blame is in petl (petl/io/remotes.py): it simply cannot deal with a non-empty fsspec.registry, which is supposed to contain classes and not dicts. I guess one possible fix would be to delete line 103 of remotes.py, or to skip line 109 when the spec is not a dict but a class.

Wdyt? I can submit a PR if you wish

Thanks for your help 🙏🏻

juarezr commented 11 months ago

> I think the code to blame is in petl (petl/io/remotes.py): it simply cannot deal with a non-empty fsspec.registry, which is supposed to contain classes and not dicts.

That code dates from 2018. fsspec has probably evolved since then.

> I guess one possible fix would be to delete line 103 of remotes.py, or to skip line 109 when the spec is not a dict but a class.

A check for dict may be enough.
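
A minimal sketch of what such a check could look like. This is not the code from petl or from the eventual fix. The registry contents, the stand-in classes and the helper names (has_missing_deps, unguarded) are all made up for illustration, and it assumes that the '_Cached' in the traceback is the metaclass of the registered filesystem class.

```python
class _Meta(type):
    """Hypothetical stand-in for the metaclass of a filesystem class."""


class FakeFileSystem(metaclass=_Meta):
    """Hypothetical stand-in for a third-party filesystem class."""


# Illustrative registry mixing dict specs with an already-registered class.
registry = {
    "s3": {"class": "s3fs.S3FileSystem", "err": "Install s3fs to access S3"},
    "file": {"class": "fsspec.implementations.local.LocalFileSystem"},
    "custom": FakeFileSystem,
}


def unguarded(spec):
    # Same shape as the failing line in the traceback: `"err" in spec`.
    return "err" in spec


def has_missing_deps(spec):
    # Only dict specs can carry an "err" entry; a class is treated as available.
    return isinstance(spec, dict) and "err" in spec


# The guarded check handles every kind of entry without raising.
results = {protocol: has_missing_deps(spec) for protocol, spec in registry.items()}

# The unguarded check raises TypeError on a class, as in the report above.
try:
    unguarded(FakeFileSystem)
    unguarded_error = None
except TypeError as ex:
    unguarded_error = str(ex)

print(results)
print(unguarded_error)
```

With the isinstance guard, class entries are simply reported as having no missing dependencies instead of aborting the whole import.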

> Wdyt? I can submit a PR if you wish

Yes, of course.

juarezr commented 10 months ago

Fixed in #647.