ncar-xdev / ecgtools

ESM Catalog Generation tools
https://ecgtools.readthedocs.io
Apache License 2.0

[Bug]: RootDirectory.walk fails with fsspec>=2023.6.0 #160

Closed dougiesquire closed 1 year ago

dougiesquire commented 1 year ago

What happened?

fsspec 2023.6.0 changed how filesystems are walked, and that change breaks the walk method of the ecgtools.builder.RootDirectory class (traceback below). This is why the ecgtools CI tests are currently failing.

This is the fsspec change that caused the issue: https://github.com/fsspec/filesystem_spec/pull/1263

What did you expect to happen?

RootDirectory.walk() completes without error.

Minimal Complete Verifiable Example

# See also the failing tests in the CI workflow
from ecgtools import RootDirectory, glob_to_regex

path = "~"
depth = 1
include_patterns = ["*"]
exclude_patterns = []

include_regex, exclude_regex = glob_to_regex(
    include_patterns=include_patterns,
    exclude_patterns=exclude_patterns
)

directory = RootDirectory(
    path=path,
    depth=depth,
    include_regex=include_regex,
    exclude_regex=exclude_regex
)
directory.walk()

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[13], line 20
      9 include_regex, exclude_regex = glob_to_regex(
     10     include_patterns=include_patterns, 
     11     exclude_patterns=exclude_patterns
     12 )
     14 directory = RootDirectory(
     15     path=path,
     16     depth=depth,
     17     include_regex=include_regex,
     18     exclude_regex=exclude_regex
     19 )
---> 20 directory.walk()

File ~/miniconda3/envs/test/lib/python3.11/site-packages/ecgtools/builder.py:59, in RootDirectory.walk(self)
     57 def walk(self):
     58     all_assets = []
---> 59     for root, dirs, files in self.mapper.fs.walk(self.raw_path, maxdepth=self.depth + 1):
     60         # exclude dirs
     61         dirs[:] = [os.path.join(root, directory) for directory in dirs]
     62         dirs[:] = [
     63             directory for directory in dirs if not re.match(self.exclude_regex, directory)
     64         ]

File ~/miniconda3/envs/test/lib/python3.11/site-packages/fsspec/spec.py:452, in AbstractFileSystem.walk(self, path, maxdepth, topdown, **kwargs)
    448         return
    450 for d in dirs:
    451     yield from self.walk(
--> 452         full_dirs[d],
    453         maxdepth=maxdepth,
    454         detail=detail,
    455         topdown=topdown,
    456         **kwargs,
    457     )
    459 if not topdown:
    460     # Yield after recursion if walking bottom up
    461     yield path, dirs, files

KeyError: <the first path encountered by walk>
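
The traceback suggests a mechanism: walk recurses with full_dirs[d] for each d in dirs, while builder.py (line 61) has already replaced the entries of dirs with joined paths, so the lookup key is no longer a bare directory name. The following is a minimal, stdlib-only sketch of that interaction. It is not fsspec's actual implementation; TREE, toy_walk and walk_rewriting_dirs are hypothetical names made up for illustration.

```python
import posixpath

# Hypothetical in-memory tree standing in for a real filesystem listing:
# maps a directory path to (subdirectory names, file names).
TREE = {
    "/root": (["a", "b"], ["top.nc"]),
    "/root/a": ([], ["a1.nc"]),
    "/root/b": ([], ["b1.nc"]),
}


def toy_walk(path):
    """Top-down walk that recurses through a name -> full-path lookup."""
    names, files = TREE[path]
    full_dirs = {name: posixpath.join(path, name) for name in names}
    dirs = list(names)
    # The caller may edit `dirs` in place while the generator is suspended here.
    yield path, dirs, files
    for d in dirs:
        # Raises KeyError if the caller replaced the bare names in `dirs`.
        yield from toy_walk(full_dirs[d])


def walk_rewriting_dirs(root):
    """Mimic the builder.py pattern: overwrite names with joined paths."""
    visited = []
    for current, dirs, files in toy_walk(root):
        dirs[:] = [posixpath.join(current, d) for d in dirs]
        visited.append(current)
    return visited
```

Calling walk_rewriting_dirs("/root") raises a KeyError whose key is the first joined path, which matches the shape of the error reported above.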

Anything else we need to know?

I can have a stab at a PR if wanted
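
One possible direction for a fix, sketched under assumptions rather than tested against ecgtools: match the exclude regex against the joined path, but leave bare names in dirs so the walker can still resolve them. In this sketch os.walk stands in for self.mapper.fs.walk, and prune_dirs, walk_dirs and demo are hypothetical helpers, not part of ecgtools or fsspec.

```python
import fnmatch
import os
import re
import tempfile


def prune_dirs(root, dirs, exclude_regex):
    """Return the bare names in `dirs` whose joined path does not match `exclude_regex`."""
    return [d for d in dirs if not re.match(exclude_regex, os.path.join(root, d))]


def walk_dirs(top, exclude_regex):
    """Walk `top`, pruning excluded directories without renaming the rest."""
    visited = []
    for root, dirs, files in os.walk(top):  # stand-in for self.mapper.fs.walk
        dirs[:] = prune_dirs(root, dirs, exclude_regex)
        visited.append(os.path.relpath(root, top))
    return sorted(visited)


def demo():
    """Build a small temporary tree and walk it, excluding paths ending in 'skip'."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "keep", "inner"))
        os.makedirs(os.path.join(top, "skip", "inner"))
        return walk_dirs(top, fnmatch.translate("*skip"))
```

The design choice here is to keep filtering and renaming separate: the regex still sees full paths, as in the current builder.py, while the list handed back to the walker only ever shrinks.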