intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
https://intake-esm.readthedocs.io
Apache License 2.0

re.compile not interpreted correctly when passed to the _search.py search method #657

Closed wrongkindofdoctor closed 2 months ago

wrongkindofdoctor commented 4 months ago

Description

I am trying to pass a compiled regular expression (the re.Pattern object returned by re.compile) as the value for one of the columns in an intake catalog search, following the example in the code comments. However, the search method expects each value in the query dict to be an iterable, and it raises a TypeError when it tries to iterate over the re.Pattern object.

What I Did

    import re

    for case_name, case_d in case_dict.items():
        # Search for the case_name group in the path entries
        path_regex = re.compile(r'({})'.format(case_name))
        freq = case_d.varlist.T.frequency
        for v in case_d.varlist.iter_vars():
            cat_subset = cat.search(activity_id=case_d.convention,
                                    standard_name=v.standard_name,
                                    frequency=freq,
                                    realm=v.realm,
                                    path=path_regex
                                    )

The path_regex object passed to catalog _search.search method:

re.compile('(CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231)')

path_regex has the following attributes:

Thus, if values is a compiled pattern, values.pattern seems to be what the search method should use in the for value in values loop.

Stack trace


  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/pydantic/deprecated/decorator.py", line 55, in wrapper_function
    return vd.call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/pydantic/deprecated/decorator.py", line 150, in call
    return self.execute(m)
           ^^^^^^^^^^^^^^^
  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/pydantic/deprecated/decorator.py", line 222, in execute
    return self.raw_function(**d, **var_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/intake_esm/core.py", line 393, in search
    esmcat_results = self.esmcat.search(require_all_on=require_all_on, query=query)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/intake_esm/cat.py", line 385, in search
    results = search(
              ^^^^^^^
  File "/Users/j/micromamba/envs/_MDTF_base/lib/python3.11/site-packages/intake_esm/_search.py", line 46, in search
    for value in values:
TypeError: 're.Pattern' object is not iterable
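
The last frame of the trace can be reproduced without intake-esm. The following is a minimal sketch using only the standard library; it does not call any intake-esm code, and the variable names `source`, `error_message`, and `items` are illustrative. It shows that a compiled pattern exposes its source string through `.pattern` but cannot be looped over, whereas a list containing the pattern can.

```python
import re

# Same pattern as in the report above.
path_regex = re.compile(r'(CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231)')

# A compiled pattern exposes its source string via the .pattern attribute.
source = path_regex.pattern

# It does not implement iteration, so a bare for-loop raises TypeError.
try:
    for value in path_regex:
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A list containing the pattern iterates fine and yields the pattern itself.
items = [value for value in [path_regex]]

print(source)
print(error_message)
print(items)
```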

Version information: output of intake_esm.show_versions()

```python
INSTALLED VERSIONS
------------------
cftime: 1.6.2
dask: 2023.9.1
fastprogress: 1.0.3
fsspec: 2024.2.0
gcsfs: None
intake: 0.7.0
intake_esm: 2024.2.6
netCDF4: 1.6.4
pandas: 2.1.0
requests: 2.31.0
s3fs: None
xarray: 2023.8.0
zarr: 2.16.1
```

mgrover1 commented 4 months ago

@wrongkindofdoctor - can you try passing it as a list? Sorry for the delayed response here.

For example:

    import re

    for case_name, case_d in case_dict.items():
        # Search for the case_name group in the path entries
        path_regex = re.compile(r'({})'.format(case_name))
        freq = case_d.varlist.T.frequency
        for v in case_d.varlist.iter_vars():
            cat_subset = cat.search(activity_id=case_d.convention,
                                    standard_name=v.standard_name,
                                    frequency=freq,
                                    realm=v.realm,
                                    path=[path_regex]
                                    )

wrongkindofdoctor commented 2 months ago

@mgrover1 Sorry for the late response. I just got around to testing this: wrapping the re.compile object in a list before passing it to cat.search resolves the issue. Thanks for your help!
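
For readers who want to accept either form in their own wrapper code, one option is to normalize query values before looping over them. The sketch below is a hypothetical illustration, not intake-esm's implementation: `normalize_query`, `matches`, and the two sample rows (including their paths) are made up for the example, and the matching logic is deliberately simplified to exact string equality or `re.Pattern.search`.

```python
import re


def normalize_query(query):
    """Hypothetical helper: wrap scalar values (including compiled
    patterns and plain strings) in a list so every value is iterable."""
    normalized = {}
    for column, values in query.items():
        if isinstance(values, (str, re.Pattern)) or not hasattr(values, "__iter__"):
            values = [values]
        normalized[column] = list(values)
    return normalized


def matches(row, query):
    """Return True if, for every queried column, the row's cell matches
    at least one of the requested values."""
    for column, values in normalize_query(query).items():
        cell = str(row[column])
        found = False
        for value in values:
            if isinstance(value, re.Pattern):
                found = value.search(cell) is not None
            else:
                found = cell == str(value)
            if found:
                break
        if not found:
            return False
    return True


# Made-up catalog rows, for illustration only.
rows = [
    {"path": "/data/CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231/tas.nc", "frequency": "day"},
    {"path": "/data/other_case/tas.nc", "frequency": "day"},
]

path_regex = re.compile(r"(CMIP_Synthetic_r1i1p1f1_gr1_19800101-19841231)")

# Both the bare pattern and the list-wrapped pattern select the same rows.
bare = [r for r in rows if matches(r, {"path": path_regex, "frequency": "day"})]
wrapped = [r for r in rows if matches(r, {"path": [path_regex], "frequency": "day"})]
print(len(bare), bare == wrapped)
```

Strings are treated as scalars here on purpose: a string is iterable in Python, so without the explicit check it would be split into single characters.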