xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: pd.read_parquet from an S3 path containing '*' raises 'IndexError: string index out of range' #607

Closed by smartguo 1 year ago

smartguo commented 1 year ago

Describe the bug

Reading Parquet files from S3 with a '*' wildcard in the path raises 'IndexError: string index out of range'.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.9.12
  2. The version of Xorbits you use: 0.4.2
  3. Versions of crucial packages, such as numpy, scipy and pandas: pandas==1.4.2
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Code

import os
import xorbits.pandas as pd

option = {
    "key": "****",
    "secret": "****",
    "endpoint_url": "https://cos.ap-beijing.myqcloud.com",
}

df = pd.read_parquet(
    "s3://<bucket>/<path>/pt=20230403/*parquet",
    storage_options=option,
)
print(df.head())

Error message:

Traceback (most recent call last):
  File "/Users/***/workspace/xorbits/test_s3.py", line 20, in <module>
    df = pd.read_parquet(
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/core/adapter.py", line 472, in wrapped
    return from_mars(c(*to_mars(args), **to_mars(kwargs)))
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/dataframe/datasource/read_parquet.py", line 772, in read_parquet
    file_path = glob(path, storage_options=storage_options)[0]
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/core.py", line 77, in glob
    return fs.glob(path)
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/fsspec_adapter.py", line 94, in glob
    FileSystemGlob(self).glob(self._normalize_path(path), recursive=recursive)
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/_glob.py", line 60, in glob
    return list(self.iglob(pathname, recursive=recursive))
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/_glob.py", line 112, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/_glob.py", line 126, in _glob1
    return fnmatch.filter(names, pattern)
  File "/Users/***/lib/anaconda3/lib/python3.9/fnmatch.py", line 61, in filter
    for name in names:
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/_glob.py", line 125, in <genexpr>
    names = (x for x in names if not _ishidden(x))
  File "/Users/***/lib/anaconda3/lib/python3.9/site-packages/xorbits/_mars/lib/filesystem/_glob.py", line 35, in _ishidden
    return path[0] in (".", b"."[0])
IndexError: string index out of range
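
Note on the last frame: `_ishidden` indexes `path[0]`, which fails when one of the listed names is an empty string. The sketch below reproduces that failure in isolation and shows one possible guard. The empty entry in `names` is an assumption about what the S3 listing returned (for example, a "directory" placeholder key), and `ishidden_guarded` is a hypothetical helper, not necessarily the fix that Xorbits shipped.

```python
import fnmatch


def ishidden_original(name):
    # Same expression as the last frame of the traceback: indexing an
    # empty string raises IndexError.
    return name[0] in (".", b"."[0])


def ishidden_guarded(name):
    # Hypothetical guard: slicing yields an empty value instead of raising,
    # so an empty name is simply treated as "not hidden".
    return name[:1] in (".", b".")


# Assumed directory listing that includes an empty entry.
names = ["", ".hidden", "part-0.parquet"]

try:
    [n for n in names if not ishidden_original(n)]
    original_error = None
except IndexError as exc:
    original_error = str(exc)

# With the guard, hidden names are dropped and the empty name is left for
# fnmatch to reject, since it does not match the pattern.
visible = [n for n in names if not ishidden_guarded(n)]
matched = fnmatch.filter(visible, "*parquet")

print(original_error)
print(matched)
```

Slicing with `name[:1]` works for both `str` and `bytes` names, which is why the tuple holds both `"."` and `b"."`.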

aresnow1 commented 1 year ago

Thanks for your report, we will fix it ASAP.

aresnow1 commented 1 year ago

We have released a new version that fixes this issue. You can install the latest version via pip install -U xorbits and try again.

smartguo commented 1 year ago

> We have released a new version that fixes this issue. You can install the latest version via pip install -U xorbits and try again.

Thanks for letting me know, it works now!