martindut closed this issue 3 years ago
Hi,
Looks like a new version of `azure-storage-file-datalake` has changed the type of the `.last_modified` attribute of path properties from `str` to `datetime`. I'll patch this shortly. Meanwhile, this should work for you if you downgrade: `pip install azure-storage-file-datalake==12.2.0`.
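For anyone hitting the same `TypeError`, here is a minimal sketch of a timestamp parser that tolerates both the old `str` form and the new `datetime` form. It is not the actual patch in pyarrowfs-adlgen2: the function name `parse_last_modified` is made up for illustration, and the assumption that the old string form is an HTTP-style date such as `Wed, 12 Aug 2020 10:00:00 GMT` has not been checked against the SDK.

```python
import datetime
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Return a timezone-aware datetime for either a str or a datetime.

    Hypothetical helper, not part of pyarrowfs-adlgen2 or the Azure SDK.
    """
    if isinstance(value, datetime.datetime):
        # Newer SDK behaviour: the value is already a datetime.
        # Attach UTC only if it carries no timezone information.
        if value.tzinfo is None:
            return value.replace(tzinfo=datetime.timezone.utc)
        return value
    if isinstance(value, str):
        # Older SDK behaviour: ASSUMED to be an HTTP-style date string,
        # e.g. "Wed, 12 Aug 2020 10:00:00 GMT".
        return parsedate_to_datetime(value)
    raise TypeError(
        "last_modified must be str or datetime, got " + type(value).__name__
    )


# Both input types end up as the same aware datetime.
from_str = parse_last_modified("Wed, 12 Aug 2020 10:00:00 GMT")
from_dt = parse_last_modified(datetime.datetime(2020, 8, 12, 10, 0, 0))
print(from_str == from_dt)
```

Checking the type first means the same code keeps working whichever SDK version happens to be installed.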
Thanks! Your suggestion worked!
Released `pyarrowfs-adlgen2==0.1.3`, which fixes this issue now. Thanks for reporting!
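To confirm which versions are actually installed after upgrading or downgrading, something like the following should work on Python 3.8 or newer. It is only a sketch: `installed_version` is a hypothetical helper, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they appear in this thread.
for name in ("azure-storage-file-datalake", "pyarrowfs-adlgen2"):
    print(name, installed_version(name) or "not installed")
```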
Thanks
Hi. I'm very new to Python, so please excuse me if I'm talking nonsense. I'm trying to run this code (I had it working at some stage, but I'm not sure what went wrong):
```python
import pandas as pd
import azure.identity as ai

import pyarrow as pa
import pyarrow.fs
import pyarrow.dataset as pyds
import pyarrowfs_adlgen2

tenant_id = "xxxxxxxxxxxxxxxx"
client_id = "xxxxxxxxxxxxxxxx"
client_secret = "xxxxxxxxxxxxxxxx"

cred = ai.ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name("xxxxxx", cred)
fs = pa.fs.PyFileSystem(handler)

dl_path = "xxxxxxxxxxxxxxxxxxx/part-00000-edf2811d-75a0-479c-9d4b-44093e3247af.c000.snappy.parquet"
df = pd.read_parquet(fs.normalize_path(dl_path), filesystem=fs)
```

I keep getting this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1607, in read_table
    dataset = _ParquetDatasetV2(
  File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1439, in __init__
    if filesystem.get_file_info(path).is_file:
  File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 432, in get_file_info
    return [self._get_file_info(path) for path in paths]
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 432, in <listcomp>
    return [self._get_file_info(path) for path in paths]
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 429, in _get_file_info
    return self._fs(fs_name)._get_file_info(path)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 231, in _get_file_info
    return self._create_file_info(path_properties)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 199, in _create_file_info
    mtime=_parse_azure_ts(path_properties.last_modified)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 29, in _parse_azure_ts
    return datetime.datetime.strptime(last_modified, fmt)
TypeError: strptime() argument 1 must be str, not datetime.datetime
Thanks