kaaveland / pyarrowfs-adlgen2

Use pyarrow with Azure Data Lake gen2
MIT License

TypeError: strptime() argument 1 must be str, not datetime.datetime #3

Closed martindut closed 3 years ago

martindut commented 3 years ago

Hi. I'm very new to Python, so please excuse me if I'm talking nonsense. I'm trying to run this code (I had it working at some stage, but I'm not sure what went wrong):

```python
import pandas as pd
import azure.identity as ai

import pyarrow as pa
import pyarrow.fs
import pyarrow.dataset as pyds
import pyarrowfs_adlgen2

tenant_id = "xxxxxxxxxxxxxxxx"
client_id = "xxxxxxxxxxxxxxxx"
client_secret = "xxxxxxxxxxxxxxxx"

cred = ai.ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name("xxxxxx", cred)
fs = pa.fs.PyFileSystem(handler)

dl_path = "xxxxxxxxxxxxxxxxxxx/part-00000-edf2811d-75a0-479c-9d4b-44093e3247af.c000.snappy.parquet"
df = pd.read_parquet(fs.normalize_path(dl_path), filesystem=fs)
```

I keep getting this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1607, in read_table
    dataset = _ParquetDatasetV2(
  File "/usr/local/lib/python3.8/site-packages/pyarrow/parquet.py", line 1439, in __init__
    if filesystem.get_file_info(path).is_file:
  File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 432, in get_file_info
    return [self._get_file_info(path) for path in paths]
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 432, in <listcomp>
    return [self._get_file_info(path) for path in paths]
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 429, in _get_file_info
    return self._fs(fs_name)._get_file_info(path)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 231, in _get_file_info
    return self._create_file_info(path_properties)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 199, in _create_file_info
    mtime=_parse_azure_ts(path_properties.last_modified)
  File "/usr/local/lib/python3.8/site-packages/pyarrowfs_adlgen2/core.py", line 29, in _parse_azure_ts
    return datetime.datetime.strptime(last_modified, fmt)
TypeError: strptime() argument 1 must be str, not datetime.datetime

Thanks

kaaveland commented 3 years ago

Hi,

It looks like a new version of azure-storage-file-datalake has changed the type of the .last_modified attribute of path properties from str to datetime. I'll patch this shortly. In the meantime, downgrading should work for you: `pip install azure-storage-file-datalake==12.2.0`.
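For anyone who hits the same traceback, here is a rough sketch of a parser that tolerates both types. It is not the code from pyarrowfs-adlgen2: the function name `parse_last_modified` and the HTTP-date format string are assumptions made for illustration, since the thread does not show the `fmt` used in `core.py`.

```python
import datetime

# Assumed timestamp format (HTTP-date style, e.g. "Mon, 01 Feb 2021 12:00:00 GMT").
# The real format string used by the library is not shown in this thread.
ASSUMED_FMT = "%a, %d %b %Y %H:%M:%S %Z"


def parse_last_modified(last_modified, fmt=ASSUMED_FMT):
    """Return a datetime whether the SDK hands back a str or a datetime."""
    if isinstance(last_modified, datetime.datetime):
        # Newer SDK versions: the value is already a datetime, pass it through.
        return last_modified
    # Older SDK versions: the value is a string that still needs parsing.
    return datetime.datetime.strptime(last_modified, fmt)


# Passing a datetime straight to strptime is what raises the TypeError above;
# the isinstance check avoids that call entirely.
already_parsed = parse_last_modified(datetime.datetime(2021, 2, 1, 12, 0, 0))
from_string = parse_last_modified("Mon, 01 Feb 2021 12:00:00 GMT")
```

Checking the type at the call site keeps the code working on either side of the SDK change, so no version pin is needed.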

martindut commented 3 years ago

Thanks! Your suggestion worked!

kaaveland commented 3 years ago

Released pyarrowfs-adlgen2==0.1.3 which fixes this issue now. Thanks for reporting!

martindut commented 3 years ago

> Released pyarrowfs-adlgen2==0.1.3 which fixes this issue now. Thanks for reporting!

Thanks