memgraph / gqlalchemy

GQLAlchemy is a library developed with the purpose of assisting in writing and running queries on Memgraph. GQLAlchemy supports high-level connection to Memgraph as well as modular query builder.
https://pypi.org/project/gqlalchemy/
Apache License 2.0
212 stars 29 forks source link

[BUG] CSVLocalFileSystemImporter can't load datetime from csv #300

Open SuperBo opened 5 months ago

SuperBo commented 5 months ago

Memgraph version Which version did you use? Memgraph v2.13.0

Environment Some information about the environment you are using Memgraph on: operating system, how do you connect, with or without docker, which driver etc.

Docker

Describe the bug

Cant load datetime string from csv to memgraph.

To Reproduce Steps to reproduce the behavior:

Use following scripts

importer = loaders.CSVLocalFileSystemImporter(
    data_configuration=yaml.safe_load(DATA_CONFIG),
    path = "tmp/airflow/data",
    memgraph=memgraph
)
importer.translate(drop_database_on_start=True)

Expected behavior

No error

Logs If applicable, add logs of Memgraph, CLI output or screenshots to help explain your problem.

Loading data from table tmp/airflow/data/a.csv...
Traceback (most recent call last):
  File "/Users/superbo/Workspace/airflow-pipelines/test_memgraph_load.py", line 36, in <module>
    load_data()
  File "/Users/superbo/Workspace/airflow-pipelines/test_memgraph_load.py", line 32, in load_data
    importer.translate(drop_database_on_start=True)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 472, in translate
    self._load_nodes()
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 480, in _load_nodes
    self._save_row_as_node(label=collection_name, row=row)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 591, in _save_row_as_node
    QueryBuilder(connection=self._memgraph)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/query_builders/declarative_base.py", line 748, in node
    properties_str = to_cypher_properties(kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/utilities.py", line 157, in to_cypher_properties
    value_str = to_cypher_value(value, config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/utilities.py", line 109, in to_cypher_value
    return f"{datetimeKwMapping[value_type]}('{_format_timedelta(value) if isinstance(value, timedelta) else value.isoformat()}')"
              ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: <class 'pandas._libs.tslibs.timestamps.Timestamp'>

Additional context Add any other context about the problem here.

matea16 commented 5 months ago

Hi @SuperBo, thank you for reporting this. Would you be able to share your DATA_CONFIG file in order for us to replicate your issue? You don't have to share real data if it's sensitive, I'm just interested in the format of the file

SuperBo commented 5 months ago

@matea16, here is example data, i'm using pyarrow==13.0.0

account_id,phone_no,created_dt,updated_dt
abc1,+1212423,2019-08-28 12:19:18,
bcd2,+12123232,2021-06-09 09:05:41,2021-06-09 09:05:41
b11e,+94745734,2021-12-13 16:42:45,2021-12-13 16:42:45
matea16 commented 5 months ago

Thank you for providing the data, we'll take a look at your issue shortly. I'll let you know if we have any additional questions to help us with the debugging :)

katarinasupe commented 5 months ago

Hi @SuperBo, I see that you shared your CSV data, but did you create a data configuration object, similar to the the documentation? If yes, can you share that too?

matea16 commented 5 months ago

@SuperBo, I believe the issue with your CSV file could be due to the format of your date field. In our docs, you can see supported formats of temporal types, including the local time format. Let me know if adjusting that format helps

SuperBo commented 5 months ago

Hi @matea16, I also tested with iso datetime format "yyyy-MM-ddThh:mm:ss", and still met the same error.

@katarinasupe : of course, I created a data configuration like this

indices:
    accounts:
        - account_id
    transactions:
        - transaction_number

name_mappings:
    accounts:
        label: Account
    transactions:
        label: INTERACT

one_to_many_relations:
    accounts: []
    transactions: []
matea16 commented 5 months ago

I tested it out with your configuration file and it works as expected for me, even with the previous format of your datetime objects. I will provide you below with the code snippet that worked, the only difference is in parsing the yaml file, let me know if that solves the issue:

 with open("./data_config.yaml", "r") as stream:
      try:
          parsed_yaml = yaml.load(stream, Loader=SafeLoader)
      except yaml.YAMLError as exc:
          print(exc)

    importer = CSVLocalFileSystemImporter(
        data_configuration=parsed_yaml,
        path="./",
        memgraph=memgraph
    )

    importer.translate(drop_database_on_start=True)
SuperBo commented 5 months ago

hello @matea16 , new update, the problem comes from the remaining file. In that file, I put timestamp format as follwing:

2024-01-04T03:48:11.336Z

matea16 commented 5 months ago

that format is unfortunately not supported since timezones are not supported, does it work without the timezone at the end?

SuperBo commented 5 months ago

I removed the last "Z" and the same error still occurs. However, when I remove the microseconds part, it works.

matea16 commented 5 months ago

I'm glad you were able to find a workaround. The issue is indeed the format of the data field. Memgraph doesn't support timezones and milliseconds and that was causing a problem. In our docs you can see all of the supported temporal types. Thank you again for reporting this, we have two issues that I linked above related to the time zones and this helps us to prioritize them.

katarinasupe commented 4 months ago

I will close this issue since it was related to how Memgraph handles time format.

matea16 commented 1 month ago

Hi @SuperBo, I'm happy to let you know that zoned datetime data type has been added to Memgraph to the latest release with the following PR: https://github.com/memgraph/memgraph/pull/1866

SuperBo commented 1 month ago

Thank you Memgraph team, you rocks 👍

katarinasupe commented 1 month ago

Zoned datetime has been added to Memgraph, but in order for it to work properly in GQLAlchemy, I think datetime keyword should still be added to the GQLAlchemy in order for OGM to work properly.

katarinasupe commented 4 weeks ago

@matea16 I set a new milestone for GQLAlchemy. Will you be able to implement this until July 3rd?

matea16 commented 4 weeks ago

Yes, I'll make sure to have it ready by July 3rd