BjarkeTornager opened this issue 1 month ago
Hi @BjarkeTornager, this is something that could be on the roadmap, but it hasn't been prioritized yet: we typically wait for several upvotes from the community before deciding how much to prioritize a new integration. Numerous other integrations are already underway for our 0.5.0 release and beyond, so we hope you understand. In the meantime, we will soon release a basic graph algorithms package that provides some of the functionality GraphFrames offers, so stay tuned!
Thanks @prrao87, looking forward to the Kùzu basic graph algorithm package!
It would be great to have Spark integration with Kùzu, especially for large-scale data ingestion!
Just adding some scope for initial functionality here: The proposed integration would behave just like the Pandas/Polars DataFrame integration does:
Unlike with Pandas/Polars, the I/O and related tasks may not be fully in-memory, so we'd need to look at how Spark's persistent storage formats work under the hood, and also at how to design the API that exposes the connector to Kùzu's Python client.
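Until a native connector exists, you can already do a round trip through the Pandas integration mentioned above. This is a minimal sketch, not a proposed API. The helper names `spark_to_kuzu` and `kuzu_to_spark` are hypothetical. It assumes a Kùzu node table whose columns match the Spark DataFrame already exists. It also assumes that Kùzu's `COPY ... FROM df` resolves the Python variable `df` from the calling scope, the way its DataFrame scanning works. Because it collects every row onto the Spark driver with `toPandas()`, it runs into exactly the in-memory limitation that a native connector would need to avoid.

```python
def spark_to_kuzu(conn, spark_df, table_name):
    """Copy a PySpark DataFrame into an existing Kùzu node table (hypothetical helper).

    Assumption: Kùzu's COPY resolves the local Pandas variable `df` by name.
    Collects the full DataFrame to the driver, so it only suits data that fits in memory.
    """
    df = spark_df.toPandas()  # PySpark -> Pandas; the name `df` is referenced in the Cypher below
    conn.execute(f"COPY {table_name} FROM df")
    return len(df)  # number of rows handed to Kùzu


def kuzu_to_spark(conn, spark, query):
    """Run a Cypher query in Kùzu and return the result as a Spark DataFrame (hypothetical helper)."""
    pdf = conn.execute(query).get_as_df()  # Kùzu query result -> Pandas DataFrame
    return spark.createDataFrame(pdf)      # Pandas -> PySpark DataFrame
```

In practice, `conn` would be a `kuzu.Connection` and `spark` a `SparkSession`. A native connector would stream or partition the data instead of funnelling it through a single Pandas DataFrame.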
API: Python
Description
Have you considered making an integration between Kùzu and PySpark?
Neo4j, for example, has a Neo4j Connector for Apache Spark.
Spark also has a community project called GraphFrames that can be used for basic graph algorithms.
Since Spark is widely used for analytical queries, machine learning, and streaming, it could be useful to move data between the two.