Open Mintactus opened 5 months ago
Hi there, thank you for bringing this to our attention. It's great to see performance improving and community interest in new libraries - we constantly monitor requests like this one. Pandas is still used and loved by the majority of our customers, while Polars is emerging. We will evaluate whether it's worth integrating natively, but in the meantime we suggest using polars.from_pandas as an efficient workaround.
+1 for exporting to polars
Moving this to the GDS Python Client repository. The GDS library itself is agnostic to Pandas/Polars. Exports are possible using Bolt or Arrow. The internals of GDS are not based on Arrow; they are our own custom implementation, with some third-party data structures (not Arrow itself).
The GDS Python Client wraps the Neo4j Python Driver (https://github.com/neo4j/neo4j-python-driver), which determines the GDS Python Client's export functionality for Cypher queries through the Neo4j Python Driver's to_df() method (docs).
To get this Cypher driver to export to Polars as well, I suggest raising an issue on that repository. I will also mention it via Neo4j-internal channels.
The GDS Python Client can also export using Apache Arrow via the GDS Arrow Server. This does not use the Neo4j Python Driver, but makes an independent connection to the GDS Arrow Server using an Arrow client based on the pyarrow library. The pyarrow library returns results from the Arrow stream as Table (docs) objects, which have a to_pandas() (docs) method.
As @gminneci mentions, Polars supports reading from a Pandas DataFrame, so it is possible to hook up the workflow.
It is not directly possible for the GDS Python Client to use a different method from the underlying pyarrow library.
It is not perfectly in line with the purpose of the GDS Python library to support conversion between two third-party data structures (pyarrow.Table and polars.DataFrame). If either pyarrow or Polars supported this, it would be more convenient. As it stands, conversion goes via polars.from_pandas(), which is still a more appropriate location compared to the GDS Python Client.
We are naturally very happy to see the interest in GDS and its software parts (library, client, database) so we are not rejecting this feature request. However, in the presence of workarounds and no very low-hanging possibilities for uniform integration (other than bundling Polars and calling from_pandas() within this library, which doesn't seem so attractive), we're keeping this tracked with no immediate plan to address it.
Thank you for raising this issue! All the best Mats
https://pola.rs/
Polars is setting a brand new standard of data processing; it would be awesome to have it as an option for the output of a gds function. It could be a parameter you choose when you build the gds client: exportType = [pandas, polars, apache arrow IPC, etc.]
Not just pandas, which is deprecated.