neo4j / graph-data-science-client

A Python client for the Neo4j Graph Data Science (GDS) library
https://neo4j.com/product/graph-data-science/
Apache License 2.0

GDS Projection from polars / pandas dataframe or arrow table #654

Closed Mintactus closed 2 months ago

Mintactus commented 2 months ago

The new Polars dataframe multi engine is absolutely a must in the data industry. After using it for months, the performance benefits are insane. Adios pandas, your time has come.

Allowing GDS to export to and create projections from Polars dataframes would be natural today (at least once Polars is out of alpha).

Even better, since GDS is based on Apache Arrow, I think it would make sense for GDS to create projections directly from an Arrow table. This would make it agnostic to the engine processing the data.

Mats-SX commented 2 months ago

Duplicate of neo4j/graph-data-science-client#653

Mats-SX commented 2 months ago

While projection and export are two distinct features in GDS and in the GDS Python Client, the question of which DataFrame libraries they accept is seen as a global integration concern. If we added support for Polars, it should apply to both export and projection.

For now, the same workaround to convert to/from pandas data frames will assist workflows based on Polars. My discussion in the other issue applies similarly for projection, where we make use of Table.from_pandas() in pyarrow, but there is no Table.from_polars().
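The workaround described above can be sketched roughly as follows. This is a minimal sketch, not the client's own code: `construct_from_polars` is a hypothetical helper name, and it assumes that `gds` is a connected `GraphDataScience` object whose `gds.graph.construct` accepts pandas DataFrames, and that the Polars frames already use the column names the client expects.

```python
from typing import Any


def construct_from_polars(gds: Any, graph_name: str, nodes: Any, relationships: Any) -> Any:
    """Project a graph from Polars frames by converting to pandas first.

    Hypothetical helper for illustration; it is not part of the GDS client.
    """
    # polars.DataFrame.to_pandas() converts via Arrow, so it needs
    # pandas and pyarrow to be installed alongside polars.
    nodes_pd = nodes.to_pandas()
    relationships_pd = relationships.to_pandas()
    # Assumption: gds.graph.construct takes a graph name followed by
    # pandas DataFrames for nodes and relationships.
    return gds.graph.construct(graph_name, nodes_pd, relationships_pd)
```

The conversion costs an extra copy through pandas, which is the overhead a native Polars or Arrow entry point would avoid.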

Mintactus commented 2 months ago

Since Polars, pandas (at least recent versions), GDS, and so much else on the market all share one thing in common, the Apache Arrow format, would it be a solution to simply import and export (optionally or as the default behavior) in Arrow table format?

This would make GDS agnostic to the engine processing the data before it is shipped into or out of GDS.

Thanks

Mats-SX commented 2 months ago

It is a possibility. But the pyarrow.Table type is not as ubiquitous as the pandas.DataFrame type. It is nice to have DataFrame in the API.

But we can have a polymorphic parameter set, and allow passing in pyarrow.Table objects directly. It would be some work to accomplish, but it would be possible.
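One way such a polymorphic parameter could be normalized is sketched below. This is only an illustration under stated assumptions, not how the client does or will do it: `to_arrow_table` is a hypothetical name, and dispatching on the package that defines the object's type is a design choice made here so that Polars and pandas stay optional imports.

```python
from typing import Any


def to_arrow_table(data: Any) -> Any:
    """Normalize a supported tabular object to a pyarrow.Table.

    Hypothetical helper for illustration; it is not part of the GDS client.
    """
    # Dispatch on the top-level package of the object's type so that
    # optional libraries are never imported unless they are needed.
    package = type(data).__module__.split(".")[0]
    if package == "pyarrow":
        # Assumed to already be a pyarrow.Table; pass it through untouched.
        return data
    if package == "polars":
        # polars.DataFrame.to_arrow() returns a pyarrow.Table.
        return data.to_arrow()
    if package == "pandas":
        import pyarrow as pa  # only required for the pandas branch

        return pa.Table.from_pandas(data)
    raise TypeError(f"Unsupported tabular type: {type(data).__name__}")
```

With a helper like this, the public API could keep accepting `pandas.DataFrame` while also taking `pyarrow.Table` or Polars frames, and everything downstream would only ever see Arrow tables.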

Mintactus commented 2 months ago

At the moment it's probably not critical, but over time, if multiple engines on the market use Apache Arrow, it could be a quicker way to make GDS agnostic to these engines. Polars and pandas are the main ones I know of for now, with Polars seriously kicking the ass of pandas.
