neo4j-contrib / neo4j-spark-connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
https://neo4j.com/developer/spark/
Apache License 2.0

Pass parameters to the connector #624

Open lvijnck opened 3 months ago

lvijnck commented 3 months ago

Feature description (Mandatory)

Add an option to pass query parameters to Neo4j.

I'm working on a Kedro integration for Neo4j, and this connector seems to be a perfect fit. However, I want to define my queries in a Pythonic way, using Pypher. This works fairly well, e.g.:


from typing import Any

from kedro_datasets.spark import SparkDataset
from kedro_datasets.spark.spark_dataset import _get_spark


class Neo4JDataset(SparkDataset):

    ...

    def _load(self) -> Any:
        """Load Neo4j query result as a Spark DataFrame."""
        spark_session = _get_spark()
        return (
            spark_session.read.format("org.neo4j.spark.DataSource")
            .option("database", self._database)
            .option("url", self._url)
            .options(**self._credentials)
            # The connector only accepts the query as plain Cypher text
            .option("query", str(self._load_query(self._query)))
            .load()  # turn the configured reader into a DataFrame
        )

    # with self._query an arbitrary Pypher object, e.g.,
    # query = (
    #     pypher.MATCH.node("person", labels="Person")
    #     .rel_out(labels="LIKES")
    #     .node("movie", "Movie")
    #     .RETURN(__.person.__id__.ALIAS("p"))
    # )

However, binding variables that come from the Kedro context is not possible, because the connector offers no way to specify query parameters. Pypher already exposes a bound_params attribute that yields a nicely formatted dictionary.

The inability to specify params is rather awkward here, especially since predicate pushdown is disabled for the query option.
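To illustrate the gap, here is a minimal sketch of the kind of workaround that is needed today: rendering the parameter dictionary into the Cypher text before handing it to the query option. The helpers `to_cypher_literal` and `inline_params` are hypothetical names, not part of the connector or of Pypher, and the sketch assumes the query text references parameters as `$name` placeholders whose names match the dictionary keys (with or without the `$` prefix). It does not parse Cypher, so a `$name` inside a string literal would also be replaced.

```python
import json
import math
import re
from typing import Any, Mapping


def to_cypher_literal(value: Any) -> str:
    """Render a Python value as a Cypher literal (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "true" if value else "false"
    if isinstance(value, int):
        return repr(value)
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("non-finite floats are not supported in this sketch")
        return repr(value)
    if isinstance(value, str):
        # Double-quoted string with backslash escapes for quotes and control chars
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_cypher_literal(v) for v in value) + "]"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


_PLACEHOLDER = re.compile(r"\$(\w+)")


def inline_params(cypher: str, params: Mapping[str, Any]) -> str:
    """Replace every $name placeholder in `cypher` with a literal from `params`."""
    # Accept keys written either as "name" or as "$name"
    normalized = {key.lstrip("$"): value for key, value in params.items()}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in normalized:
            raise KeyError(f"no value supplied for parameter ${name}")
        return to_cypher_literal(normalized[name])

    return _PLACEHOLDER.sub(substitute, cypher)


if __name__ == "__main__":
    query = "MATCH (p:Person) WHERE p.name = $name AND p.age > $min_age RETURN p"
    print(inline_params(query, {"name": "Ann", "min_age": 30}))
```

In the dataset above, the result of something like `inline_params(str(query), query.bound_params)` would be what goes into `.option("query", ...)`. Inlining values as text is fragile and easy to get wrong for untrusted input, which is why a proper way to pass params through the connector would be preferable.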

Considered alternatives

N/A

How this feature can improve the project?

Better adoption