neo4j / neo4j-spark-connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
https://neo4j.com/developer/spark/
Apache License 2.0

Compatibility with Spark 3.0 #206

Closed usmanmunara closed 3 years ago

usmanmunara commented 3 years ago

Hi,

I have been trying to connect Spark 3.0 with Neo4j 4.1. However, the connector doesn't seem to work; it throws quite a lot of errors. Before sharing the specific errors, I was curious whether neo4j-spark-connector has been ported to support Spark 3.0 with Neo4j 4.1.

Thanks

moxious commented 3 years ago

Related issue: https://github.com/neo4j-contrib/neo4j-spark-connector/issues/203

We're in the process of updating the connector to work with the Spark 2.4 DataFrames API. Spark 3.0 is not yet supported because it is relatively new, not widely adopted, and introduced breaking backwards-compatibility changes. Please see the latest release -- we put one out yesterday with the latest available code.

wangying0420 commented 3 years ago

Our cluster uses Spark 3.0. I believe many companies have upgraded to the latest Spark, so I think compatibility with Spark 3.0 is necessary.

bbenzikry commented 3 years ago

To add to the discussion on adoption: Spark 3 support is the only reason we're not using this great project. Instead, we rely on convoluted methods to load data from our tables, because of upstream dependencies that simply don't work well with Spark 2.

Support for 2.x in frameworks is dwindling heavily. One example from Databricks is Delta Lake.

Really looking forward to a Spark 3 release. P.S. Great work 👍 I really like the direction this is taking, and I just had to say I appreciate the effort and hard work you're all investing in this.

moxious commented 3 years ago

We need to consider the release strategy going forward once this work is done. Released JARs need to be versioned according to:

- Maven sub-modules (business logic vs. Spark specifics)
- Scala version: Spark 3.0 deprecated Scala 2.11 and added support for Scala 2.13

utnaf commented 3 years ago

Hi @usmanmunara, we just merged Spark 3.0 support! We are not releasing it yet, since we need to do some additional testing, but please feel free to use the attached jar to play around and let us know if you run into any issues!

Be aware that SaveMode.ErrorIfExists is not supported at the moment for Spark 3.0.
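For anyone trying the pre-release jar, a minimal read/write sketch against the connector's DataSource API might look like the following. This is an illustrative sketch, not an official example: the URL, credentials, and `Person` label are placeholder assumptions, and it assumes Spark was launched with the attached connector jar on the classpath and a Neo4j 4.1 instance is reachable.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("neo4j-spark3-smoke-test")
  .getOrCreate()

// Read nodes labeled :Person into a DataFrame.
val people = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")            // placeholder connection URL
  .option("authentication.basic.username", "neo4j")  // placeholder credentials
  .option("authentication.basic.password", "password")
  .option("labels", "Person")
  .load()

people.show()

// Write the DataFrame back. Note that SaveMode.ErrorIfExists is not yet
// supported on Spark 3.0, so use Overwrite (or Append) instead.
people.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)
  .option("url", "bolt://localhost:7687")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "password")
  .option("labels", "Person")
  .option("node.keys", "name")  // property used to merge existing nodes on overwrite
  .save()
```

Since this requires a running Spark cluster and Neo4j instance, treat it purely as a starting point for your own smoke tests.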

neo4j-connector-apache-spark_2.12_3.0-4.0.0.jar.zip