JArma19 closed this issue 3 years ago
@JArma19 are you sure that you're using Spark with Scala 2.12? Can you please share your Spark artifact name?
@conker84 yes, I'm sure. I'm using pyspark version '3.0.1', which runs on Scala 2.12 according to the Spark docs (sorry, I don't know where to find the Spark artifact name)
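For reference, the Scala version is encoded in the connector's artifact file name, right after the underscore (e.g. `neo4j-connector-apache-spark_2.12-4.0.0.jar` is built for Scala 2.12). A small sketch of extracting it; the helper name is hypothetical, not part of the connector:

```python
def scala_version_from_artifact(jar_name: str) -> str:
    """Extract the Scala version suffix from an artifact file name.

    Convention: <artifact-id>_<scala-version>-<artifact-version>.jar
    """
    stem = jar_name[:-len(".jar")] if jar_name.endswith(".jar") else jar_name
    # Everything after the last underscore is "<scala>-<version>".
    return stem.split("_")[-1].split("-")[0]
```

At runtime you can also check which Scala version your Spark build uses, e.g. via `spark-submit --version`, which prints the Scala version alongside the Spark version.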
Ok, so that's the problem: we don't support Spark 3.x yet.
We plan to add that support during this month; in the meantime you should use Spark >= 2.4.5.
I'm closing this since we found the solution; feel free to reopen if you need to.
Ok, you actually solved my problem: I downgraded to 2.4.5 and finally it worked. Thanks for your help!
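The version constraint discussed above (connector 4.0.0 worked with Spark >= 2.4.5 but not yet with Spark 3.x at the time) can be summarized as a small hypothetical check; this is a sketch of the rule from this thread, not an official compatibility API:

```python
def connector_supports(spark_version: str) -> bool:
    """Mirror the advice in this thread: the connector needed
    Spark >= 2.4.5 in the 2.x line, and did not yet support 3.x."""
    parts = tuple(int(p) for p in spark_version.split("."))
    return parts[0] == 2 and parts >= (2, 4, 5)
```

So `connector_supports("3.0.1")` is false (the asker's original setup), while `connector_supports("2.4.5")` is true (the working downgrade).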
Hi, I'm trying to read nodes from my local Neo4j DB for practice purposes using PySpark and the Neo4j connector. I've already downloaded the latest version of neo4j-connector-apache-spark (2.12) and integrated it into PySpark as explained in the repo's README. However, when I try to perform a read using:

```python
from pyspark.sql import SparkSession
import os

os.environ["JAVA_HOME"] = "C:\Program Files\Java\jdk-15.0.1"
os.environ["HADOOP_HOME"] = "C:\Users\arman\Desktop\winutils"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars file:///C:\Users\arman\Desktop\prova\venv\Lib\site-packages\pyspark\jars\neo4j-connector-apache-spark_2.12-4.0.0.jar pyspark-shell'

spark = SparkSession.builder \
    .config('spark.jars', 'C:\Users\arman\Desktop\prova\venv\Lib\site-packages\pyspark\jars\neo4j-connector-apache-spark_2.12-4.0.0.jar') \
    .config('spark.jars.packages', 'neo4j-contrib:neo4j-connector-apache-spark_2.12:4.0.0') \
    .getOrCreate()

spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", "bolt://localhost:7687") \
    .option("authentication.basic.username", "neo4j") \
    .option("authentication.basic.password", "justin") \
    .option("labels", "Person") \
    .load() \
    .show()
```
I get the following error:
> py4j.protocol.Py4JJavaError: An error occurred while calling o38.load. : java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport

I think it could be related to the format string "org.neo4j.spark.DataSource", but I don't know how to fix it.
I think I'm doing something wrong during configuration. Could you please suggest a guide or tutorial on how to set up PySpark properly to run the Neo4j connector? Thanks for your attention.
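One side note on the snippet above, unrelated to the `NoClassDefFoundError`: in Python 3, a backslash followed by `U` inside an ordinary string literal starts a `\UXXXXXXXX` unicode escape, so a literal like `"C:\Users\..."` raises a SyntaxError (the pasted code may have lost its raw-string prefixes). Raw strings or forward slashes avoid this:

```python
# Raw strings keep backslashes literal, so Windows paths are safe:
jar = r"C:\Users\arman\Desktop\prova\venv\Lib\site-packages\pyspark\jars\neo4j-connector-apache-spark_2.12-4.0.0.jar"
hadoop_home = r"C:\Users\arman\Desktop\winutils"

# Forward slashes also work on Windows and in file:// URLs:
jar_url = "file:///C:/Users/arman/Desktop/prova/venv/Lib/site-packages/pyspark/jars/neo4j-connector-apache-spark_2.12-4.0.0.jar"
```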