Closed CarlaFernandez closed 2 years ago
Any idea what I might be missing from my environment to make it work?
Does it work when you launch PySpark from the command line and specify the --packages command-line option?
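For reference, such a launch might look like the following (using the JPMML-SparkML coordinates mentioned elsewhere in this thread; adjust the version to match your Spark release):

```shell
# Launch PySpark and let it resolve JPMML-SparkML from Maven Central,
# instead of relying on jars copied into site-packages/pyspark/jars/
pyspark --packages org.jpmml:pmml-sparkml:2.2.0
```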
I have zero working experience with virtual environments, and I've never installed any JAR files manually into the site-packages/pyspark/jars/ directory.
If I were facing a similar problem, I'd start by checking the PySpark/Apache Spark log file. There should be some information about which packages are detected, and which of them are successfully "initialized" and which are not (possibly with an error reason).
Thanks for the quick response. Indeed, looking at the detected packages in the log is what helped me.
I recreated the environment from scratch, removed the jar I had installed manually, and started the session from the MWE without the spark.jars.packages config. It threw a RuntimeError: JPMML-SparkML not found on classpath.
Then I added the spark.jars.packages line back and it worked! So it seems the problem was caused by adding the jar manually.
I hadn't detected this before because my real configuration was more complex and I was using delta-spark. Apparently, when using delta-spark the packages were not being downloaded from Maven, and that's what caused the original error.
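For anyone hitting the same combination, the fix amounts to letting Spark resolve everything from Maven in one spark.jars.packages setting, rather than pasting jars into site-packages. A minimal sketch follows; only org.jpmml:pmml-sparkml:2.2.0 comes from this thread, while the Delta Lake coordinates are an assumption and must be matched to your Spark/Scala build:

```python
# spark.jars.packages takes a single comma-separated list of Maven coordinates,
# so both Delta Lake and JPMML-SparkML can be resolved in one setting.
packages = ",".join([
    "org.jpmml:pmml-sparkml:2.2.0",    # from this thread
    "io.delta:delta-core_2.12:2.0.0",  # assumed version; pick the build for your Spark
])
print(packages)
```

The resulting string would then be passed as `.config("spark.jars.packages", packages)` on the SparkSession builder, with no jars copied into site-packages/pyspark/jars/.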
Thanks!
Hello @vruusmann, first of all I'd like to say that I've checked issue #13, but I don't think it's the same problem.
I've created a virtual environment and installed pyspark and pyspark2pmml using pip. In this virtual environment, inside Lib/site-packages/pyspark/jars, I've pasted the JPMML-SparkML jar (org.jpmml:pmml-sparkml:2.2.0, for Spark version 3.2.2).
When I instantiate a PMMLBuilder object I get the error in the title. Here is an MWE that throws the error:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("spark_test")
    .master("local[*]")
    # it doesn't matter if I add this configuration or not, I still get the error
    .config("spark.jars.packages", "org.jpmml:pmml-sparkml:2.2.0")
    .getOrCreate()
)
javaPmmlBuilderClass = spark.sparkContext._jvm.org.jpmml.sparkml.PMMLBuilder
```
Any idea what I might be missing from my environment to make it work? Thank you
Hello, has this problem been solved?