kestra-io / plugin-spark

Given SparkCLI sample code does not work #45

Open shrutimantri opened 10 months ago

Expected Behavior

The Spark example should run successfully. The complete code is taken from the SparkCLI plugin documentation: https://kestra.io/plugins/plugin-spark/tasks/io.kestra.plugin.spark.SparkCLI

Actual Behaviour

The code fails with the following stack trace:

2024-01-16 09:45:13,490 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/tmp/4jJCxhkCxVNo2i57NTIw9i/pi.py", line 21, in <module>
2024-01-16 09:45:13,497 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt     count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
2024-01-16 09:45:13,501 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-01-16 09:45:13,505 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/context.py", line 803, in parallelize
2024-01-16 09:45:13,508 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/context.py", line 821, in parallelize
2024-01-16 09:45:13,511 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/context.py", line 867, in _serialize_to_jvm
2024-01-16 09:45:13,514 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/context.py", line 815, in reader_func
2024-01-16 09:45:13,517 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
2024-01-16 09:45:13,521 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
2024-01-16 09:45:13,525 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/opt/bitnami/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
2024-01-16 09:45:13,528 INFO  docker-java-stream--1581398104 f.s.7.1ZgMUn3lXWwQvgI2UuQMvt py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
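
Every line of the trace above carries a log prefix (timestamp, level, thread, logger), which makes the Python traceback hard to read. A minimal sketch for stripping that prefix follows. The regular expression and the helper name `strip_log_prefix` are assumptions based only on the lines shown in this issue, not a documented Kestra log format.

```python
import re

# Assumed layout, inferred from the lines in this issue:
#   "<date> <time>,<ms> <LEVEL> <thread> <logger> <message>"
LOG_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+\w+\s+\S+\s+\S+ (.*)$"
)


def strip_log_prefix(line: str) -> str:
    """Return the message part of a log line, or the line unchanged if it does not match."""
    match = LOG_PREFIX.match(line)
    return match.group(1) if match else line


# Two lines copied from the trace above.
sample = [
    '2024-01-16 09:45:13,490 INFO  docker-java-stream--1581398104 '
    'f.s.7.1ZgMUn3lXWwQvgI2UuQMvt   File "/tmp/4jJCxhkCxVNo2i57NTIw9i/pi.py", line 21, in <module>',
    '2024-01-16 09:45:13,528 INFO  docker-java-stream--1581398104 '
    'f.s.7.1ZgMUn3lXWwQvgI2UuQMvt py4j.protocol.Py4JJavaError: An error occurred while calling '
    'z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.',
]

for line in sample:
    # The traceback indentation after the logger name is preserved.
    print(strip_log_prefix(line))
```

With the prefix removed, the frames read as an ordinary Python traceback: the call at line 21 of pi.py enters `parallelize`, and the error is raised from the JVM call `PythonRDD.readRDDFromFile`.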

Steps To Reproduce

Run the code exactly as provided in the documentation. The Docker image was set explicitly to bitnami/spark:3.4.1.
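
The trace shows the failure inside `parallelize`, before `map(f)` or `reduce(add)` run, so the sampling logic itself is probably not at fault. As a rough check, the same map and reduce can be run without Spark. The sketch below assumes the script follows the usual Monte Carlo Pi example. Only line 21 of pi.py appears in this issue, so the body of `f` and the helper name `estimate_pi_local` are hypothetical.

```python
import random
from functools import reduce
from operator import add


def f(_: int) -> int:
    """Sample one point in the square [-1, 1] x [-1, 1]; return 1 if it lies in the unit circle.

    Assumed body: the issue only shows that a function named f is mapped over the range.
    """
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def estimate_pi_local(partitions: int = 2, samples_per_partition: int = 50_000) -> float:
    """Estimate pi with built-in map/reduce instead of a Spark RDD (hypothetical helper)."""
    n = samples_per_partition * partitions
    # Spark version, as it appears in the trace:
    #   spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    # Local stand-in: same function and same reduction, with no JVM involved.
    count = reduce(add, map(f, range(1, n + 1)))
    return 4.0 * count / n


print("Pi is roughly %f" % estimate_pi_local())
```

If this runs cleanly, the remaining suspects are on the Spark side (image, master connection, or JVM-level I/O). The trace in this issue is cut off before the Java cause, so it does not say which.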

Environment Information

Example flow

No response