nchammas / flintrock

A command-line tool for launching Apache Spark clusters.

Apache Spark - Cannot run Python applications in standalone clusters #280

Closed ijasnahamed closed 5 years ago

ijasnahamed commented 5 years ago

I have a standalone Spark cluster with one worker in AWS EC2. I copied my application's Python script to the /home/ec2-user directory on the master and the EC2 worker using the copy-file command, and I verified manually that the file exists on both instances. I submit a job for this application using the command below:

spark-submit --master spark://ec2-aaa-bbb-ccc-ddd.compute-1.amazonaws.com:7077 --deploy-mode cluster /home/ec2-user/test.py

When I submit the Python job, I get the error below:

Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
        at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:857)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:284)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I tried with a Java application; it works fine, and I can see the job in the completed-applications list.

Does anyone have any idea what's wrong with my configuration?

nchammas commented 5 years ago

Nothing's wrong with your configuration. As the error indicates, this is simply a limitation of Apache Spark:

Cluster deploy mode is currently not supported for python applications on standalone clusters.

This error is coming from Apache Spark and not from Flintrock.

From the Spark docs:

Currently, the standalone mode does not support cluster mode for Python applications.
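As a possible workaround (this is a Spark limitation, not something Flintrock controls): client deploy mode, which is the default for standalone clusters, does support Python applications, so submitting without --deploy-mode cluster should work. For example, reusing your placeholder master URL:

spark-submit --master spark://ec2-aaa-bbb-ccc-ddd.compute-1.amazonaws.com:7077 --deploy-mode client /home/ec2-user/test.py

Note that in client mode the driver runs on the machine where you invoke spark-submit, so you would run this from a host that can reach the cluster, such as the master itself.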