kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

How to pass string with spaces pyspark arguments #1186

Open devender-yadav opened 3 years ago

devender-yadav commented 3 years ago

One of the PySpark arguments is a SQL query (a string with spaces). I tried passing it as \"select * from table\" and as "select * from table".

But it is not treated as a single string: bash expands the * in select * as a filename glob, which corrupts the SQL.

Example: the query above was converted to '\"select' folder1 file1.zip from 'table\"'

Driver Logs:

PYSPARK_ARGS=
+ '[' -n 'process  --query \"select * from table\"' ']'
+ PYSPARK_ARGS='process --query \"select * from table\"'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.7.3'
+ export PYTHON_VERSION=3.7.3
+ PYTHON_VERSION=3.7.3
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=xx.xx.xx.xx --deploy-mode client --class org.apache.spark.deploy.PythonRunner file:/usr/local/bin/process_sql.py process --query '\"select' folder1 file1.zip from 'table\"'
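
The trace shows $PYSPARK_ARGS appended unquoted to the CMD array, so bash splits its value on whitespace and then glob-expands each resulting word; the escaped quotes inside the value are just literal characters at that point. The following is a rough Python simulation of that behavior, not the entrypoint's actual code. The function name simulate_unquoted_expansion and the temporary directory contents are made up for illustration, and the sorted order used here may differ from the locale-dependent order bash produces.

```python
import fnmatch
import os
import tempfile

GLOB_CHARS = set("*?[")


def simulate_unquoted_expansion(value, directory):
    """Approximate what bash does with an unquoted $VAR: split on
    whitespace, then replace any word containing glob characters with
    the matching (non-hidden) entries of `directory`, if there are any."""
    entries = sorted(e for e in os.listdir(directory) if not e.startswith("."))
    words = []
    for word in value.split():
        matches = fnmatch.filter(entries, word) if GLOB_CHARS & set(word) else []
        # A pattern with no matches is left unchanged, as bash does by default.
        words.extend(matches or [word])
    return words


def demo():
    # Hypothetical working directory containing the two entries seen in the log.
    with tempfile.TemporaryDirectory() as workdir:
        os.mkdir(os.path.join(workdir, "folder1"))
        open(os.path.join(workdir, "file1.zip"), "w").close()
        pyspark_args = 'process --query \\"select * from table\\"'
        return simulate_unquoted_expansion(pyspark_args, workdir)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that no amount of quoting inside the argument value survives this step, because the quotes are never re-parsed by the shell.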

Is there a way to safely pass a string argument that contains spaces, single quotes, or double quotes?
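
One possible workaround, offered as an assumption rather than a feature of the operator, is to make the argument immune to word splitting and globbing by base64-encoding it before submission and decoding it inside the PySpark script. The standard base64 alphabet contains no whitespace, quotes, or glob characters. In this sketch the flag name --query-b64 and the helpers encode_query and parse_args are hypothetical.

```python
import argparse
import base64


def encode_query(query):
    """Encode a SQL string so the argument contains no whitespace,
    quotes, or glob characters. Run this on the submitting side."""
    return base64.b64encode(query.encode("utf-8")).decode("ascii")


def parse_args(argv):
    """Parse the driver script's arguments and decode the query.
    --query-b64 is a hypothetical flag name for this sketch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("command")
    parser.add_argument("--query-b64", required=True)
    args = parser.parse_args(argv)
    args.query = base64.b64decode(args.query_b64).decode("utf-8")
    return args


if __name__ == "__main__":
    encoded = encode_query('select * from table where name = "a b"')
    args = parse_args(["process", "--query-b64", encoded])
    print(args.query)
```

The encoded value would go into the application's argument list in place of the raw SQL, and the script would read args.query as before. The trade-off is that the query is no longer readable in the manifest or the logs.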

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.