nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Make Python 3 the default on launched clusters #320

Closed · nchammas closed this 3 years ago

nchammas commented 3 years ago

Amazon Linux 2 AMIs (e.g. ami-0beafb294c86717a8) still launch with Python 2 as the default and only Python. Python 2 is EOL, and so is Spark's support for Python 2.

We should start installing Python 3 on launched clusters and setting it as the default Python for PySpark. The pattern would roughly follow what we do to ensure that the cluster has a recent enough version of Java installed (e.g. #316).
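For illustration, here is a minimal sketch of what that per-node setup could look like, assuming plain `ssh` access and a Spark install under `/home/ec2-user/spark`. The helper function and paths are assumptions for this sketch, not Flintrock's actual code:

```python
import subprocess

# A minimal sketch, not Flintrock's implementation: run the setup over plain ssh.
# The Spark install path below is an assumption for illustration.
PYTHON_SETUP_COMMANDS = [
    # Amazon Linux 2 ships only Python 2; install Python 3 from the default yum repos.
    "sudo yum install -y python3",
    # Tell PySpark to use Python 3 on both the driver and the executors.
    "echo 'export PYSPARK_PYTHON=python3' | sudo tee -a /home/ec2-user/spark/conf/spark-env.sh",
]

def ensure_python3(host: str, ssh_user: str = "ec2-user") -> None:
    """Install Python 3 on one node and make it the default Python for PySpark."""
    for command in PYTHON_SETUP_COMMANDS:
        subprocess.run(["ssh", f"{ssh_user}@{host}", command], check=True)
```

The idea mirrors the Java check: run a small set of idempotent shell commands on every node at launch time so the cluster comes up with the right interpreter already configured.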

nchammas commented 3 years ago

#334 ensures Python 3 is available on the cluster, but this issue is about making sure it's the default Python for Spark.

In Spark 3.1+ this isn't an issue, since Spark specifically looks for python3 (at least according to what I wrote on #334), so I think over time this problem basically solves itself.
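For anyone who wants to confirm which interpreter a launched cluster actually ends up with, a quick check from the `pyspark` shell on the master node could look like the snippet below. It uses only standard PySpark and stdlib calls; nothing here is Flintrock-specific:

```python
import platform

# Run inside the pyspark shell, where `sc` is already defined.
# Driver-side Python version:
print("driver:", platform.python_version())

# Executor-side Python versions, collected via a trivial job:
versions = (
    sc.parallelize(range(sc.defaultParallelism))
      .map(lambda _: platform.python_version())
      .distinct()
      .collect()
)
print("executors:", versions)
```

If both lines report a 3.x version, PySpark is picking up Python 3 as intended.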