mozilla / telemetry-analysis-service

Telemetry Analysis Service
Mozilla Public License 2.0

Add an option to use Python 3.x on clusters and scheduled jobs #597

Open robhudson opened 7 years ago

robhudson commented 7 years ago

Currently the default Python on the clusters is Python 2.7. I believe a long-term goal should be to move to Python 3.x, but we may have to wait for Amazon. I'm opening this issue for discussion and to have something to track the work against.

The latest Amazon EMR base Linux AMI (2017.03) installs Python 2.7 and 3.4. The Amazon EMR docs state:

Python Defaults

Python 3.4 is now installed by default, but Python 2.7 remains the system default. You may configure Python 3.4 as the system default using either a bootstrap action; you can use the configuration API to set PYSPARK_PYTHON export to /usr/bin/python3.4 in the spark-env classification to affect the Python version used by PySpark.
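To make the second option in that quote concrete, here is a minimal sketch of what the `spark-env` configuration could look like when built in Python. The function name `spark_env_python_config` is hypothetical and not part of this service. The nested `export` classification follows the shape EMR uses for environment exports, and the interpreter path is the one named in the quote above.

```python
import json


def spark_env_python_config(python_path="/usr/bin/python3.4"):
    """Build an EMR configuration list that exports PYSPARK_PYTHON.

    Hypothetical helper for illustration only. The default path is the
    one quoted from the EMR docs; adjust it for other AMI versions.
    """
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    # Entries under "export" become environment variable
                    # exports in spark-env.sh on the cluster nodes.
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Print the JSON that would be supplied as the cluster configuration.
    print(json.dumps(spark_env_python_config(), indent=2))
```

This only affects the interpreter PySpark uses. Python 2.7 stays the system default unless a bootstrap action changes it.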

fbertsch commented 7 years ago

Should this instead be "Add an option to use Python 3.x"? Migrating all jobs seems like a big ask. Then we can have a gradual transition to 3.x, eventually making it the default for new clusters, and finally sunsetting any Python 2.7 use.

acmiyaguchi commented 7 years ago

Amazon might make it to Python 3.x before we do; moztelemetry and mozetl are still using Python 2.7. There are no real incentives to move off it, since migrations are a pretty rough process.

Adding Python 3.x as a notebook kernel choice in Jupyter and Zeppelin would be a nice addition.
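For the Jupyter side, a rough sketch of what a Python 3 kernel entry might look like. It assumes `ipykernel` is installed for the Python 3 interpreter on the cluster. The helper name, interpreter path, and display name are placeholders rather than anything this service defines, and Zeppelin is not covered here.

```python
import json


def python3_kernel_spec(python_path="/usr/bin/python3.4",
                        display_name="Python 3"):
    """Return the contents of a Jupyter kernel.json for a Python 3 kernel.

    Hypothetical helper: assumes ipykernel is importable by `python_path`.
    """
    return {
        # Jupyter replaces {connection_file} when it launches the kernel.
        "argv": [python_path, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    # The output would be saved as kernel.json in a kernelspec directory.
    print(json.dumps(python3_kernel_spec(), indent=2))
```

With a spec like this installed next to the existing Python 2 one, both would show up as choices in the notebook's kernel menu.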

wcbeard commented 7 years ago

Just noting here that, according to this announcement, Python 2 isn't supported by the current release of the Python kernel, IPython 6.