The existing Airflow code under src/airflow needs to be refactored to separate out the common functions (create cluster, submit PySpark job, delete cluster) and the shared configuration (version, project, paths). This will prevent code duplication and keep the existing Airflow code from diverging from the new code I'm writing for the Preprocess pipeline.
Feature ticket under https://github.com/opentargets/issues/issues/3028.
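
As a rough sketch of the split described above, the shared configuration could live in one small object, with one helper per common step that only builds the arguments a DAG would pass on. Everything here is hypothetical and for illustration only: the names `PipelineConfig`, `cluster_spec`, `pyspark_job_spec` and `delete_cluster_spec`, the field names, and the placeholder values are not taken from the existing code. The job dictionary follows the general shape of a Dataproc PySpark job, on the assumption that the clusters are Dataproc clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Shared settings (version, project, paths). All fields are hypothetical."""

    version: str
    project_id: str
    region: str
    bucket: str

    @property
    def code_path(self) -> str:
        # Base path under which the versioned PySpark scripts are assumed to live.
        return f"gs://{self.bucket}/{self.version}"


def cluster_spec(config: PipelineConfig, cluster_name: str, num_workers: int = 2) -> dict:
    """Arguments for the 'create cluster' step."""
    return {
        "project_id": config.project_id,
        "region": config.region,
        "cluster_name": cluster_name,
        "cluster_config": {"worker_config": {"num_instances": num_workers}},
    }


def pyspark_job_spec(
    config: PipelineConfig, cluster_name: str, script: str, args: tuple = ()
) -> dict:
    """Job definition for the 'submit PySpark job' step."""
    return {
        "reference": {"project_id": config.project_id},
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            "main_python_file_uri": f"{config.code_path}/{script}",
            "args": list(args),
        },
    }


def delete_cluster_spec(config: PipelineConfig, cluster_name: str) -> dict:
    """Arguments for the 'delete cluster' step."""
    return {
        "project_id": config.project_id,
        "region": config.region,
        "cluster_name": cluster_name,
    }


# Placeholder values only; the real version, project and paths come from the repo.
config = PipelineConfig(
    version="0.0.0",
    project_id="example-project",
    region="example-region",
    bucket="example-bucket",
)
job = pyspark_job_spec(config, "preprocess-cluster", "preprocess.py", ("--step", "one"))
```

With this layout, both the existing DAGs and the new Preprocess DAG would import the same config and helpers, so a change to the version or paths happens in one place.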