This PR would configure Flink to run in session mode. Essentially, it would create a single job manager for the cluster, and all pangeo-forge-recipes would submit their jobs to that job manager.
One of the main advantages of this would be to centralize all infrastructure configuration in pangeo-forge-cloud-federation.
Currently, infrastructure configuration is spread across pangeo-forge-cloud-federation, pangeo-forge-runner, and the individual recipe's config.py, which makes the cluster difficult to configure. Ideally, we could have multiple node pools of on-demand and spot instances, high-availability job managers, reactive scaling, default failure strategies, etc., and set all of that within pangeo-forge-cloud-federation. The recipe and pangeo-forge-runner would then require only minimal configuration, such as setting parallelism and the job name.
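To make the intended split concrete, here is a minimal sketch of how cluster-wide defaults and recipe-level settings might be combined. It is an illustration only: `CLUSTER_DEFAULTS`, `RECIPE_OVERRIDABLE`, `build_job_config`, and every key name are hypothetical and do not come from pangeo-forge-runner, pangeo-forge-cloud-federation, or Flink.

```python
# Hypothetical sketch: none of these names or keys are part of
# pangeo-forge-runner, pangeo-forge-cloud-federation, or Flink.

# Cluster-wide settings that would be owned by pangeo-forge-cloud-federation.
CLUSTER_DEFAULTS = {
    "execution_mode": "session",
    "high_availability": True,
    "node_pools": ["on-demand", "spot"],
    "parallelism": 1,
    "job_name": None,
}

# The only settings a recipe or pangeo-forge-runner would be allowed to set.
RECIPE_OVERRIDABLE = {"parallelism", "job_name"}


def build_job_config(recipe_config: dict) -> dict:
    """Merge recipe-level settings over the cluster defaults.

    Raises ValueError if the recipe tries to set an infrastructure option.
    """
    rejected = set(recipe_config) - RECIPE_OVERRIDABLE
    if rejected:
        raise ValueError(
            f"infrastructure settings must be set cluster-wide: {sorted(rejected)}"
        )
    # Recipe values win only for the keys in RECIPE_OVERRIDABLE.
    return {**CLUSTER_DEFAULTS, **recipe_config}


job = build_job_config({"parallelism": 4, "job_name": "example-recipe"})
```

Under this assumed layout, a recipe that tried to set something like a node pool would fail fast instead of silently diverging from the cluster configuration.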