jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Availability modes #1095

Closed kevin-bates closed 2 years ago

kevin-bates commented 2 years ago

#1086 reports that the loading of persisted kernel sessions at EG's startup was commented out when the changes for #737 were merged. PR #737 essentially made it possible to run multiple EG instances simultaneously, emulating active-active availability. The previous code, by contrast, emulated more of an active-passive behavior: only a single EG instance runs, but it reloads persisted kernel sessions at startup, which provides a higher degree of resiliency, as pointed out in #1086. Some users have found that functionality helpful, and we should try to accommodate that use case as well.

This pull request introduces a configurable option named availability_mode that can hold one of three values: None (default), active-active, and active-passive. Both non-None values require that kernel session persistence also be enabled. Since active-active was essentially the default behavior when kernel session persistence was enabled, availability_mode is automatically set to active-active whenever kernel session persistence is enabled and availability_mode is not set, thereby providing a form of backward compatibility.
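The defaulting rule described above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in this PR: the function name resolve_availability_mode and the constant VALID_MODES are hypothetical, and raising an error when persistence is disabled is an assumption of the sketch (the PR may handle that case differently).

```python
from typing import Optional

# Hypothetical constant for this sketch: the two non-None modes named above.
VALID_MODES = ("active-active", "active-passive")


def resolve_availability_mode(
    availability_mode: Optional[str], persistence_enabled: bool
) -> Optional[str]:
    """Return the effective availability mode (illustrative only)."""
    if availability_mode is None:
        # Backward compatibility: persistence without an explicit mode
        # behaves as active-active; otherwise no mode is in effect.
        return "active-active" if persistence_enabled else None

    mode = availability_mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown availability mode: {availability_mode!r}")

    # Both non-None modes require kernel session persistence.
    # Raising here is an assumption of this sketch.
    if not persistence_enabled:
        raise ValueError(
            f"availability_mode={mode!r} requires kernel session persistence"
        )
    return mode
```

For example, resolve_availability_mode(None, True) yields active-active, while resolve_availability_mode("active-passive", True) keeps the explicitly requested mode.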

Users who want a single-instance EG that is capable of restarting following an unexpected failure can now use the active-passive availability mode.

These modes (including kernel session persistence) can be enabled via a configuration file, command line, or environment variables as noted in the documentation or when running jupyter enterprisegateway --help-all.

As noted in the companion documentation, this functionality should be considered experimental!

Resolves: #1086

kevin-bates commented 2 years ago

I still need to apply the final name changes with "Standalone" and "Replication" so let's not merge yet.

kevin-bates commented 2 years ago

Need to rework the docs now that #1101 has been merged.