jupyterhub / jupyterhub-on-hadoop

Documentation and resources for deploying JupyterHub on Hadoop
https://jupyterhub-on-hadoop.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Manual Installation in air gap environments #7

Status: open. Opened by hussainsultan 5 years ago.

hussainsultan commented 5 years ago

The documentation assumes that the cluster can access the public internet, which may not be the case in practice. I am not sure whether air-gapped installation is in scope here, but I thought I'd flag it.

jcrist commented 5 years ago

How do people normally handle this? Searching Cloudera's documentation, I also couldn't find anything about air-gapped installs.

hussainsultan commented 5 years ago

Cloudera Data Science Workbench (CDSW) documents its air-gapped installation here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html

For CSD-based installs in an air-gapped environment, put the Cloudera Data Science Workbench parcel into a new hosted or local parcel repository, and then configure the Cloudera Manager Server to target this newly created repository.
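For the local-repository route, here is a minimal sketch of one preparation step. It assumes the usual Cloudera Manager convention that each `.parcel` file in the local parcel repository (typically `/opt/cloudera/parcel-repo`) sits next to a `.sha` file containing the parcel's SHA-1 hex digest. The helper name `write_parcel_sha` and the parcel name in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def write_parcel_sha(parcel_path: Path) -> Path:
    """Write '<parcel>.sha' next to the parcel, holding its SHA-1 hex digest.

    Assumption: the local parcel repository expects this sidecar file,
    containing only the hex digest, beside every '.parcel' file.
    """
    sha1 = hashlib.sha1()
    with parcel_path.open("rb") as f:
        # Hash in 1 MiB chunks so large parcels never need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha1.update(chunk)
    sha_path = parcel_path.with_name(parcel_path.name + ".sha")
    sha_path.write_text(sha1.hexdigest())
    return sha_path
```

For example, `write_parcel_sha(Path("/opt/cloudera/parcel-repo/MYENV-1.0-el7.parcel"))` would create `MYENV-1.0-el7.parcel.sha` beside it; Cloudera Manager still has to be configured to use that repository, as the quoted instructions describe.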

Could this be done by targeting a local conda repository with the required packages?
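A local conda repository can be an ordinary directory used as a `file://` channel. The sketch below assumes that directory has already been indexed (for example with `conda index`) and that `conda` is on the target node's PATH; `offline_create_cmd` is a hypothetical helper that only builds the command and does not run it.

```python
from pathlib import Path
from typing import List, Sequence


def offline_create_cmd(channel_dir: Path, env_name: str,
                       packages: Sequence[str]) -> List[str]:
    """Build a `conda create` command that only consults a local channel.

    Assumes `channel_dir` is an already-indexed conda channel (platform
    subdirectories such as linux-64/ and noarch/, each with repodata.json).
    """
    channel_url = channel_dir.resolve().as_uri()  # e.g. file:///srv/conda-channel
    return [
        "conda", "create", "--yes",
        "--offline",            # do not try to reach the network
        "--override-channels",  # ignore channels configured in .condarc
        "--channel", channel_url,
        "--name", env_name,
        *packages,
    ]
```

The result can be passed to `subprocess.run(cmd, check=True)` on the node. `--override-channels` is the important part: without it, conda would also try the channels configured in `.condarc`, which are unreachable in an air gap.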

jcrist commented 5 years ago

It could. Or we could build RPMs (#8), or use conda-pack to package the environment for transport. There are lots of things that could work; I'm just not sure what's best.
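With the conda-pack option, packaging happens on a machine that does have internet access. A sketch of the command follows; `conda_pack_cmd` and the environment name `jhub` are hypothetical, while `-n`, `-p`, and `-o` are conda-pack's environment-name, prefix, and output options.

```python
from pathlib import Path
from typing import List, Optional


def conda_pack_cmd(output: Path, env_name: Optional[str] = None,
                   env_prefix: Optional[Path] = None) -> List[str]:
    """Build a `conda pack` command that archives a single environment.

    Exactly one of `env_name` (-n) or `env_prefix` (-p) selects the
    environment; conda-pack infers the archive format from the output
    suffix (e.g. .tar.gz). The command is returned, not run.
    """
    if (env_name is None) == (env_prefix is None):
        raise ValueError("pass exactly one of env_name or env_prefix")
    if env_name is not None:
        selector = ["-n", env_name]
    else:
        selector = ["-p", str(env_prefix)]
    return ["conda", "pack", *selector, "-o", str(output)]
```

conda-pack also exposes this through Python, e.g. `conda_pack.pack(name="jhub", output="jhub.tar.gz")`, if calling it in-process is preferred over a subprocess.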

sodre commented 5 years ago

@hussainsultan would creating a parcel help solve the software distribution problem?

If that is the case, the easiest way I can find to create one is by using conda-pack. Let me hack something together quickly and post it back here.
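A rough, standard-library-only sketch of what wrapping an unpacked environment as a parcel could look like. It assumes a parcel is a gzipped tarball named `NAME-VERSION-DISTRO.parcel` whose single top-level `NAME-VERSION/` directory contains the payload plus `meta/parcel.json`. Only a minimal `parcel.json` is written; Cloudera Manager may require more fields, and the environment's hard-coded prefixes would also have to match the parcel's install location, which this sketch does not handle. `build_parcel`, the `el7` default, and `JUPYTERHUB` are illustrative.

```python
import io
import json
import tarfile
from pathlib import Path


def build_parcel(env_dir: Path, out_dir: Path, name: str, version: str,
                 distro: str = "el7") -> Path:
    """Wrap an unpacked environment directory in a parcel-shaped archive.

    Assumed layout: NAME-VERSION-DISTRO.parcel (gzipped tar) containing
    NAME-VERSION/<environment files> and NAME-VERSION/meta/parcel.json.
    """
    root = f"{name}-{version}"
    parcel_path = out_dir / f"{root}-{distro}.parcel"
    # Minimal metadata only; the full schema is not reproduced here.
    meta = {"schema_version": 1, "name": name, "version": version}
    meta_bytes = json.dumps(meta, indent=2).encode()
    with tarfile.open(parcel_path, "w:gz") as tar:
        # Environment files go under NAME-VERSION/.
        tar.add(env_dir, arcname=root)
        # Write meta/parcel.json straight from memory.
        info = tarfile.TarInfo(f"{root}/meta/parcel.json")
        info.size = len(meta_bytes)
        tar.addfile(info, io.BytesIO(meta_bytes))
    return parcel_path
```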

hussainsultan commented 5 years ago

@sodre creating a parcel will solve this issue for Cloudera-managed Hadoop clusters, but as @jcrist mentioned, I am not sure that's the most general answer. Perhaps the best answer is just to document one way of doing an offline install, e.g. using conda-pack to create a tarball and pushing it to the edge node.

Apologies for the delay.
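On the edge node, the conda-pack route ends the way conda-pack documents it: extract the tarball into a directory, activate it with the bundled `bin/activate`, and run `conda-unpack` to rewrite the hard-coded prefixes for the new location. A sketch for a Linux edge node with `bash` available; `unpack_env` is a hypothetical helper.

```python
import subprocess
import tarfile
from pathlib import Path


def unpack_env(archive: Path, dest: Path, run_conda_unpack: bool = True) -> Path:
    """Extract a conda-pack tarball into `dest` and fix its prefixes.

    Returns the path of the environment's `bin/conda-unpack` script.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would escape `dest`.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    unpack_script = dest / "bin" / "conda-unpack"
    if run_conda_unpack:
        # Activate the unpacked env, then let conda-unpack rewrite prefixes.
        subprocess.run(
            ["bash", "-c", 'source "$1/bin/activate" && conda-unpack',
             "unpack_env", str(dest)],
            check=True,
        )
    return unpack_script
```

For example, `unpack_env(Path("jhub.tar.gz"), Path("/opt/envs/jhub"))` after copying the archive across the air gap.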