Open: filippo82 opened this issue 4 years ago
Does this assume that a JupyterHub (running inside a container?) is already running on AWS?
It very nearly does assume JupyterHub is already running on AWS. The spawner, which runs inside JupyterHub, extracts the private IP address of the Fargate task and attempts to communicate with it. However...
Potentially you could have some sort of VPN set up between wherever JupyterHub is running and the subnet(s) in AWS, so technically JupyterHub doesn't have to be inside AWS... but typically it would be. (I don't know too much about this approach, to be honest.)
JupyterHub itself doesn't need to be in a container; it just needs to be able to connect to the Fargate tasks (and vice versa). For example, it could run directly on an EC2 instance.
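(For anyone curious about the mechanics: the lookup described above is essentially an ECS DescribeTasks call. The spawner talks to the ECS API in its own way, so this is not its code, just a minimal boto3 sketch of the same idea, assuming you have a cluster name and task ARN to hand, and with the region as an assumption:)

import boto3

ecs = boto3.client('ecs', region_name='eu-west-2')  # assumption: your AWS region

def private_ip_of_task(cluster, task_arn):
    # For awsvpc/Fargate tasks, DescribeTasks returns an ElasticNetworkInterface
    # attachment whose details include the task's private IPv4 address.
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])['tasks'][0]
    for attachment in task.get('attachments', []):
        if attachment['type'] == 'ElasticNetworkInterface':
            for detail in attachment['details']:
                if detail['name'] == 'privateIPv4Address':
                    return detail['value']
    return None

JupyterHub then needs to be able to reach that address on the notebook port, which is why the networking arrangement above matters.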
how/where do I set up the JupyterHub in the first place?
This may not be the most helpful answer, but I think I'm going to point you to Google / the JupyterHub docs, or something like https://github.com/georgebearden/aws-fargate-jupyterhub-demo or https://tljh.jupyter.org/en/latest/install/amazon.html#installing-on-amazon-web-services; otherwise I would just be reproducing what is available elsewhere. It's probably a bit of a project whichever way you slice it...
If you get stuck on some step in terms of integration between JupyterHub and this spawner, feel free to post here / raise an issue (but realistically, I can't guarantee a reply in a timely manner).
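(On the integration side, the wiring looks roughly like the following sketch of a jupyterhub_config.py, based on the pattern in the README; the option and class names should be checked against the README for the version you install, and the region value is an assumption:)

# jupyterhub_config.py (sketch)
from fargatespawner import FargateSpawner, FargateSpawnerECSRoleAuthentication

c.JupyterHub.spawner_class = FargateSpawner
c.FargateSpawner.authentication_class = FargateSpawnerECSRoleAuthentication
c.FargateSpawner.aws_region = 'eu-west-2'   # assumption: your AWS region
c.FargateSpawner.notebook_port = 8888
c.FargateSpawner.notebook_scheme = 'http'

# Return a RunTask dict like the one shown later in this thread.
c.FargateSpawner.get_run_task_args = lambda spawner: {}

Everything else (the hub's own networking, proxy, authenticator) is standard JupyterHub configuration.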
@filippo82 We were able to put together a working implementation - please let me know if we can help.
Hi @adpatter. I would like to know how you managed to get a working implementation going. I'm having trouble passing spawner envars, and I'm using JupyterHub version 1.4.2. This is what I'm seeing when trying to call get_env() from the spawner.
>>> from jupyterhub.spawner import Spawner
>>> spawner = Spawner()
>>> spawner.get_env()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/jupyterhub/spawner.py", line 777, in get_env
env['JUPYTERHUB_HOST'] = self.hub.public_host
AttributeError: 'NoneType' object has no attribute 'public_host'
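(A note on that traceback: constructing Spawner() by hand leaves spawner.hub as None, so get_env() fails before it gets anywhere; the hub populates that attribute when it creates a spawner for a user. If the aim is to see what environment a real spawner would pass, one hedged option is to log it from inside the get_run_task_args hook, where the spawner argument is fully initialised; the function name here is illustrative:)

def run_task_args_with_logging(spawner):
    # Log only the variable names; the values include secrets such as
    # JUPYTERHUB_API_TOKEN, so avoid logging them.
    spawner.log.info('Env passed to the notebook task: %s', sorted(spawner.get_env()))
    return {
        # ... a RunTask dict like the one shown below ...
    }

c.FargateSpawner.get_run_task_args = run_task_args_with_logging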
@Abhi94N The README provides guidance on how to pass environment variables to the spawned environment (https://github.com/uktrade/fargatespawner). I may be missing what you are trying to accomplish. Please let me know if you would like to discuss over a screenshare.
Please see where environment variables are set in the containerOverrides.
c.FargateSpawner.get_run_task_args = lambda spawner: {
    'cluster': 'jupyterhub-notebooks',
    'taskDefinition': 'jupyterhub-notebook:7',
    'overrides': {
        'taskRoleArn': 'arn:aws:iam::123456789012:role/notebook-task',
        'containerOverrides': [{
            'command': spawner.cmd + [f'--port={spawner.notebook_port}', '--config=notebook_config.py'],
            # Everything returned by spawner.get_env() is passed to the notebook container
            'environment': [
                {
                    'name': name,
                    'value': value,
                } for name, value in spawner.get_env().items()
            ],
            'name': 'jupyterhub-notebook',
        }],
    },
    'count': 1,
    'launchType': 'FARGATE',
    'networkConfiguration': {
        'awsvpcConfiguration': {
            'assignPublicIp': 'DISABLED',
            'securityGroups': ['sg-00026fc201a4e374b'],
            'subnets': ['subnet-01fc5f15ac710c012'],
        },
    },
}
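(For anyone trying to get their own variables through this mechanism: spawner.get_env() already includes JupyterHub's own variables plus anything set via the standard Spawner.environment trait, so extra variables added on the hub side should flow through the containerOverrides above. A minimal sketch, with an illustrative variable name and value:)

# In jupyterhub_config.py: merged into spawner.get_env(), and therefore
# into the 'environment' list in containerOverrides above.
c.Spawner.environment = {
    'MY_CUSTOM_VAR': 'some-value',  # illustrative name and value
}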
@adpatter Sure, we can screenshare. So I've attempted a few things: I have tried to deploy the JupyterHub stack completely onto Fargate nodes with EKS, as well as to deploy only the single-user notebook to Fargate while the rest of the JupyterHub stack runs in a managed nodegroup.
I actually do have my envars set using the README config, but I'm not able to pass envars from the hub to the spawner.
I remoted into the hub and tested the get_env() command, but I was not able to get it to work from the hub itself.
@Abhi94N I'm not clear on what you mean. Please email me and we can arrange a time to discuss: adpatter@umich.edu.
Dear @michalc,
sorry for the naive question, but how/where do I set up the JupyterHub in the first place? Does this assume that a JupyterHub (running inside a container?) is already running on AWS?
Thanks!
Best wishes, -Filippo