Closed: SolarisYan closed this issue 5 years ago.
@SolarisYan - thank you for the issue.
Please note that support for Kubernetes is still under development, and kernel-pod.yaml customization is one of the areas we feel should be addressed and is required for extensions like this.
Although I have not run with a GPU-configured kernel, a colleague has. This required modification of the kernel-pod.yaml file. Since we use the same kernel-pod.yaml to launch all k8s kernels, this can be a bit unwieldy at the moment. That said, here is the modified kernel-pod.yaml used for that:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: $kernel_username-$kernel_id
  namespace: $namespace
  labels:
    kernel_id: $kernel_id
    app: enterprise-gateway
    component: kernel
  annotations:
    scheduler.alpha.kubernetes.io/nvidiaGPU: '{ "AllocationPriority": "Dense" }'
    scheduler.alpha.kubernetes.io/tolerations: '[ { "key": "dedicated", "operator": "Equal", "value": "gpu-task" } ]'
spec:
  restartPolicy: Never
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: gpu-task
  nodeSelector:
    ibm-cloud.kubernetes.io/gpu-type: nvidia-TeslaK80
  containers:
    - env:
        - name: EG_RESPONSE_ADDRESS
          value: $response_address
        - name: KERNEL_CONNECTION_FILENAME
          value: $connection_filename
        - name: KERNEL_LANGUAGE
          value: $language
        - name: KERNEL_SPARK_CONTEXT_INIT_MODE
          value: $spark_context_init_mode
        - name: KERNEL_USERNAME
          value: $kernel_username
        - name: KERNEL_ID
          value: $kernel_id
        - name: KERNEL_NAMESPACE
          value: $namespace
      image: $docker_image
      name: $kernel_username-$kernel_id
      command: ["/etc/bootstrap-kernel.sh"]
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: "1"
          cpu: "4"
          memory: 25165824k
        requests:
          alpha.kubernetes.io/nvidia-gpu: "1"
          cpu: "4"
          memory: 25165824k
```
Note the annotations, tolerations, nodeSelector, and resources additions. Also, we've since placed curly braces ({}) around the parameter names.
I hope this helps.
Pull request #426 allows for the support of per-kernel kernel-pod.yaml files, such that any file in the kernelspec's directory will be honored over similarly-named generic files. As a result, you could build a kernelspecs tar file (via make kernelspecs or make dist) which uses, for example, etc/kernelspecs/python_tf_gpu_kerbernetes/scripts/kernel-pod.yaml over the default kernel-pod.yaml located in etc/kubernetes/scripts.
@kevin-bates Thanks a lot. I have used etc/kernelspecs/python_tf_gpu_kerbernetes/scripts/kernel-pod.yaml for this function; it's very cool.
@SolarisYan - I'm glad you're able to move forward. If you run into issues or have suggestions for how we can make things better, please feel free to open issues or submit pull requests. We'd appreciate your contributions in whatever manner.
Please feel free to close this issue once you've reached a reasonable conclusion.
Thanks.
Could there be a way in the future to allocate these at kernel-startup time? It would be useful to be able to specify the number of cores, amount of memory, etcetera, individually for each kernel that is started.
@bashterm - Yes. The kernel-pod.yaml file is essentially templated. Currently, as of #403, any envs prefixed with KERNEL_ will be recognized as substitutable parameters, so, assuming the number of different values is not too unwieldy, it would be conceivable to provide a limited form of customization. This, coupled with the subtle change in #426 that will honor locally defined kernel-pod.yaml files (in the repo) over the generic version, gets us, I believe, a good way down that path.
If you know of other ways this kind of thing has been addressed in k8s applications, please share those ideas. Thanks.
I don't personally know how this kind of thing has been addressed. However, I think I can make use of the environment variables.
In the commit referenced in #403 it says that environment variables are supported in the env of a POST request at kernel creation. Is there documentation on that somewhere? I'm not able to find any info about what can go there, but I may be looking in the wrong project.
Right. Although we touch on this capability in the EG docs, it's not nearly sufficient - we'll need to add more substance around this area. The flow occurs in NB2KG extension and is transferred into the kernel's startup env in the handler inherited from Kernel Gateway.
If, for example, I wanted to flow KERNEL_USERNAME, I would do something like the following...
```bash
NB_PORT=9003
export KG_URL=http://${JKG_HOST}:8888
export KERNEL_USERNAME=alice
jupyter notebook \
  --NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
  --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
  --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager \
  --no-browser \
  --NotebookApp.port=${NB_PORT} \
  --NotebookApp.ip=0.0.0.0 \
  --log-level=DEBUG
```
I hope that helps.
I should also have included the code from our gateway_client, which we use for some of our testing and which demonstrates a means of interacting with a kernel programmatically.
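To give a flavor of that flow, here is a minimal, hedged sketch (not the actual gateway_client code) of how a client could assemble the request body for starting a kernel through the gateway's /api/kernels endpoint, flowing KERNEL_-prefixed variables in the env stanza. The helper name and the kernelspec name are illustrative, not part of any existing API.

```python
import json

def build_kernel_start_payload(kernel_name, **kernel_env):
    """Build the JSON body for a POST to the gateway's /api/kernels endpoint.

    Hypothetical helper for illustration; only KERNEL_-prefixed entries
    are flowed to the kernel as substitutable parameters, so anything
    else is filtered out.
    """
    env = {k: str(v) for k, v in kernel_env.items() if k.startswith("KERNEL_")}
    return {"name": kernel_name, "env": env}

payload = build_kernel_start_payload(
    "python_tf_gpu_kubernetes",   # illustrative kernelspec name
    KERNEL_USERNAME="alice",
    KERNEL_CPUS="4",
)
print(json.dumps(payload, sort_keys=True))
# A real client would then issue something like:
#   requests.post(f"{kg_url}/api/kernels", json=payload)
```

The actual gateway_client additionally opens a websocket to the started kernel to exchange execute requests and replies.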
@kevin-bates thank you so much. That last bit of code in the gateway_client looks to be almost exactly what I need. If I add environment variables into my kernel-pod.yaml files for CPU, RAM, GPU, etc., I can pass those as KERNEL_-prefixed environment variables through from my frontend.
@bashterm - yes, the variables in your modified kernel-pod.yaml file would be the lower-cased, and curly-braced, (look at that, I'm a poet! 😄) form of the environment variable. E.g., to flow KERNEL_CPUS, you'd use ${kernel_cpus} as the "target" in kernel-pod.yaml.
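The substitution described above behaves much like Python's string.Template expansion, which uses the same $name/${name} syntax. A minimal sketch of the idea (not EG's actual code; KERNEL_MEMORY is a hypothetical variable added for illustration):

```python
from string import Template

# A fragment of a parameterized kernel-pod.yaml; ${kernel_cpus} and
# ${kernel_memory} are the lower-cased forms of the KERNEL_CPUS and
# KERNEL_MEMORY environment variables flowed from the client.
pod_fragment = Template(
    "resources:\n"
    "  limits:\n"
    "    cpu: ${kernel_cpus}\n"
    "    memory: ${kernel_memory}\n"
)

# safe_substitute leaves any unmatched ${...} targets in place rather
# than raising, which suits partially parameterized templates.
rendered = pod_fragment.safe_substitute(kernel_cpus="4", kernel_memory="25165824k")
print(rendered)
```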
Sorry for the spam, but I should also mention that any KERNEL_ variables listed in the kernel.json env stanza can also be used as substitution "sources" into the kernel-pod.yaml file. So, again, you could extend kernel.json with something like...

```json
"env": {
  "KERNEL_CPUS": "4"
}
```

and that will fill the target ${kernel_cpus}. So it's not a requirement that these per-kernel values come all the way from the client.
I think that's something I can use to write defaults. Which value gets used first, if both are set? If a value from the client gets used first, then I'll write the defaults into kernel.json.
Hmm - good point. Since the user payload will seed the env first, and then the kernel.json env will be used to update that dict, using the kernel.json env stanza for default values won't work - shoot.
Seems like we'll want a way to address substitutable defaults. I suppose EG could figure out a way to make a copy of the env prior to calling into the framework, then, via an override, update it a last time; but looking at jupyter_client and the various overrides, I don't really see a place for that at the moment.
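The precedence problem described above can be seen with a plain dict update (a sketch of the ordering, not EG's actual code):

```python
# The client payload seeds the env dict first...
env = {"KERNEL_CPUS": "8", "KERNEL_USERNAME": "alice"}   # from the client

# ...then the kernel.json env stanza updates it, so a "default" placed
# in kernel.json actually wins over the client-supplied value.
kernel_json_env = {"KERNEL_CPUS": "4"}                   # intended default
env.update(kernel_json_env)

print(env["KERNEL_CPUS"])  # prints "4": kernel.json clobbered the client's "8"
```

This is why kernel.json is the wrong place for overridable defaults under the current ordering.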
Hi @kevin-bates, can we get the user who connected to the kernel now? I have integrated JupyterHub and this gateway. When a user connects to a kernel, I want to know which user is connected. Can the gateway do this now?
@SolarisYan - it's great to hear you're integrating with jupyterHub!
Enterprise Gateway flows the value of the env KERNEL_USERNAME from the client to the kernel. As a result, this is more of a JupyterHub question: how does Hub expose the authenticated user to the spawned application? Taking a quick look at the docs and code, it appears the authenticated username is exposed to the spawner in two env variables by Hub during the spawn: USER and JUPYTERHUB_USER. This would then become an exercise in setting an application env variable from a "built-in" env variable, i.e., setting the env KERNEL_USERNAME to either of those values. Since the config files can have Python code in them, I wonder if there's a way to "transfer" JUPYTERHUB_USER to KERNEL_USERNAME in the config, or we could have NB2KG perform that transfer if KERNEL_USERNAME is not already set and JUPYTERHUB_USER is.
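One hypothetical way to perform that transfer in a plain-Python config file (a sketch, not an existing NB2KG/EG function):

```python
import os

def transfer_hub_user(environ=None):
    """Copy JUPYTERHUB_USER into KERNEL_USERNAME unless it's already set.

    Hypothetical helper for illustration: this is the kind of snippet that
    could live in a Python config file, leaving an explicitly configured
    KERNEL_USERNAME untouched.
    """
    environ = os.environ if environ is None else environ
    if "KERNEL_USERNAME" not in environ and "JUPYTERHUB_USER" in environ:
        environ["KERNEL_USERNAME"] = environ["JUPYTERHUB_USER"]
    return environ.get("KERNEL_USERNAME")
```

Calling transfer_hub_user() early in the config would make the Hub-authenticated user flow through to the kernel without further changes.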
I'm assuming your implementation is spawning NB2KG-enabled Notebook instances - is that correct? And is the EG contained (local) in the spawned NB image or is EG running outside the spawned image, but the NB knows how to target it via KG_URL? I think either is fine, but we'd like the latter eventually.
Since a couple of us are also looking into JupyterHub integration, it would be great to coordinate efforts or build on each other's work. I was thinking a jupyterhub directory under etc might be a good location for this kind of thing. If you don't mind submitting a PR when you're ready, that would be great.
@kevin-bates Yes, I use KG_URL, and I think this is the better way. I run JupyterHub and EG in Kubernetes. I built a new image from elyra/nb2kg; I modified the start-singleuser.sh script in the image to set the three class parameters related to NB2KG according to whether the environment variable KG_URL is present. So if we set KG_URL in the JupyterHub configuration file, we will use EG to provide the kernels. I am happy to share my work when I finally complete the integration.
@SolarisYan - Outstanding! Thank you for basing this off elyra/nb2kg. I'm curious why you were unable to use the existing start-nb2kg.sh script. This will allow for either Notebook or Lab to be used. We could also apply the mapping of JUPYTERHUB_USER to KERNEL_USERNAME here.
Thank you for working on this!
@SolarisYan - thought you might be interested in #429. Please note that we changed all the Kubernetes image names via #427 to drop the kubernetes- prefix, so PR #429 will also contain those changes. I apologize if that introduces an inconvenience.
@kevin-bates Thanks for your advice. I have used the start-nb2kg.sh script to finish it, and there I map JUPYTERHUB_USER to KERNEL_USERNAME; they work well. I am sorting out the relevant documents. When I finish, I will share them.
@SolarisYan I was playing with Hub as well, and I am really eager to see your updates. Don't hesitate to provide an in-progress PR so we can start reviewing and playing with it.
@SolarisYan - This issue morphed into a discussion about JupyterHub. However, I'd like to know if your questions regarding pod customizations, etc. have been addressed and we can close this issue?
Please advise.
@kevin-bates I'm so sorry for the late reply. I have completed the integration of JupyterHub and the gateway in a Kubernetes cluster. After finishing that, I developed other functions, such as billing for CPU and GPU, and monitoring. I am very happy to see that the community has completed the integration documentation, and very sorry that I have not helped. I hope that I can do something for the community in the future. Thanks again for the community's dedication; I will close this issue.
@SolarisYan - that's great news! If you happen to develop anything that you feel is a good fit for Enterprise Gateway and can be shared, please don't hesitate to contribute it.
Thank you.
@SolarisYan I make @kevin-bates' words mine... I am pretty sure you have done a much more detailed implementation, and we are more than OK with getting the current one improved. Things like monitoring, user propagation, etc., are definitely areas where you could enhance the current integration. Feel free to open small enhancement PRs for any of these things or others.
Hi there, I have a question about the GPU kernel, e.g., kubernetes-kernel-tf-py-gpu. I want to use it for AI, but I'm not sure how k8s will schedule it to a node with a GPU. I have labeled the GPU node with type=gpu; how can I configure things so that k8s schedules the kernel to my GPU node?
Can anyone help me? I would really appreciate it.