jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Other
616 stars 223 forks source link

Custom API path /api/kernelspec #779

Closed lucabem closed 4 years ago

lucabem commented 4 years ago

Hi @kevin-bates:

I want to modify the behaviour of the API call /api/kernelspec. Now, when we start a notebook server via JupyterHub, this server makes a call to /api/kernelspec and the response is a list of kernels in the NFS server.

I want to filter that kernels by user who have created them. Instead of calling /api/kernelspec, it should be /api/kernelspec/{username}. I have seen the swager.yaml but i dont find the python file of the API.

Thank you and have a nice day

kevin-bates commented 4 years ago

Hi @lucabem - its good to hear from you again.

Kernel specs are not "user owned" but are "system owned". They merely contain the metadata used to start a kernel. For example, User A and User B can each start an instance of kernel 'foo' using the 'foo' kernelspec. You don't see the kernelspecs handlers explicitly defined in Enterprise Gateway because they're essentially inherited from Notebook.

I think you're really talking about the /api/kernels endpoint, which lists the running kernels. /api/kernels/<kernel_id> is what is used to direct requests (like interrupt, restart, and shutdown) to a specific kernel and GET /api/kernels/<kernel_id> returns the model corresponding to that running kernel.

Because there's already a /api/kernels/<something> endpoint and, although the pattern match is against a string with embedded dashes, I don't think there's a hard requirement that kernel-id will always be a guid, so I hesitate to try to overload that endpoint for filtering by user.

Perhaps the MainKernelHandler's GET implementation could look for a query parameter, so something like GET /api/kernels?username=<user> could work? Then you'd probably want to copy the existing implementation of list-kernels into EG and perform the post filtering based on username - and in consultation with the KernelSessionManager (since that's where the user-to-kernel mappings would reside). Or better yet, drive the kernel list from KernelSessionManager by creating a method similar to active_sessions() which returns the list of sessions for a given user.

I would be happy to answer any questions you might have should you want to provide pull request for this.

lucabem commented 4 years ago

Maybe i have explained it bad @kevin-bates Think on a NFS server with this estructure:

Captura

When we log in JupyterHub and we see home page, the system makes a REST call: GET http://jupyter-hub-ip>:<port>/user/<user-logged>/api/kernelspecs

What im trying to do is try to filter the response of that call to just let user select between kernel_1 and kernel_2 (if is user_1). Right now, with default behaviour the response is (kernel_1, kernel_2, kernel:3 and kernel_4).

We want to reach this call: GET http://jupyter-hub-ip>:<port>/user/<user-logged>/api/kernelspecs?username=<user> or get it from the URL <user-logged>

Hope you have a nice day and thanks!

lucabem commented 4 years ago

In kernelspec.py (file from jupyter_client) there is a function called find_kernel_specs that could be usefull.

I have seen that this function search form kernels on a directory. If we were able to pass the path from NFS join username we could filter by user.

What i want to imitate is the nb_conda_kernels behaviour

kevin-bates commented 4 years ago

Hi Luis - thank you for the further explanation!

Unfortunately, at this time, EG doesn't receive any kind of hint as to which user a given request is made on behalf of. An exception to that is the POST handler for /api/kernels (to start a kernel) will pass KERNEL_USERNAME in the request body - but only from NB2KG, Notebook/GatewayClient or a custom app hitting the EG REST api.

If we were to do something similar for /api/kernelspecs, here's the approach I'd recommend...

  1. Use a query parameter to convey the username (mentioned previously) since I don't think it's appropriate to use a body on a GET request.
  2. Copy the kernelspecs handler from Notebook into EG.
  3. Leverage the authorized-users (and honor unauthorized-users) lists that can be embedded in the kernel.json files - similar to what happens today for starting kernels.
  4. Extend the copied handler to apply the query parameter (if present) against the returned kernelspecs (similar to this).
  5. Update Notebook/GatewayClient to add the query parameter to the request. I think this would be an okay thing to NOT add to NB2KG since it's essentially deprecated.

By leveraging the authorized-users lists (and ensuring the query parameter is in that list and is NOT in the unauthorized-users list) we'd have some cohesion with kernel starts. If neither list exists or no parameter was provided, we let all kernelspecs through.

I'd rather not rely on segments in a path hierarchy against which to apply a filter.

Note: If you only need to prevent certain users from starting certain kernels, you can do this today. If user2 were to select a kernel that it's not authorized to use, user2 will receive a 403 error when attempting to start that kernel. However, we don't have a means of filtering the presentation of the kernels at this time.

lucabem commented 4 years ago

Thank you @kevin-bates - that souns great!

Maybe it is as a bit greedy that estructure. I have read your post and i have noticed that if we change the name of our env, we could have them classified by username:

We could change the behaviour of MainKernelSpecHandler class. Inside the loop for kernel_name, kernel_info in kspecs.items() we could check if the kernel_name starts with USERNAME if kernel_name.startswith (kernel_user).

This method has this header def get(self, kernel_user) as get function from KernelSpecResourceHandler class.

So our function get would be:

class MainKernelSpecHandler (APIHandler):
    @web.authenticated
    @gen.coroutine
    def get(self, kernel_user):
        ksm = self.kernel_spec_manager
        km = self.kernel_manager
        model = {}
        model['default'] = km.default_kernel_name
        model['kernelspecs'] = specs = {}
        kspecs = yield maybe_future (ksm.get_all_specs ())
        for kernel_name, kernel_info in kspecs.items ():
            try:
                # Miramos si el kernel empieza con el parametro que queremos
                if kernel_name.startswith (kernel_user):
                    if is_kernelspec_model (kernel_info):
                        d = kernel_info
                        specs[kernel_name] = d
                    else:
                        d = kernelspec_model (self, kernel_name, kernel_info['spec'], kernel_info['resource_dir'])
                        specs[kernel_name] = d
            except Exception:
                self.log.error ("Failed to load kernel spec: '%s'", kernel_name, exc_info=True)
                continue
        self.set_header ("Content-Type", 'application/json')
        self.finish (json.dumps (model))
kevin-bates commented 4 years ago

Hi Luis, I would prefer we follow my previous suggestion and leverage the existing authorized_users functionality that can be added to the kernel.json files. I really don't want to infer filtering from the name of the kernel (i.e., the directory name where the kernel.json file resides).

In addition, we cannot change the signature for the GET request. With what you have above, the pattern for obtaining the list of kernelspecs would go from...

(r"/api/kernelspecs", MainKernelSpecHandler),

to something like...

kernel_user_regex = r"(?P<kernel_user>[\w%]+)"
...
(r"/api/kernelspecs/%s" % kernel_user_regex, MainKernelSpecHandler),

I believe that's why we'd need a query parameter.

We would then implement an internal method, something like apply_user_filter(kernel_user, kernelspec_model), which would look similar to this (the nesting is a bit obnoxious, but the lists are buried kinda deep)...

def apply_user_filter(kernel_user, kernelspec_model):

    # If user was specified continue, else skip everything and return the model
    if kernel_user:
        # User filter as been requested.
        if kernelspec_model.metadata:
            if kernelspec_model.metadata.process_proxy:
                if kernelspec_model.metadata.process_proxy.config:
                    if kernelspec_model.metadata.process_proxy.config.unauthorized_users:
                        if kernel_user in kernelspec_model.metadata.process_proxy.config.unauthorized_users:
                            # This user is not authorized for this kernel...
                            return None
                   if kernelspec_model.metadata.process_proxy.config.authorized_users:
                       if kernel_user not in kernelspec_model.metadata.process_proxy.config.authorized_users:
                           # This user is not authorized for this kernel...
                           return None

    return kernelspec_model

The for loop in the handler would then look something like...

            for kernel_name, kernel_info in kspecs.items ():
            try:
                # Miramos si el kernel empieza con el parametro que queremos
                if is_kernelspec_model (kernel_info):
                    d = kernel_info
                else:
                    d = kernelspec_model (self, kernel_name, kernel_info['spec'], kernel_info['resource_dir'])

                model = apply_user_filter(kernel_user, d)
                if model:
                    specs[kernel_name] = model
            except Exception:
                self.log.error ("Failed to load kernel spec: '%s'", kernel_name, exc_info=True)
                continue

What i want to imitate is the nb_conda_kernels behaviour

Are you trying to imitate the behavior or are you actually using nb_conda_kernels? I believe nb_conda_kernels essentially creates a facade that there are kernel.json entries distributed across the conda environments, but there really isn't a physical kernel.json file in each. Instead, they hook the kernelspec manager class and add some thunking logic into the argv entry which activates the conda env associated with the "pseudo" kernelspec.

Another approach you might try is start an EG per user where each EG instance uses a different --KernelSpecManager.whitelist set of values pertaining to the user. Then each user only sees the whitelisted kernels for their EG instance. This can be today.

lucabem commented 4 years ago

Hi @kevin-bates - Thanks for the solution!

I have modified the class MainKernelSpecHandler as you said (with some changes).

I have tested on PyCharm and now when we call api/kernelspecs/user/luis I recived only Kernels where test is on their authorized_users.

{ 
   "default":"python3",
   "kernelspecs":{ 
      "test":{ 
            .....
            "metadata":{ 
               "process_proxy":{ 
                  "class_name":"enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
                  "config":{ 
                     "image_name":"lucabem/jeg-env-base:pre",
                     "authorized_users":[ 
                        "pedro",
                        "luis"
                     ] }}}
            ....
}

Take noticed that I have modified the URL /api/kernelspecs/user/%s to difference it from /api/kernelspec/kernel call.

kevin-bates commented 4 years ago

Hi @lucabem - this sounds good. I was thinking that rather than register a different pattern, we'd instead offer the ability for an optional parameter, so if the caller wished to apply user filtering to the returned kernelspecs, they'd use: /api/kernelspecs?user=pedro. Then, MainKernelSpecHandler would check if a user query parameter exists and use that if present. MainKernelSpecHandler must also service /api/kernelspecs, while KernelSpecHandler services /api/kernelspecs/{kernel_id}.

How are you picking off the username from /api/kernelspecs/user/{username} and can a Notebook front-end still get kernelspecs?

lucabem commented 4 years ago

Hi @kevin-bates!

Instead of modifying the function get_kernel_spec, I have modified list_kernel_specs.

    def list_kernel_specs(self):
        """Get a list of kernel specs."""
        kernel_spec_url = self._get_kernelspecs_endpoint_url()
        self.log.debug("Request list kernel specs at: %s", kernel_spec_url)
        response = yield gateway_request(kernel_spec_url
                                         + '/user/' + os.getenv('USER'), method='GET')
        kernel_specs = json_decode(response.body)
        raise gen.Return(kernel_specs)

Now, the json response will contain only the kernels of 'USER'

I have to test it using Dockerfile, JupyterHub and Kubernetes cluster (the filter works on local). Do you recommend me to use your Dockerfile, or build my own image?

kevin-bates commented 4 years ago

Yes - use whatever is closest to where you want to be.

I think I understand your approach a bit better now. If you're planning on contributing the changes (both in Notebook and EG) we'll need to talk first. I would really like to not have to introduce a new pattern for retrieving kernels. Also, rather than use USER from the env, I would prefer we use something more like what's done during in start_kernel() since its actually KERNEL_USERNAME that is conveyed to the Gateway and is what EG uses to identify users. As a result, it might be best to create a small method that gets the value of username. You can certainly use USER as a fallback.

Were you planning on contributing these changes? I think they'd be helpful, but we need to be sure they're a fit for all environments and not jeopardize existing functionality, etc. I'm happy to help guide things appropriately.

lucabem commented 4 years ago

Hi @kevin-bates!

Were you planning on contributing these changes?

It would be a pleasure, but maybe in 2 months. In this moment i dont have enough free time (im ending my degree)

kevin-bates commented 4 years ago

Sounds good. Good luck with your degree!

lucabem commented 4 years ago

Hi @kevin-bates! - I am facing some problems trying to spawn containers on K8s cluster.

As you will remember, we have changed GET /api/kernelspecs to GET /api/kernelspecs/user/{user}, modifying the default behaivour.

default_handlers = [
    (r"/api/kernelspecs/user/{user}",  user_regex, ,MainKernelSpecHandler),
    (r"/api/kernelspecs/%s" % kernel_name_regex, KernelSpecHandler),
]

You have said: MainKernelSpecHandler must also service /api/kernelspecs. What does it mean? I think my problem could be linked with this, because on JEG'logs i get an 404 error comming from /api/kernelspecs

In addition, I have modified notebook to make a request to /api/kernelspecs/user/{user} instead of /api/kernelspecs

kevin-bates commented 4 years ago

I'd have to see the various code changes you've made. I might also suggest you leave MainKernelSpecHandler as-is and associated with /api/kernelspecs (so nothing changes in that respect), and create a copy of it named UserKernelSpecHandler which is associated with /api/kernelspecs/user/{user}.

Are you able to push your changes to a fork that I can see?

lucabem commented 4 years ago

Hmmm.... yes, I have override the MainKernelHandler instead of creating a new class. Tomorrow i will fork it and upload changes.

In addition, I have modified notebook/gateway to call /api/kernelspecs/user/{user} to use new JEG'path.

kevin-bates commented 4 years ago

Sounds good.

Yeah, your change to gateway probably won't fly. This is one reason why I was thinking we'd extend the current RPC to include an optional query parameter. Then, notebook/gateways using the updated code would conditionally add the query parameter - probably based on the presence of KERNEL_USERNAME. If the target EG is an older version, I believe (not certain about this) that the parameter will be ignored - yielding all kernelspecs. While updated EG targets would detect the parameter and yield the filtered set of specs.

Introducing an entirely new endpoint also introduces backward compatibility issues.

lucabem commented 4 years ago

Something i couldnt understand is the process when user select a new kernel. I know that JEG recived that request and it creates a pod into kubernetes cluster.

But, what happens between notebook server request and JEG response? Maybe, we could just let this process to /api/kernelspecs

default_handlers_notebooks = [
            (r"/api/kernelspecs/user/%s" % kernel_user_regex, ModifyKernelSpecHandler),
            (r"/api/kernelspecs/%s" % kernel_name_regex, notebook_handlers.KernelSpecHandler),
            (r"/api/kernelspecs, MainKernelSpecHandler]
lucabem commented 4 years ago

Rigth now, my architecture is:

1) JupyterHub spawns a notebook server

2) Notebook server makes a request to JEG via `/api/kernelspecs/user/{user}. It takes {user} from env variable USER

3) JEG gets user param and searchs into each kernel if param is into authorized_users

4) Notebook Server recives the reponse and shows available kernels.

5) When we select new Kernel, it crashes

kevin-bates commented 4 years ago

But, what happens between notebook server request and JEG response?

When --gateway-url is in play, the Notebook server proxies all kernel, kernelspec, kernelspec resource and kernel websocket requests to the EG server.

When we select new Kernel, it crashes

I suspect this is because Notebook (or the front end) is fetching the kernelspec directly and/or its resources (which are accessed by /kernelspecs/<kernel-name> and return the icon image for the kernel, etc.).

I haven't checked, but your ModifyKernelSpecHandler may be handling things destined for notebook_handlers.KernelSpecHandler. Something you'll need to verify.

What exception do you see when it "crashes"?

lucabem commented 4 years ago

Maybe i have explained it bad. When I select a new kernel, JEG'log gets me an 500 error POST and 404 error: /api/kernelspecs.

Its not an exeception, just is not able to connect jupyter notebook with kernel

kevin-bates commented 4 years ago

No worries. I didn't expect an exception either. What do the EG and Notebook logs/consoles have in them relative to these issues? Please be specific as to which log contains what in your response.

lucabem commented 4 years ago
def apply_user_filter(kernel_user, kernelspec_model):
    if kernel_user:
        if kernelspec_model['spec']['metadata']:
            if kernelspec_model['spec']['metadata']['process_proxy']:
                if kernelspec_model['spec']['metadata']['process_proxy']['config']:
                    if kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users']:
                        lista_nombres = kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users'].split(',')
                        if kernel_user in lista_nombres:
                            return kernelspec_model
    return kernelspec_model

class MainKernelSpecHandler (APIHandler):
    @web.authenticated
    @gen.coroutine
    def get(self, kernel_user):
        ksm = self.kernel_spec_manager
        km = self.kernel_manager
        model = {}
        model['default'] = km.default_kernel_name
        model['kernelspecs'] = specs = {}
        kspecs = yield maybe_future (ksm.get_all_specs ())
        self.log.info("Call recived. Searching kernels for user '%s' " %kernel_user, exc_info=True)
        for kernel_name, kernel_info in kspecs.items ():
            try:
                if is_kernelspec_model (kernel_info):
                    d = kernel_info
                else:
                    d = kernelspec_model(self, kernel_name, kernel_info['spec'], kernel_info['resource_dir'])

                d = apply_user_filter(kernel_user, d)
                if d is not None:
                    self.log.info("Find kernel '%s' for user '%s'", d['name'], kernel_user, exc_info=True)
                    specs[kernel_name] = d

            except Exception:
                self.log.error("Failed to load kernel spec: '%s'", kernel_name, exc_info=True)
                continue
        self.set_header("Content-Type", 'application/json')
        self.finish(json.dumps(model))

kernel_name_regex = r"(?P<kernel_name>[\w\.\-%]+)"
kernel_user_regex = r"(?P<kernel_user>[\w%]+)"

default_handlers_notebooks = [
            (r"/api/kernelspecs/user/%s" % kernel_user_regex, MainKernelSpecHandler),
            (r"/api/kernelspecs/%s" % kernel_name_regex, notebook_handlers.KernelSpecHandler)
]

for path, cls in default_handlers_notebooks:
    bases = (TokenAuthorizationMixin, CORSMixin, JSONErrorsMixin, cls)
    default_handlers.append((path, type(cls.__name__, bases, {})))

for path, cls in notebook_kernelspecs_resources_handlers.default_handlers:
    # Everything should have CORS and token auth
    bases = (TokenAuthorizationMixin, CORSMixin, JSONErrorsMixin, cls)
    default_handlers.append ((path, type(cls.__name__, bases, {})))
lucabem commented 4 years ago

Finally it works. I have modified a bit MainKernelSpecHandler to:

def apply_user_filter(kernel_user, kernelspec_model):
    # Comprobamos que existen los campos en el modelo de entrada y que se ha pasado kernel_user
    #   "metadata":
    #     "process_proxy":
    #       "class_name": "enterprise_gateway.services.processproxies.distributed.DistributedProcessProxy",
    #       "config":
    #         "authorized_users": "bob,alice"
    if kernel_user:
        if kernelspec_model['spec']['metadata']:
            if kernelspec_model['spec']['metadata']['process_proxy']:
                if kernelspec_model['spec']['metadata']['process_proxy']['config']:
                    if kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users']:
                        lista_nombres = kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users'].split(',')
                        if kernel_user in lista_nombres:
                            return kernelspec_model
    return None

class MainKernelSpecHandler (APIHandler):
    @web.authenticated
    @gen.coroutine
    def get(self, kernel_user = None):
        ksm = self.kernel_spec_manager
        km = self.kernel_manager
        model = {}
        model['default'] = km.default_kernel_name
        model['kernelspecs'] = specs = {}
        kspecs = yield maybe_future (ksm.get_all_specs ())
        self.log.info("Call recived. Searching kernels for user '%s' " %kernel_user, exc_info=True)
        for kernel_name, kernel_info in kspecs.items ():
            try:
                if is_kernelspec_model (kernel_info):
                    d = kernel_info
                else:
                    d = kernelspec_model(self, kernel_name, kernel_info['spec'], kernel_info['resource_dir'])

                if kernel_user:
                    d = apply_user_filter(kernel_user, d)
                    if d is not None:
                        self.log.info("Find kernel '%s' for user '%s'", d['name'], kernel_user, exc_info=True)
                        specs[kernel_name] = d
                else:
                    self.log.info("All kernels given. no user detected")
                    specs[kernel_name] = d

            except Exception:
                self.log.error("Failed to load kernel spec: '%s'", kernel_name, exc_info=True)
                continue
        self.set_header("Content-Type", 'application/json')
        self.finish(json.dumps(model))

kernel_name_regex = r"(?P<kernel_name>[\w\.\-%]+)"
kernel_user_regex = r"(?P<kernel_user>[\w%]+)"

default_handlers_notebooks = [
            (r"/api/kernelspecs", MainKernelSpecHandler),
            (r"/api/kernelspecs/user/%s" % kernel_user_regex, MainKernelSpecHandler),
            (r"/api/kernelspecs/%s" % kernel_name_regex, notebook_handlers.KernelSpecHandler)
]
kevin-bates commented 4 years ago

Great - congratulations! Yeah, adding the primary endpoint back in probably helps.

A few comments...

lucabem commented 4 years ago

How are you deciding when to hit the /api/kernelspecs/user endpoint? If based on the presence of USER, I would recommend that be changed to look for KERNEL_USERNAME instead since that's what get's conveyed when starting a kernel (and is used to check authorization on kernel start requests).

I have modified notebook package to fill the request with os.getEnv('USER'). The problem is that all kernels have the same KERNEL_USERNAME (jovyan). For example, if I want name a kernel-pod with 'test', how could i use Kernel_username?

kevin-bates commented 4 years ago

If you're setting up authorized-users lists then you'd have to add 'jovyan' to every user list - otherwise no kernel can be started. If KERNEL_USERNAME is not provided in the kernel start request, the user that EG is running as is used. In the container case, EG runs as 'jovyan'.

The entire intention of KERNEL_USERNAME is to tie a user to the EG request. We document how to do this in Hub environments. However, generally speaking, users/admins would set KERNEL_USERNAME in their environment prior to launching Notebook, then it will flow into EG.

kernel-pod names are <KERNEL_USERNAME>-<kernel-id>. This allows admins monitoring the cluster to immediately determine which user is running which pods and, within users, which pod is running which kernel. Not setting KERNEL_USERNAME (and letting it default to the EG user) defeats this purpose and all pods would then be named jovyan-<kernel-id> and if one pod is consuming all resources, the admin won't know who to contact about that.

Btw, KERNEL_USERNAME, along with any env prefixed with KERNEL_, automatically flows from Notebook in POST /api/kernels requests. This is essentially how we perform very basic "parameterization".

lucabem commented 4 years ago

If I set KERNEL_USERNAME on kernel.json stanza (env), when we spawn a notebook it will have different name from 'jovyan' (it will set this env on kernel-pod.yaml.j2)

kevin-bates commented 4 years ago

This also defeats the purpose of KERNEL_USERNAME and you won't be using authorized_users correctly during startup requests because its value is not set into the environment until the kernel is launched and the authorization checks have already been performed. As a result, authorization will still be based on the default value of 'jovyan' (and not the requesting user).

The kernel.json env stanza is for variables that are destined for the kernel's runtime environment. Sure, one could say that KERNEL_USERNAME is appropriate in the kernel's env, but its real utility is that it be used during the kernel's launch (not it's runtime) and reflect the identity of the user issuing the request.

What about replacing this code with something like this:

os.environ['KERNEL_USERNAME'] = os.getenv('KERNEL_USERNAME', GatewayClient.instance().http_user or os.getenv('USER'))

which says, use env KERNEL_USERNAME if present, else use the user set by the config option for HTTP_USER if present, else use env USER. This preserves current functionality and leverages your USER stuff. In fact, I think you should use this approach when calling the /api/kernelspecs/user/ endpoint, otherwise we have mixed behaviors and the set of kernelspecs returned should match the set of kernels the user is allowed to launch.

I suppose you could end up with a None value if nothing is set and I'm not sure if that becomes an issue. If it does, you could pop() KERNEL_USERNAME from the env if it's None.

lucabem commented 4 years ago

Hi @kevin-bates

I will close this issue. But one last question:

How do u test jeg? I mean, i have a k8s cluster, but each time i need yo generate a whl, build an image and pull it to hub and then pull it on my cluster

kevin-bates commented 4 years ago

Hi @lucabem - I'm sorry, I thought for sure I had replied to your question, but now realize I must have gotten distracted and never got back to sending my response!

Testing is difficult! We have some unit tests and we have a Docker-based integration test that runs a barely viable YARN server. But you're absolutely correct, development/testing on k8s is tedious. Do you know of any tools/packages that might make that kind of thing easier?

lucabem commented 4 years ago

Neithe @kevin-bates , I just was building dockerfile and playing with logs.

One question, Im building JEG on a EKS-cluster into Amazon Web Services. When we deploy JEG, on yaml we have the option of use to use as external-ip of the service.

But in aws we dont have an master-node. Do you know how to solve it? I have use an IP of a worker node but if that node goes down, it will be unreachable

kevin-bates commented 4 years ago

Do you know how frequently worker nodes go down? Are they brought down after a certain amount of time or are you implying there was some failure that stopped the worker node?

The "plan" is to implement HA/DR. We have some aspects of DR in place already provided the file in which kernel persistence is stored is on a shared file system such that when a new EG instance starts, it loads the persisted kernel information and reconnects to those kernels originally running on behalf of the previous EG.

Where we really want to be is here where we can have multiple EG instances running and, should one EG instance go down, the kernel interaction would be sent to another EG instance that detects it doesn't know about that kernel and then loads its state from the persistence location. (We also want to implement other forms of kernel state persistence besides just files - like NoSQL databases, etc.)

I know this "vaporware" isn't helpful right now, but that's the plan.

lucabem commented 4 years ago

I was thinking on change the jeg-service from NodePort to LoadBalacer, or using NGIX.

kevin-bates commented 4 years ago

The service type listed in the YAML or helm chart templates is a suggested value, but the idea is that things will vary with deployment and configuration. Feel free to adjust things to what works best for you.

I see that we don't treat the NodePort value as a variable in the helm chart - and we should probably do that. If you use the helm charts and have been able to modify the service type, would you mind making the changes necessary to change the service type (not sure if other values need to change as well)?

If you deploy using the YAML file, it would be great to know what items did change and we could figure out how best to make the helm charts varied in this aspect.

Thanks.

lucabem commented 4 years ago

Hi @kevin-bates!

I have rigth now free time to add this changes (filter kernels shown by user query params). How could we work? Or face that pull request.

kevin-bates commented 4 years ago

Hi @lucabem - thanks for checking back in - I hope you are well and your degree is finished - congratulations!

Why don't we start with the EG-side changes first by posting a PR? From there we can work on solidifying the API.

I'm going to re-open this issue since it has a rich history and we can associate it with the PR.

lucabem commented 4 years ago

@kevin-bates - Perfect! During this week i will try to sum up this issue and focus on what behaviour we are expecting.

On question, how is your env for working? I mean, do you code on Unix or Windows?

kevin-bates commented 4 years ago

Sounds good!

On question, how is your env for working? I mean, do you code on Unix or Windows?

My primary development is on a mac, but I spend a lot of time running EG on a small (3 node) RedHat Linux cluster that runs Hadoop YARN and Spark which I hit from a Notebook server running on my mac.

I'm not sure how well (or even if) EG would run on a Windows server. I don't think we do things that would prevent that, but it's not a focus for us.

lucabem commented 4 years ago

Hi @kevin-bates - Thats the sum up of this issue.

Current behavior Jupyter Enterprise Gateway gives you all kernels avaible on directory /usr/local/share/jupyter/kernels. Theres is a field onmetadata.process_proxy.config. authorized_users json that allows us to know who is allowed or not to spawn a kernel. By default, JEG doesnt check that list, so we want to modify JEG to use it.

{ 
   "default":"python3",
   "kernelspecs":{ 
      "test":{ 
            .....
            "metadata":{ 
               "process_proxy":{ 
                  "class_name":"enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
                  "config":{ 
                     "image_name":"lucabem/jeg-env-base:pre",
                     "authorized_users":[ 
                        "pedro",
                        "luis"
                     ] }}}
            ....
}

Kernelspecs handlers are inherited from notebook and they dont allow us to pass a query param to the call /api/kernelspecs. We need to modify this method to allow query params as user_name, but it should be optional if we want to show all kernels.

We have two options to modify /api/kernelspecs:

  1. Add new path to the API as /api/kernelspecs/user/{user}.
  2. Allow optional params to he API as /api/kernelspecs?user={user}

In both cases we need a method to filter kernels by user. On method notebook.MainKernelSpecHandler.get which returns kernels avaible, we call function filter.

Filter could be like that (it should be cleaned):

def apply_user_filter(kernel_user=None, kernelspec_model):
    if kernel_user:
        if kernelspec_model['spec']['metadata']:
            if kernelspec_model['spec']['metadata']['process_proxy']:
                if kernelspec_model['spec']['metadata']['process_proxy']['config']:
                    if kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users']:
                        lista_nombres = kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users'].split(',')
                        if kernel_user in lista_nombres:
                            return kernelspec_model
    return None
kevin-bates commented 4 years ago

We'll also need to determine how a given client knows whether or not it should request user-filtered kernelspecs. I suspect this is going to have to be a configurable option.

I think the default needs to be to return the model and only if kernel_user is not None AND they are explicitly denied or they do not appear in the non-empty list of authorized users would None be returned.

lucabem commented 4 years ago

Yes - it should be on multi-user enviroments as JupyterHub, where we want to give only kernels of one user.

In my opinion it could be a booleanr where TRUE will filter kernels and FALSE wont.

I think the default needs to be to return the model and only if kernel_user is not None AND they are explicitly denied or they do not appear in the non-empty list of authorized users would None be returned.

Yes, and if they are both, I would use the more restrictive option.

kevin-bates commented 4 years ago

Yes, and if they are both, I would use the more restrictive option.

Correct - deny always supersedes allow.

Ok. That seems fine.

What if we were to make the value of the "flag" be the name of the environment variable on which to filter kernelspecs and base KERNEL_USERNAME? If set to None (default) or the empty string ('') or the named variable doesn't exist, then the default request is performed. (The latter condition should probably log a warning.)

Examples: --GatewayClient.kernel_user_filter_env=USER - uses env USER for the kernelspecs and KERNEL_USERNAME --GatewayClient.kernel_user_filter_env=JUPYTERHUB_USER - uses env JUPYTERHUB_USER for the kernelspecs and KERNEL_USERNAME --GatewayClient.kernel_user_filter_env=KERNEL_USERNAME - uses env KERNEL_USERNAME for the kernelspecs and KERNEL_USERNAME

I think we're going to find applications and platforms where the user identity varies and this would allow greater flexibility.

lucabem commented 4 years ago

It sounds great!

With that kind of configurable option we avoid having to add modify the api path kernelspecs

In this way, the call would still be the same unless the time we check if that parameter exists.

lucabem commented 4 years ago

Ok, summing up new advances we have reached that our new MainKernelSpecHandler will have a header such this: def get(self, kernel_user=None) Kernel_user is an optional param and it will be given by a resquest made by notebook on notebook-server launch (if we use JupyterHub).

If we dont recive anything --> we return the whole kernelspec directory. If we recive it --> we search for authorized an unauthorized user lists.

I will try to start coding it this week. I think we will need to modify /api/kernelspecs to allow query params as /api/kernelspecs?user={user}

kevin-bates commented 4 years ago

Correct.

If we recive it --> we search for authorized an unauthorized user lists.

Check unauthorized first. If found, return None. Also, if not found in the authorized list, BUT the list is present (and non-empty), then None must also be returned. (An empty authorized_users list behaves as if one is not present in the first place.)

Hmm - I just remembered that the {un}authorized_users lists can also be specified at a global level (to span kernelspecs). As a result, we need to consider combining the two similar to what we do when launching the kernel.

I think we will need to modify /api/kernelspecs to allow query params as /api/kernelspecs?user={user}

We need to ensure that /api/kernelspecs (with no parameter) continues to work as well.

lucabem commented 4 years ago
CASE AUTHORIZED UNAUTHORIZED RETURN
A 0 0 None
B 0 1 None
C 1 0 Give it
D 1 1 None
F - - Give it

Case A: kernel_user doesnt appear in any list. We return None

Case B: kernel_user appears on unauthorized list. We return None

Case C: kernel_user appears on authorized list. We return the kernel_spec

Case D: we apply more restrictive list. We return None

Case F: No lists appear. We give all kernel_specs

kevin-bates commented 4 years ago

Yes. Although to clarify Case A: kernel_user doesnt appear in any non-empty list. We return None

Case F will be the default case since most folks don't leverage this functionality.

lucabem commented 4 years ago

Nice, I have coded that filter:

def apply_user_filter(kernelspec_model, kernel_user = None):
  if kernel_user:
    # Check unauthorized list
    if exists(kernelspec_model, ['spec', 'metadata', 'process_proxy', 'config', 'unauthorized_users']):
      # Check if kernel_user in kernelspec_model
      unauthorized_list = kernelspec_model['spec']['metadata']['process_proxy']['config']['unauthorized_users']
      if kernel_user in unauthorized_list:
        return None
    else:
      # Now we have check that unauthorized list doesnt exist
      if exists(kernelspec_model, ['spec', 'metadata', 'process_proxy', 'config', 'authorized_users']):
        authorized_list = kernelspec_model['spec']['metadata']['process_proxy']['config']['authorized_users']
        if authorized_list and kernel_user not in authorized_list:
          return None
  return kernelspec_model

The exists function:

def exists(obj, chain):
    _key = chain.pop(0)
    if _key in obj:
        return exists(obj[_key], chain) if chain else obj[_key]

We need to ensure that /api/kernelspecs (with no parameter) continues to work as well.

We could assign both paths to the same handler. We just need to check if kernel_user param is None or not.

if kernel_user is not  None:
     # we apply filter
else:
     # we return all kernel_model
lucabem commented 4 years ago

The main problem will be the second call of /api/kernelspecs. I mean, when we open a notebook-server, it calls /api/kernelspecs to get available kernels.

but when we select one kernel, it makes another call to /aoi/kernelspecs. im trying to know where and who is calling this second time.

Could you help me with that?

kevin-bates commented 4 years ago

Hmm. All "requests" for kernelspecs in a Notebook server configured to hit a Gateway, should go through the Gateway classes that reside in Notebook, specifically, list_kernel_specs(). As a result, I suspect the interrogation of the I-want-user-filtering option, and its outcome on the endpoint, would take place in _get_kernelspecs_endpoint_url().

Btw, the else: statement in your if kernel_user: block should be removed since we still need to check the authorized_users list even when kernel_user is not in the unauthorized_users list (then adjust the indentation of what follows).

Since we'll need Enterprise Gateway updated before Notebook, please go ahead and submit a PR to EG and we can work off that. Thanks.

We need to also ensure that if user-filtering is configured on the gateway client (in Notebook), and its pointing at an older EG (that doesn't know about this), the requests still work or there's a clear indication that we can use to post a message that user-filtering is not available.