ideonate / cdsdashboards

JupyterHub extension for ContainDS Dashboards
https://cdsdashboards.readthedocs.io/

Obtain user info for fastapi-served dashboard #55

Closed: ricky-lim closed this issue 3 years ago

ricky-lim commented 3 years ago

Hi,

I'm experimenting with fastapi (https://fastapi.tiangolo.com/) as a dashboard served with uvicorn, and it works as expected.

I'm curious whether we could use cdsdashboards for authentication in this case.

What would be the best approach with cdsdashboards to obtain the user info programmatically, once the user is authenticated?

I tried /hub/dashboards-api/hub-info/user, but unfortunately it fails during the redirect in the browser:

INFO:tornado.application:b'INFO: 10.42.0.166:0 - "GET /user/ril/dash-biotools-api/whoami HTTP/1.1" 307 Temporary Redirect\n' user-fastapi

Below is my current setup.

# uvicorn to serve the main.py
c.VariableMixin.extra_presentation_launchers = {
        'uvicorn': {
                'cmd': ['start.sh', 'python3', '-m', 'jhsingle_native_proxy.main', '--logs', '--debug'],
                'args': ['--destport=0', 'python3',
                         '{presentation_path}',
                         '{--}root-path={base_url}',
                         '{--}port={port}',
                ],
        },
}
# main.py

import typer
import uvicorn
from fastapi import FastAPI
from starlette.responses import RedirectResponse

api = FastAPI()

@api.get('/foo')
def foo():
    return 'bar'

@api.get('/whoami')
async def get_user():
    return RedirectResponse('/hub/dashboards-api/hub-info/user')

def main(root_path: str = typer.Option(...), port: int = typer.Option(...)):
    uvicorn.run(api, host='0.0.0.0', root_path=root_path, port=port, access_log=True)

if __name__ == '__main__':
    typer.run(main)

Describe the solution you'd like

A user JSON, such as {"kind": "user", "name": "ril"}

Describe alternatives you've considered

Configuration

Using Zero to JupyterHub

Thank you in advance and cheers

danlester commented 3 years ago

This looks interesting, and I'd like to try it out.

However, there isn't much point in the /whoami end point if it simply redirects to /hub/dashboards-api/hub-info/user - you might as well just type that destination URL into the browser in the first place. Maybe I've misunderstood what you were trying to do with that endpoint.

You would need to call the hub-info endpoint either:

  1. From Javascript within an HTML page served by fastapi, or
  2. On the server side in a fastapi handler, but ensuring you pass along any relevant cookies submitted by the browser (well, in this case perhaps you'd call other JupyterHub APIs directly anyway)
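Option 2 can be sketched with the standard library alone. This is a minimal sketch, assuming the hub is reachable from the dashboard's container at `hub_url` (for example `http://hub:8081` in a Zero to JupyterHub cluster; that address is an assumption) and that the browser's `Cookie` header is forwarded unchanged. `build_hub_info_request` is a hypothetical helper name; it only builds the request, so whether the hub actually accepts the forwarded cookies still has to be verified.

```python
import urllib.request

# Path of the cdsdashboards hub-info endpoint mentioned in this thread
HUB_INFO_PATH = "/hub/dashboards-api/hub-info/user"


def build_hub_info_request(hub_url: str, cookie_header: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the hub-info endpoint,
    forwarding the browser's raw Cookie header unchanged.

    hub_url is an assumption, e.g. "http://hub:8081" inside a z2jh cluster.
    """
    url = hub_url.rstrip("/") + HUB_INFO_PATH
    headers = {"Accept": "application/json"}
    if cookie_header:
        # Only forward cookies if the browser actually sent some
        headers["Cookie"] = cookie_header
    return urllib.request.Request(url, headers=headers, method="GET")
```

In a fastapi handler the raw header would be `request.headers.get('cookie', '')`; sending it is then `urllib.request.urlopen(req)` followed by `json.load` on the response.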

To better understand what you're trying to do, would it be possible to flesh out a @api.get('/') endpoint that returns a proof-of-concept HTML page for your dashboard (without worrying about the user for now)? Then we can think about how to get the user in that context.

Compared to Voila and Streamlit, you should have much more flexibility in fastapi since you are working at the web server level, but the question is how it fits into your dashboard workflow.

ricky-lim commented 3 years ago

Hi Dan,

Thank you for the interest. Yes, you're exactly right on both points.

On the first point: for example, hitting the fastapi docs at /docs yields the expected response. As a PoC with an HTML page, it also works with a single-page app built with Vue.js (for example) and then served by fastapi.

However, without a front end, using only fastapi backend code, I'm still struggling to get the logged-in user info. An example use case is an API POST endpoint that requires user_name and user_email in order to send an email with fastapi background tasks.

From the code, I would like to make an API call (as a service) to find out who the logged-in user is, without the dashboard developer needing to install the jupyterhub dependency and/or create a static HTML page.

I think the ideal scenario is to use JupyterHub authentication in the same way as with the flask-login package. There is a port for fastapi, https://pypi.org/project/fastapi-login.

I thought about using the access_token from JupyterHub and then setting it as a cookie with a secret managed by fastapi. What is your advice?

Cheers

danlester commented 3 years ago

I still think some front-end code would be useful here, even if the point is that the overall application isn't working, so that we can discuss possible solutions.

It doesn't make sense to me to talk about finding the logged-in user with "only backend" code. There needs to be some browser-side context to this I feel.

Anyway, to understand what is available to you in a call to a fastapi handler, maybe just try outputting all cookies and then accessing it in a new incognito window (only logging in to JupyterHub). I don't think this needs to be overcomplicated by also creating a fastapi-based secret/login system necessarily. But the first step would be to see if we can get the user from JupyterHub, and then decide if it is important to also 'cache' this within fastapi somehow.
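To see which cookies actually reach the dashboard process, a handler can log the names of the cookies it receives. A minimal, stdlib-only sketch, assuming you pass it the raw `Cookie` request header (in fastapi, `request.headers.get('cookie', '')`); `summarize_cookies` is a hypothetical helper, and it truncates values so that full cookie secrets don't end up in logs.

```python
from http.cookies import SimpleCookie


def summarize_cookies(cookie_header: str, keep: int = 6) -> dict:
    """Parse a raw Cookie header and return {cookie name: shortened value}.

    Values longer than `keep` characters are truncated so that secrets are
    not written to logs in full.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    summary = {}
    for name, morsel in jar.items():
        value = morsel.value
        summary[name] = value if len(value) <= keep else value[:keep] + "..."
    return summary
```

The cookie names alone are usually enough to tell which JupyterHub cookies the browser is sending to the dashboard's URL.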

ricky-lim commented 3 years ago

I tried the JupyterHub REST API: https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#path--authorizations-cookie--cookie_name---cookie_value- Unfortunately, I'm still not able to get the user info.

Is there a simple way to pass the logged-in user from jhsingle_native_proxy to the fastapi-served app?

Cheers

ricky-lim commented 3 years ago

In the incognito browser, there are 3 cookies set:

Below is the service code I tried:

import os
from jupyterhub.services.auth import HubAuth

auth = HubAuth(api_token=os.environ['JUPYTERHUB_API_TOKEN'], cache_max_age=60, cookie_name='jupyterhub-user-ril-dash-whoami-flask')

# Cookie of `jupyterhub-user-ril-dash-whoami-flask`
c = '2xxxxxxxxxx:jupyterhub-user-ril-dash-whoami-flask|xxxxxxxxxx'

# Unfortunately this does not work yet
auth.user_for_cookie(c)
# No Hub user identified for request

Update: with the jupyterhub-hub-login cookie, auth.user_for_cookie does work :)

auth = HubAuth(api_token=os.environ['JUPYTERHUB_API_TOKEN'], cache_max_age=1, cookie_name='jupyterhub-hub-login')
# c_x is the value of the jupyterhub-hub-login cookie
auth.user_for_cookie(c_x, use_cache=False)
{'kind': 'user',
 'name': 'ril',
 'server': '/user/ril/',
 'pending': None,
 'created': '2021-01-18T12:24:26.887723Z',
 'last_activity': '2021-02-10T14:38:17.770000Z',
 'servers': None}

However, from the request object in flask, I could unfortunately only receive the jupyterhub-session-id and jupyterhub-user-ril-dash-whoami-flask cookies. The redirect to auth.login_url yields errors. Any advice on how we could set the jupyterhub-hub-login cookie?

https://github.com/jupyterhub/jupyterhub/blob/master/examples/service-whoami-flask/whoami-flask.py

Thanks in advance.
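A plausible explanation for why only those two cookies arrive is cookie path scoping. Assuming JupyterHub sets `jupyterhub-hub-login` with path `/hub/`, `jupyterhub-session-id` with path `/`, and the per-server cookie with the server's own `/user/ril/<server>/` path (worth confirming in the browser's developer tools), the browser never sends the hub login cookie to a dashboard served under `/user/ril/...`. The sketch below implements the path-match rule from RFC 6265, section 5.1.4; `cookie_path_matches`, `cookies_sent_to` and the `COOKIE_PATHS` table are hypothetical illustrations.

```python
def cookie_path_matches(request_path: str, cookie_path: str) -> bool:
    """RFC 6265 section 5.1.4 path-match: would a cookie set with
    cookie_path be sent along with a request for request_path?"""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # A prefix only counts if it ends at a path-segment boundary
        if cookie_path.endswith("/"):
            return True
        if request_path[len(cookie_path)] == "/":
            return True
    return False


# Assumed cookie paths for this thread's setup (see lead-in)
COOKIE_PATHS = {
    "jupyterhub-session-id": "/",
    "jupyterhub-hub-login": "/hub/",
    "jupyterhub-user-ril-dash-whoami-flask": "/user/ril/dash-whoami-flask/",
}


def cookies_sent_to(request_path: str) -> list:
    """Names of the cookies above that a browser would attach to request_path."""
    return sorted(
        name for name, path in COOKIE_PATHS.items()
        if cookie_path_matches(request_path, path)
    )
```

Under these assumed paths, `cookies_sent_to('/user/ril/dash-whoami-flask/')` returns only the session-id and per-server cookies, which is consistent with what the flask app received.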

ricky-lim commented 3 years ago

From the JupyterHub docs, https://jupyterhub.readthedocs.io/en/stable/reference/services.html#hub-authentication-and-services: "When a user logs into JupyterHub, the Hub sets a cookie (jupyterhub-services). The service can use this cookie to authenticate requests."

I was wondering whether this is set on user login, since I could not find such a cookie (jupyterhub-services). Any clue?

danlester commented 3 years ago

OK I think I'll have to set this up for myself and see what pops up. The problem is most likely that jhsingle-native-proxy isn't forwarding all cookies or headers - I can't remember so will take a look. It should be possible to add extra headers/cookies if needed, although we'll need to think about how you can be sure they are secure.

If you have anything that will help set this up, please let me know - e.g. Docker image containing fastapi etc.

ricky-lim commented 3 years ago

Hi Dan,

I used a conda environment and installed fastapi and uvicorn into it with pip. For the flask test, I also installed flask with pip and used its built-in development server.

python -m pip install fastapi "uvicorn[standard]"

I'm not sure which is better to forward: the encrypted cookies or the authentication token.

Either way, I think the dashboard developer would need to retrieve the user info via the JupyterHub REST API.

For example, with this snippet from the JupyterHub docs that calls the API:

import os
from urllib.parse import quote

import requests

api_token = os.environ['JUPYTERHUB_API_TOKEN']
# encrypted_cookie: the raw value of the cookie sent by the browser
r = requests.get(
    '/'.join(["http://127.0.0.1:8081/hub/api",
               "authorizations/cookie/jupyterhub-services",
               quote(encrypted_cookie, safe=''),
    ]),
    headers = {
        'Authorization' : 'token %s' % api_token,
    },
)
r.raise_for_status()
user = r.json()

Having enabled DEBUG on the proxy, I checked with kubectl logs and it also prints the logged-in user and its properties to the console when the proxy authenticates with OAuth.

I agree that passing cookies needs to be safe. As I understand it, the cookies are encrypted to prevent user tampering, but I might be missing some security concerns.

I'm curious: what concerns should we be aware of?

danlester commented 3 years ago

I'm having trouble installing uvicorn in various environments... please could you just share your 'singleuser' Dockerfile so we're on exactly the same page?

ricky-lim commented 3 years ago

Yes, sure. I'm using a Zero to JupyterHub (z2jh) setup.

Here is my singleuser Dockerfile:

ARG TAG=1.2
ARG BASE_REPO=jupyterhub/singleuser

FROM $BASE_REPO:$TAG

USER root
RUN apt-get -y update && apt-get install -y git vim build-essential
RUN pip install jhsingle-native-proxy==0.6.1

# Set conda environment
RUN conda init bash
USER root
RUN mkdir -p /etc/conda && cp conda-env/.condarc /etc/conda/.condarc && \
    cp conda-env/conda-init.sh /etc/profile.d/conda-init.sh

# Fix permissions on /etc/jupyter as root
USER root
RUN fix-permissions /etc/jupyter/

USER $NB_UID

On the hub, I added this setting for the presentation launcher:


      c.VariableMixin.extra_presentation_launchers = {
          'main': {
                      'cmd': ['start.sh', 'python3', '-m', 'jhsingle_native_proxy.main', '--logs', '--debug'],
                      'args': ['--destport=0', 'python3',
                               '{presentation_path}',
                               '{--}root-path={base_url}',
                               '{--}port={port}',
                      ],
          },
      }

The presentation_path points to main.py, which can be either a flask or a fastapi app. For flask, root-path is not required, but for fastapi it is required by uvicorn.

Here is my main.py for flask. I think that if it works with flask, porting the session-based authentication via the request object to fastapi should also work.

#!/usr/bin/env python3
"""
whoami service authentication with the Hub
"""
import requests
import logging
import json
import os
from functools import wraps
from urllib.parse import quote

import typer
from flask import Flask
from flask import Response
from flask import redirect
from flask import request
from jupyterhub.services.auth import HubAuth

log = logging.getLogger('werkzeug')
log.setLevel(logging.DEBUG)

# cookie_name defaults to 'jupyterhub-services'; here it is set to the per-server cookie
auth = HubAuth(api_token=os.environ['JUPYTERHUB_API_TOKEN'], cache_max_age=60, cookie_name='jupyterhub-user-ril-dash-whoami-flask')

app = Flask(__name__)

def authenticated(f):
    """Decorator for authenticating with the Hub"""

    @wraps(f)
    def decorated(*args, **kwargs):
        cookie = request.cookies.get(auth.cookie_name)
        log.debug(f'cookies: {request.cookies}')
        log.debug(f'cookie: {cookie}')
        if cookie:
            user = auth.user_for_cookie(cookie, use_cache=False)
            log.debug(f"user-cookie: {user}")
        else:
            user = None
        if user:
            return f(user, *args, **kwargs)
        else:
            # Not authenticated: redirect to the Hub login page
            # (this redirect currently fails)
            redirect_url = auth.login_url + '?next=/hub/dashboards/whoami-flask'
            response = redirect(redirect_url)
            return response

    return decorated

# @app.route('/')
# def index():
#     return 'hello'

@app.route('/')
@authenticated
def whoami(user):
    return Response(
        json.dumps(user, indent=1, sort_keys=True), mimetype='application/json'
    )

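def main(root_path: str = typer.Option(...), port: int = typer.Option(...)):
    # Note: Flask's app.root_path is the application's filesystem folder,
    # not a URL prefix, so this line does not change routing
    app.root_path = root_path
    app.run(host='0.0.0.0', port=port, debug=True)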

if __name__ == '__main__':
    typer.run(main)

For conda, here is the environment.yaml:

name: test-api-env
channels:
  - conda-forge
  - defaults
dependencies:
  - pip=21.0
  - python=3.9.1
  - pip:
    - fastapi==0.63.0
    - flask==1.1.2
    - jupyterhub==1.3.0
    - requests==2.25.1
    - typer==0.3.2
    - uvicorn==0.13.3
prefix: /home/ril/.conda/envs/test-api-env

When proxying through jhsingle-native-proxy, could you please check whether the jupyterhub-services cookie is being set?

Cheers

danlester commented 3 years ago

Thank you.

With docker build I get:

Step 9/12 : RUN mkdir -p /etc/conda && cp conda-env/.condarc /etc/conda/.condarc &&     cp conda-env/conda-init.sh /etc/profile.d/conda-init.sh
 ---> Running in f173448915d1
cp: cannot stat 'conda-env/.condarc': No such file or directory

I know we've spoken about conda envs before. I presume we'll need a typical .condarc file that we copy into our docker build. Likewise for conda-init.sh. These are probably similar to the ones you and I have used before.

Similarly, at what point do you actually make use of your environment.yaml?

Sorry for dragging this out, but I'd still like to replicate the fastapi and flask components - although I appreciate we could really talk about your questions at a higher level in terms of cookies etc. (Yes, it might be possible for jhsingle-native-proxy to intervene and send through some extra headers.)

ricky-lim commented 3 years ago

Sorry, I forgot to mention that. Yes, correct, they are the same ones.

I created the conda environment within my single-user server (i.e. jupyter-ril) and created the dashboard server using that conda env.

Hope this clarifies things :)

Yes, sure. As a dashboard developer, I think using the JupyterHub authentication service only makes sense if the user info can be retrieved, ideally in a developer-friendly way :)

danlester commented 3 years ago

But then I don't see how that Dockerfile is the correct one since there is no point where it copies those files to the docker build image (i.e. I would expect to see COPY .condarc /etc/conda/.condarc or similar).

I think this will be easier to talk about if we just try to install everything into the initial docker image instead of manually installing conda environment from a Jupyter server.

Anyway, will see where I get to!

ricky-lim commented 3 years ago

You are correct, sorry: I did not provide the .condarc and conda-init.sh files.

Another option is to copy the environment.yaml into the image and RUN conda env update -n base -f environment.yaml, which should install the dependencies into the base environment.

I apologize for the installation hassle.

danlester commented 3 years ago

I haven't tried this in detail in the dockerfile examples we've discussed, but I have added a new experimental option to jhsingle-native-proxy 0.7.0.

The command-line arg --forward-user-info will include a header X-CDSDASHBOARDS-JH-USER in the http request proxied to your underlying service. This contains user name, groups, admin fields if available from JupyterHub.

Since jhsingle-native-proxy has to check the JH logged-in cookies, it doesn't make sense for your underlying process to have to do the same if it can be avoided.

I've been using an extra_presentation_launchers entry much like the one you suggested, but also adding the new argument (which defaults to off).

        'cmd': ['start.sh', 'python3', '-m', 'jhsingle_native_proxy.main'],
        'args': ['python3', '{presentation_path}',
                    '{--}root-path={base_url}',
                    '{--}port={port}',
                    '--forward-user-info'
        ]

You might also need to set a PYTHONPATH env depending on your setup.

My dashboard script is:

import typer
import uvicorn
from fastapi import FastAPI, Request

api = FastAPI()

@api.get('/')
def homepage(request: Request):
    return 'User: '+ request.headers.get('X-CDSDASHBOARDS-JH-USER', '')

def main(root_path: str = typer.Option(...), port: int = typer.Option(...), debug: bool = False):
    uvicorn.run(api, host='0.0.0.0', root_path=root_path, port=port, access_log=True)

if __name__ == '__main__':
    typer.run(main)
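The header value arrives as a string. Assuming jhsingle-native-proxy serializes the user info as a JSON object (an assumption; print the raw value first, as the script above does, to confirm), a small helper can turn it into a dict and fail gracefully when the header is missing or malformed. `parse_jh_user_header` is a hypothetical name.

```python
import json
from typing import Optional

# Header added by jhsingle-native-proxy 0.7.0 when run with --forward-user-info
JH_USER_HEADER = "X-CDSDASHBOARDS-JH-USER"


def parse_jh_user_header(raw: Optional[str]) -> Optional[dict]:
    """Return the forwarded user info as a dict, or None if absent or invalid.

    Assumes the header carries a JSON object; that format is an assumption.
    """
    if not raw:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Anything other than a JSON object is treated as invalid
    return data if isinstance(data, dict) else None
```

In the fastapi handler this would be `parse_jh_user_header(request.headers.get(JH_USER_HEADER))`; Starlette header lookups are case-insensitive.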

Of course you have to be sure that your process can't be accessed from outside the server or someone could spoof the header. But that's the case anyway - if you're going to protect your dashboard using JupyterHub, you don't want anyone to be able to reach it directly and bypass the ContainDS Dashboards auth system. I guess the important thing is to understand that the header value is only secure if your server is, whereas for normal dashboards you might not be too worried if the wrong user accesses it. If you are relying on the header to protect information belonging to a user, the server security may become more important for you, especially if you don't trust colleagues not to spoof each other!
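One way to act on this, sketched under the assumption that jhsingle-native-proxy runs in the same container and connects to the dashboard over the loopback interface, is to bind uvicorn to `127.0.0.1` rather than `0.0.0.0` and to ignore the header on any request whose peer address is not loopback. `is_trusted_peer` and `forwarded_user_header` are hypothetical helpers; in fastapi the peer address is `request.client.host` (and `request.client` may be `None`).

```python
import ipaddress
from typing import Optional


def is_trusted_peer(host: Optional[str]) -> bool:
    """True only if the connecting peer is a loopback address
    (127.0.0.0/8 or ::1), i.e. most likely the local proxy."""
    if not host:
        return False
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): do not trust it
        return False


def forwarded_user_header(peer_host: Optional[str], header_value: Optional[str]) -> Optional[str]:
    """Return the forwarded user header only if it came via a trusted peer."""
    return header_value if is_trusted_peer(peer_host) else None
```

This only rules out direct connections from other machines; it does not stop other processes on the same host or pod from setting the header.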

Please let me know if you can get this working. For any further questions, let's try to base them on the simplified docker images we were covering in more recent comments above.

I'll close for now but feel-free to reopen.

ricky-lim commented 3 years ago

Hi Dan,

Thank you for the update.

Yes, I completely agree with you on the simplified docker images.

To reproduce the case, I set up a GitHub repository for further questions: https://github.com/ricky-lim/jupyterhub-minikube/tree/fastapi

On top of that, I have tested with version 0.7.0 and it works as expected.

I think this makes it very straightforward for the dashboard developer to use the user info.

Cheers

danlester commented 3 years ago

Great to hear that worked.

Thank you also for the GitHub repo - it could be very useful going forward.

ricky-lim commented 3 years ago

Hi Dan,

Would it also be possible for jhsingle-native-proxy to provide the user's email in addition to the User model attributes?

danlester commented 3 years ago

I don't think email is a standard field on the User model in JupyterHub, but you might have it somewhere depending on your Authenticator setup.

Please could you remind me how you have that set, and if you have seen email anywhere as a result?

In your example above, it isn't readily available:

auth = HubAuth(api_token=os.environ['JUPYTERHUB_API_TOKEN'], cache_max_age=1, cookie_name='jupyterhub-hub-login')
# c_x is the cookies value of jupyterhub-hub-login
auth.user_for_cookie(c_x, use_cache=False)
{'kind': 'user',
 'name': 'ril',
 'server': '/user/ril/',
 'pending': None,
 'created': '2021-01-18T12:24:26.887723Z',
 'last_activity': '2021-02-10T14:38:17.770000Z',
 'servers': None}

ricky-lim commented 3 years ago

Hi Dan,

Correct, the email is stored in auth_state.

What do you think is the best approach to retrieve it?

Cheers

danlester commented 3 years ago

Maybe there needs to be a way to tell jhsingle-native-proxy which fields to pass through in the user info header/query, and that should be able to fetch out auth_state.email explicitly (without the whole of auth_state).
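The field-selection idea could look roughly like this: the proxy would be configured with a list of dotted paths and copy only those values into the header. This is a hypothetical sketch (`select_user_fields` and the dotted-path syntax are not an existing jhsingle-native-proxy option), and, as the next comment explains, the JupyterHub REST API does not return `auth_state` anyway, so it only illustrates the filtering step.

```python
from typing import Any, Dict, Iterable


def select_user_fields(user: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Copy only the requested (possibly nested) fields, keyed by dotted path.

    Missing fields are skipped rather than raising.
    """
    selected = {}
    for path in paths:
        value: Any = user
        for key in path.split("."):
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                break
        else:
            # Inner loop finished without break: every key was found
            selected[path] = value
    return selected
```

Selecting by explicit path means that other `auth_state` entries, such as access tokens, would never be forwarded unless someone asked for them.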

I'll take a look when I can.

danlester commented 3 years ago

I've looked into this and unfortunately the JupyterHub REST API doesn't return auth state for the user, and it can't be configured to do so.

If you're only looking for the email address, I think the best workaround may be to ensure that each user is named according to their email address - either use their full email address as their username, or the first part if you believe everyone should be from the same domain.
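On the dashboard side, that workaround could be sketched as follows, assuming the username is either the full email address or just its local part; `email_from_username` is a hypothetical helper and `example.com` is a placeholder domain.

```python
from typing import Optional


def email_from_username(username: str, default_domain: Optional[str] = None) -> Optional[str]:
    """Derive an email address from a JupyterHub username.

    - If the username already looks like an email, return it unchanged.
    - Otherwise, if default_domain is given, append it.
    - Otherwise the email cannot be derived, so return None.
    """
    if "@" in username:
        return username
    if default_domain:
        return f"{username}@{default_domain}"
    return None
```

The `username` here would come from the `name` field of the forwarded user info.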

This all depends on your authenticator to get the username right.

Hope this helps...