Custom cache for oauth2 tokens

JeevansSP commented 1 year ago

Hello, I have a flask app, which uses trino python client to query the data from my trino servers and I have enabled oatuh2 authentication against my azure active directory, currently, the token is caching per host, I need to have it cached per user who authenticates on the front end of my web application. I have implemented my custom cache below and it is working as per my use case, but I prefer not to change any private properties as it's not the best practice.

from  flask import session

class _CustomCache(_OAuth2TokenCache):
    """
    In-memory token cache implementation. The token is stored per user.
    """

    def __init__(self):
        self._cache = {}
    def get_token_from_cache(self, host: str) -> Optional[str]:
        userName=session['user']   
        return self._cache.get(userName)

    def store_token_to_cache(self, host: str, token: str) -> None:
        userName=session['user']  
        self._cache[userName] = token

temp=OAuth2Authentication()
temp._bearer._token_cache=_CustomCache()

conn=connect(

    host='******',
    port=443,
    auth=temp,
    http_scheme="https"  
)

cursor=conn.cursor()

hashhar commented 1 year ago

This seems to be side-effect of the fact that we cache tokens for the entire host instead of per-connection. i.e. once we stop sharing tokens across across connections this problem can be solved much easier by providing two types of caches:

MEMORY (i'd call it NONE tbh): here the token can be stored as property of the connection and re-used as long as it's valid with the same connection
MEMORY_SHARED (I'd call it just MEMORY tbh): here the token needs to be cached per-host and we can introduce a read-lock for the cached value so that if multiple concurrent authentications happen then only the first successful auth's token is used and others waiting on the read-lock reuse that instead of doing their own auth flow

that way for applications where per-user caching is needed you can create connections per user (also has the benefit o keeping things like session properties and additional config separate) with MEMORY caching mode and cache the connection object itself in your application per-user.

For places where it doesn't matter which user identity is used the MEMORY_SHARED cache can be used.

cc: @lukasz-walkiewicz @s2lomon for your ideas from when working on the OAuth2 impl in the JDBC driver.

mdesmet commented 1 year ago

Both options are actually already implemented.

If you install keyring, the token is cached over connections but indeed per host, serving the use case of CLI applications like dbt (what you call MEMORY)

The default is per instance of trino.auth.OAuth2Authentication, as long as you don't share it over connections, it would be per connection.

the only gap is that we currently don't expose the token. Note that also the token can change. Initially the token won't be there until the authentication is completed. So what we actually need is rather a subscription.

I think in the end it will come close to what the PR is providing a hook to provide a token, and process a token change.

mdesmet commented 1 year ago

Relevant related PR in Superset: https://github.com/apache/superset/issues/20300

mdesmet commented 1 year ago

cc: @leniartek

trinodb / trino-python-client

Custom cache for oauth2 tokens #223