Open parikls opened 2 months ago
Hi @parikls, thank you so much for reporting this. I was foolishly convinced this was resolved by introducing the context manager (you should be using `with provider:` in your case), but I honestly never thought about requests happening asynchronously like this. I am currently AFK, but tomorrow I'll fix it and finally make the SSO class stateless, which was my intention all along. Thanks!
@tomasvotava the context manager won't help in this case, unfortunately. Currently the workaround that works is to create a new provider for each new request. Luckily I don't see a big difference in the performance or memory usage of our API containers after introducing this approach (it has been live for 2 days already). But it would definitely be better to solve this properly instead of creating new instances of the provider, oauthlib classes, etc.
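from fastapi import Depends
from fastapi_sso.sso.generic import create_provider


def create_my_provider():
    # Build a fresh provider class and instance for every request
    provider = create_provider(
        name=...,
        discovery_document=...,
        response_convertor=...,
    )
    return provider(
        client_id=...,
        client_secret=...,
        redirect_uri=...,
        scope=...,
    )


# The dependency is passed as the parameter's default value
async def my_view(sso_provider=Depends(create_my_provider)):
    ...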
Thanks in advance!
P.S. I think I can confirm that the above approach solved the issue completely. At least, we had the same number of requests during the weekend and 0 errors in Sentry.
@tomasvotava UPD: I faced the same issue yesterday, even with a provider created per request. Perhaps some more shared state exists somewhere, maybe at the class level? I haven't investigated this deeply yet, but I want to keep you updated.
@parikls thanks a lot for keeping me up to date, I am afraid I cannot post a fix without changing the behavior of how access_token is retrieved (it cannot be on the instance if the instance is to be reusable), and so I will have to make some changes.
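The constraint mentioned above (the token cannot live on the instance if the instance is to be reused) can be illustrated with a minimal sketch. It is not the fastapi-sso API: `StatelessProvider`, `LoginResult`, and the simulated delays are hypothetical names chosen for illustration. The idea is that the token stays in a local variable and is handed back to the caller instead of being stored on `self`.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginResult:
    """Hypothetical return type carrying everything one login produced."""
    email: str
    access_token: str


class StatelessProvider:
    """Hypothetical provider that keeps no per-login state on the instance."""

    async def process_login(self, user: str, delay: float) -> LoginResult:
        # The token lives in a local variable, so concurrent calls cannot
        # overwrite each other's value.
        token = f"token-for-{user}"
        # Simulates further awaits (e.g. a userinfo request) after the token fetch.
        await asyncio.sleep(delay)
        return LoginResult(email=f"{user}@example.com", access_token=token)


async def main():
    provider = StatelessProvider()  # one instance shared by both logins
    return await asyncio.gather(
        provider.process_login("alice", 0.02),
        provider.process_login("bob", 0.01),
    )


results = asyncio.run(main())
```

With this shape, a shared instance is safe to reuse, at the cost of changing how callers read the token (from the returned value rather than from `provider.access_token`).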
If you are using the generic SSO provider, it should always create a whole new class when you call `create_provider`. It seems weird to me that there should be any shared state, and if there is, this bug actually goes even deeper :fearful:
I double-checked the static attributes set on SSOBase, and the OAuth client really gets recreated for each instance. How exactly do you catch this bug in Sentry?
@tomasvotava my SSO provider has a `/me` endpoint which lets me get the user info, e.g. email, for the provided access token. So I'm using the access token to fetch the user's email and comparing it with `OpenID.email` (returned from `verify_and_process`). Simplified code looks like this:
sso_user: OpenID = await verify_and_process(
    request,
    redirect_uri=...
)
access_token, refresh_token = my_sso.access_token, my_sso.refresh_token
me = await custom_provider_api.get_me(access_token=access_token)
if me['email'] != sso_user.email:
    logger.error('... error goes here ...')
Ok, thanks for clarifying that. I was really hoping there was an error on your side, to be honest :smile: I know security is no laughing matter, and I assure you I take it seriously. I am just trying to gather as much information as I can before moving on to a solution, so that I can be sure I actually resolve it properly this time. You are being very, very helpful, thanks!
@tomasvotava np =) Right now I'm going to use a simple asyncio lock, so we'll process SSO callbacks on each container one by one, e.g.
from asyncio import Lock

LOGIN_LOCK = Lock()


async def my_view(...):
    async with LOGIN_LOCK:
        sso_user: OpenID = await verify_and_process(request, redirect_uri=...)
        access_token, refresh_token = teachable_sso.access_token, teachable_sso.refresh_token
        ...
And will post an update in a day or two
@tomasvotava FYI: after adding the lock, the issue is fully resolved.
@parikls thanks for letting me know. My planned solution does basically the same, but it will require everyone to change from using `with provider:` to `async with provider:`. I hope that's going to be alright for everyone, but I'd say security is more important than convenience.
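As a rough sketch of what an `async with provider:` guard could look like, the example below wraps an `asyncio.Lock` in an async context manager. This is an assumption about the general approach, not the actual fastapi-sso implementation: `LockedProvider`, its `process_login` signature, and the simulated delays are hypothetical.

```python
import asyncio
from typing import Optional


class LockedProvider:
    """Hypothetical provider; illustrates the idea only, not fastapi-sso code."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.access_token: Optional[str] = None

    async def __aenter__(self) -> "LockedProvider":
        # Only one login at a time may enter; others wait here.
        await self._lock.acquire()
        self.access_token = None
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Clear per-login state before letting the next login in.
        self.access_token = None
        self._lock.release()

    async def process_login(self, user: str, delay: float) -> str:
        self.access_token = f"token-for-{user}"
        # Simulates further awaits after the token fetch.
        await asyncio.sleep(delay)
        return user


async def login(provider: LockedProvider, user: str, delay: float):
    async with provider:
        user_out = await provider.process_login(user, delay)
        # The token is read while the lock is still held.
        return user_out, provider.access_token


async def main():
    provider = LockedProvider()  # created inside the running event loop
    return await asyncio.gather(
        login(provider, "alice", 0.02),
        login(provider, "bob", 0.01),
    )


results = asyncio.run(main())
```

The trade-off is the same as with the module-level lock: logins on a shared provider are processed one at a time per process, so the token must be read before leaving the `async with` block.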
My use case:
Users log in via SSO, and after that I use the access token of the logged-in user to obtain additional information from the provider. In my view I get the access token from the provider property, e.g. `provider.access_token`. But when the code runs at large scale (we have hundreds of thousands of users), one user can sometimes obtain the access token of another user, which is a serious security issue. This happens rarely, and only when multiple users log in at almost the same time (microseconds apart). After investigating the source code of fastapi-sso, I figured out that the problem is in the provider implementation. The `SSOBase` class is stateful, and its `process_login` coroutine has multiple awaits after fetching the token, so when multiple users log in via SSO simultaneously, this leads to a race condition, and the first user obtains the token of the last user.
The code I'm currently using:
lifespan.py
api.py
I've written a very quick and dirty test case for fastapi-sso to prove that this is actually an issue:
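Independently of that test case, the race described above can be reproduced with plain asyncio. The following is a minimal sketch, not fastapi-sso code: `StatefulProvider`, its `process_login` method, and the sleep delays are hypothetical stand-ins for a provider that stores the token on the instance and then awaits again.

```python
import asyncio
from typing import Optional


class StatefulProvider:
    """Hypothetical stand-in for a provider that stores the token on itself."""

    def __init__(self) -> None:
        self.access_token: Optional[str] = None

    async def process_login(self, user: str, delay: float) -> str:
        # The freshly fetched token is written to the shared instance...
        self.access_token = f"token-for-{user}"
        # ...and then the coroutine awaits again, letting other logins run.
        await asyncio.sleep(delay)
        return user


async def login(provider: StatefulProvider, user: str, delay: float):
    user_out = await provider.process_login(user, delay)
    # Reading the attribute after the await sees whatever was written last.
    return user_out, provider.access_token


async def main():
    provider = StatefulProvider()  # one instance shared by all requests
    return await asyncio.gather(
        login(provider, "alice", 0.02),
        login(provider, "bob", 0.01),
    )


results = asyncio.run(main())
```

Here the first login ("alice") ends up paired with the token written by the later login ("bob"), which mirrors the reported symptom of one user receiving another user's token.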