jpadilla / django-rest-framework-jwt

JSON Web Token Authentication support for Django REST Framework
http://jpadilla.github.io/django-rest-framework-jwt/
MIT License
3.19k stars 649 forks source link

database query to user table on each request #350

Open eskhool opened 7 years ago

eskhool commented 7 years ago

Unless of course we are missing something obvious, we are getting the database hit for getting the request.user on each access to a view which is not using the user at all because of the deault jwt authentication class. I assume we will have to write our own base authentication class If we need the authentication but not the user object every time?

'DEFAULT_AUTHENTICATION_CLASSES': ( 'rest_framework_jwt.authentication.JSONWebTokenAuthentication'

hwh7 commented 6 years ago

I am also considering to fix this situation on my app. I don't know how I will fix. But before that, I wonder whether checking the user account again is really necessary proccess or not. Is it needed for security or other reason?

Blquinn commented 6 years ago

One of the main benefits of using JWT is to avoid hitting the DB on each request. That said, fixing this issue should be a priority, because the issue reduces the efficacy of this library for people who are using it for its performance benefits.

The solution is to load the user fields from the JWT claims and mark all the other fields as deferred (or not loaded from the database). This way, we can use the user object normally and any fields that haven't been loaded in from the DB will be loaded when accessed (only when needed).

eskhool commented 6 years ago

@Blquinn, Is there a built in way in Django to do this? I thought one could selectively load fields from the DB but what if we don't want to hit the DB at all...got a minute for some quick code (doesn't have to work)?

hwh7 commented 6 years ago

You can check out the related code authenticate_credentials() in rest_framework_jwt/authentication.py. The reason for hitting DB in this code is to verify the user's activeness. I could avoid it by writing my own authentication code like below. After applied this code, I added more code to check the acviceness of user when he log in.

Hope it helps.

settings.py:

    'DEFAULT_AUTHENTICATION_CLASSES': (
        'user_manager.MyJSONWebTokenAuthentication.MyJSONWebTokenAuthentication',
        # 'rest_framework.authentication.BasicAuthentication',
    ),

MyJSONWebTokenAuthentication.py:

from rest_framework import exceptions
from rest_framework_jwt.authentication import JSONWebTokenAuthentication
from rest_framework_jwt.settings import api_settings

jwt_get_username_from_payload = api_settings.JWT_PAYLOAD_GET_USERNAME_HANDLER

class MyJSONWebTokenAuthentication(JSONWebTokenAuthentication):

    def authenticate_credentials(self, payload):
        username = jwt_get_username_from_payload(payload)

        if not username:
            msg = ('Invalid payload.')
            raise exceptions.AuthenticationFailed(msg)

        return username
Blquinn commented 6 years ago

@hwh7 pretty much summed it up, you have to write a custom auth class for sure.

In my project, I needed to integrate JWT with an existing code base, so I wanted to get an actual user object (my AUTH_USER_MODEL) with a bunch of deferred fields as you said.

Here's my auth class. It is kinda hacky and not necessarily functional.

# -*- coding: utf-8 -*-

from __future__ import unicode_literals, print_function, absolute_import, division

from uuid import UUID
from itertools import chain

from rest_framework_jwt.authentication import JSONWebTokenAuthentication
from rest_framework_jwt.settings import api_settings
from rest_framework import exceptions
from django.contrib.auth import get_user_model
from django.utils.translation import ugettext as _

# Maps user attribute names -> payload variable names
PAYLOAD_FIELD_MAP = {
    'id': 'user_id',
    'username': 'username',
}

User = get_user_model()

class JWTAuth(JSONWebTokenAuthentication):

    def get_deferred_fields(self):
        # I got this from Django's docs on how to get all non-relational field names.
        all_fields = set(chain.from_iterable(
            (field.name, field.attname) if hasattr(field, 'attname') else (field.name,)
            for field in User._meta.get_fields()
            # For complete backwards compatibility, you may want to exclude
            # GenericForeignKey from the results.
            if not (field.many_to_one and field.related_model is None)
        ))
        payload_fields = set(PAYLOAD_FIELD_MAP.keys())
        return all_fields.difference(payload_fields)

    def create_user_from_payload(self, payload):
        try:
            kwargs = {k: payload[v] for k, v in PAYLOAD_FIELD_MAP.items()}
        except KeyError:
            msg = _('Invalid payload.')
            raise exceptions.AuthenticationFailed(msg)

        user = User(**kwargs)
        deferred_fields = self.get_deferred_fields()
        for f in deferred_fields:
            try:
                delattr(user, f)
            except AttributeError:
                pass
        # Not sure if the following is necessary/ a good idea.
        user._state.adding = False
        user._state.db = 'default'
        return user

    def authenticate_credentials(self, payload):
        """
        Returns an active user that matches the payload's user id and email.
        """
        user_id = payload.get('user_id')
        if not user_id:
            msg = _('Invalid payload.')
            raise exceptions.AuthenticationFailed(msg)
        # I use UUID, not necessary if you don't
        try:
            payload['user_id'] = UUID(user_id)
        except ValueError:
            msg = _('Invalid payload.')
            raise exceptions.AuthenticationFailed(msg)

        user = self.create_user_from_payload(payload)

        return user

I didn't end up using this in production code, because it will cause your code to do a round trip to the database, per attribute that isn't in your user model. Therefore, you would want to always call User.refresh_from_db() before accessing more than one non-relational attribute in a a given request. I didn't want to take the risk of missing any refreshes and therefore destroying my database.

In order to avoid tons of round trips I experimented with the following code that I added to my user model:

    def __init__(self, *args, **kwargs):
        super(User, self).__init__(*args, **kwargs)
        self._refreshed = False

    def __getattr__(self, item):
        if not self._refreshed and item not in PAYLOAD_FIELD_MAP.keys():
            self.refresh_from_db(fields=self.get_deferred_fields())
            self._refreshed = True

        return super(User, self).__getattr__(self, item)

This would have to effect of refreshing the whole user model when an unavailable field is accessed, eliminating the change for unwanted round trips. Unfortunately this didn't really work because I was seeing a bunch of round trips getting user fields when I logged my SQL. I do, however, think this would be a nice solution, if it could be made to work.

If I was starting a new project, I would probably use a similar approach, but not return the actual user model with deferred fields, but rather have a plaid JWTUser object that gets passed. It would then have a method to get the full user, when needed.

So, it needs work, but I think there's potential. Let me know if you decide to work on it and your thoughts.

hwh7 commented 6 years ago

Your solution using lazy loading seems clever!

Do you want to fix this issue in the level of django-rest-framework-jwt or by using custom auth class?

authenticate() should returns User object as it is defined in the django doc. If we wanna to fix in the level of django-rest-framework-jwt, I think we should follow it.

Blquinn commented 6 years ago

@hwh7 My only concern with adding it to the library would be that users need to understand that you will be doing a ton of round trips for the user's properties unless you refresh. It's pretty counter productive towards the goal of this issue if we're causing more round trips for people. That said, maybe it could be included with the library, but not used by default (as an auth class).

eskhool commented 6 years ago

Absolutely @Blquinn, it should be an optional extra but is really needed for those of us that don't want the db to be hit to check user 'activeness'.

It definitely shouldn't be the default