FieldTracker performance

furious-luke commented 6 years ago

We've found that for our use-cases the field tracker can consume a not insignificant amount of time when attached to models being queried from our APIs. It adds about 150ms to our list requests, which doesn't seem like much, but is currently the single largest time sink in the query.

We'd really like to find a way to switch off the field tracking for these instances where we know the model will be used only for reads. The problem is that we can't come up with a clean way of passing this information into the queryset's model instantiation. Just wondering if you've given any thought to allowing the field tracker to be switched off for certain queries?

drozdowsky commented 5 years ago

I have created https://github.com/drozdowsky/django-tracking-model to solve this

gmuj commented 3 years ago

Did anyone found a way to disable TrackerField when doing a read operation via api? From the implementation, I can see a post_init signal connected that takes time to execute especially when we request many objects, but at that point we can just check the instance, not sure how we can bypass this from 'outside' like an api view.

kevindice commented 3 years ago

Initially I thought that graphene-django was the cause of my performance issues, but it turns out that FieldTracker was performing tons of deepcopy operations. I saw little difference between graphene-django and DRF but a huge difference in disabling FieldTracker.

Details and timing here: https://github.com/graphql-python/graphene/issues/268#issuecomment-834761934

I'll try and see if there is something about how graphene works that causes the FieldTracker to spend so much time in deepcopy.

Curious about the context around your API - packages used to implement it? @furious-luke @mgaby25

I have used FieldTracker in the past with DRF and never experienced such a performance hit. Here, using it with both DRF and graphene/graphene-django it is substantial.

Didn't have any issues a long time ago with:

Django==3.2
django-model-utils==4.0.0
djangorestframework==3.11.0

Have substantial issues with now:

Django==2.2.13
django-model-utils==4.1.1
djangorestframework==3.12.4
graphene==2.1.8
graphene-django==2.15.0

phillipuniverse commented 2 years ago

@mgaby25 @kevindice I thought I would share my solution to this which allows you to conditionally disable the FieldTracker with a context manager (and with middleware).

_field_tracking_disabled = contextvars.ContextVar("field_tracking_disabled", default=False)

@contextlib.contextmanager
def field_tracking_disabled():
    """
    A context manager that can be used to disable any and all field tracking within a Django model. There is a penalty
    in initializing the "before" state of all fields for models when they are read and we do not have to pay that penalty
    if we are not actually updating anything on the model. This is applied by default to all GET requests within the
    ConditionalFieldTrackingMiddleware.
    Example usage:

        with field_tracking_disabled():
            some_model = SomeModel.objects.get(id=...)  # no 'before state' values are tracked
            do_something(some_model)

    Attempting updates to a model read within this context will cause an exception:

        with field_tracking_disabled():
            some_model = SomeModel.objects.get(id=...)
            some_model.name = "abc123"
            some_model.save()  # raises a ValueError

    The update prevention is applied to the lifecycle of the entire model instance and is initialized when the model is
    read. Thus, this will also cause an exception:

        with field_tracking_disabled():
            some_model = SomeModel.objects.get(id=...)
        some_model.name = "abc123"
        some_model.save()  # raises a ValueError

    Finally, this also prevents any access of the 'tracker' attribute on a model as it would otherwise cause inconsistent
    results as the initial state was never applied:

        with field_tracking_disabled():
            some_model = SomeModel.objects.get(id=...)
        some_model.name = "abc123"
        some_model.tracker.changed()  # raises a ValueError
        some_model.save()  # raises a ValueError

    This context manager is both threadsafe and async-safe.
    """
    token = _field_tracking_disabled.set(True)
    try:
        yield
    finally:
        _field_tracking_disabled.reset(token)

def is_field_tracking_disabled() -> bool:
    return _field_tracking_disabled.get()

class ConditionalFieldTracker(FieldTracker):
    """An implementation of FieldTracker that allows itself to be disabled"""

    field_tracking_disabled_attr = "_field_tracking_disabled"

    def __init__(self, fields=None):
        super().__init__(fields)

    def initialize_tracker(self, sender, instance, **kwargs):
         """Invoked as a post_init signal of the given instance"""

        # The post_init signal is registered in finalize_class, which is registered in contribute_to_class,
        # a Django model hook point (see http://lazypython.blogspot.com/2008/11/django-models-digging-little-deeper.html)
        # The super().initialize_tracker part of this is the problematic part and specifically the
        # set_saved_fields() method. We need to avoid that and the deepcopy() within it.
        # If field tracking is disabled, this will prevent all access to the 'tracker' (or self.name) property to prevent
        # strange results

        if is_field_tracking_disabled():
            if not isinstance(instance, self.model_class):
                # Necessary to ensure that our patching only happens _once_ because the post_init signal is registered for all models
                return  # Only init instances of given model (including children)

            setattr(instance, self.field_tracking_disabled_attr, is_field_tracking_disabled())
        else:
            super().initialize_tracker(sender, instance, **kwargs)

    def __get__(self, instance, owner):
        if instance is None:
            return self

        if getattr(instance, self.field_tracking_disabled_attr, False):
            raise ValueError(
                f"The {self.name} attribute of {instance.__class__} cannot be accessed when the model is initialized"
                " within a field_tracking_disabled context block. Either re-read the model outside of this block or"
                " modify settings.MODIFYING_GET_URL_REGEXES")
        else:
            return super().__get__(instance, owner)

    def patch_save(self, model):
        original_model_save = model.save
        super().patch_save(model)
        tracker_patched_save = model.save

        def conditional_tracker_save(instance, *args, **kwargs):
            is_update = bool(not instance._state.adding)
            field_tracking_disabled = getattr(instance, self.field_tracking_disabled_attr, False)
            if is_update and field_tracking_disabled:
                raise ValueError(
                    f"An instance of {instance.__class__} was initialized within a field_tracking_disabled"
                     " context block and updates will not work as expected! If this is intentional, you might need to"
                     " modify settings.MODIFYING_GET_URL_REGEXES")
            elif field_tracking_disabled:
                # In this case we did not initialize the _tracker attribute on the model since the initialize_tracker
                # was blocked. Therefore it is not safe to call the tracker version of the save, so instead invoke the
                # unmodified version
                return original_model_save(instance, *args, **kwargs)
            else:
                return tracker_patched_save(instance, *args, **kwargs)

        model.save = conditional_tracker_save

Then I added a middleware that disables it for all GET requests that can also look for a MODIFYING_GET_URL_REGEXES (for instance, if you're saving a model in an OAuth flow or you have something else in your application that modifies data on a GET):

class ConditionalFieldTrackingMiddleware:
    """
    Disables all field tracking on a GET request and hooks up extra validation to ensure there is never an update during
    a GET that might impact what happens in the AuditLog or any other place that we care about FieldTracker results
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        def continue_middleware():
            return self.get_response(request)

        is_whitelisted = False
        for regex in settings.MODIFYING_GET_URL_REGEXES:
            is_whitelisted = bool(re.findall(regex, request.get_full_path()))
            if is_whitelisted:
                break

        if request.method == 'GET' and not is_whitelisted:
            with field_tracking_disabled():
                return continue_middleware()
        else:
            return continue_middleware()

To use this, follow the same docs but use ConditionalFieldTracker instead of FieldTracker:

from django.db import models
from model_utils import FieldTracker

class Post(models.Model):
    title = models.CharField(max_length=100)
    body = models.TextField()

    tracker = ConditionalFieldTracker()

easherma-truth commented 2 years ago

Did the most recent changes in https://github.com/jazzband/django-model-utils/releases/tag/4.2.0 affect this?

phillipuniverse commented 2 years ago

@easherma-truth the fix I referenced above was targeting django-model-utils 3.2.0 and Django 2.2 so the performance issues have been present for a very long time.

dylan-shipwell commented 1 year ago

I noticed a bottleneck when using a large quantity of models that inherit from FieldTracker, specifically

registering an additional save signal per model, meaning Model() runtime increasing linearly with qty of TrackedField models
executing expensive isinstance() reflection on each of the ^ above signals

I have a monkeypatch that relies on monkeypatching https://github.com/jazzband/django-model-utils/blob/6d4112e8ce07c10ee1f61c17fe7358fb270e8b6c/model_utils/tracker.py#L343-L349

model_utils.tracker.models.signals.post_init.connect was patched to squelch all signal registration with django, on first squelch it registers one signal handler for my newly defined initialize_tracker_router func.

model_utils.tracker.FieldTracker.finalize_class was wrapped, after populating self.model_class this class is installed as the key of a registry dict, with a value of self

my newly defined initialize_tracker_router func does a keytest in the registry dict. (hash(class) in set) is significantly lighter than (callback(isinstance()) for callback in callbacks)

i would of course love to not monkeypatch this lib, please consider routing requests with a dictionary lookup on a cheap hashable if possible rather than relying on while true exec isinstance 🙇

phillipuniverse commented 1 year ago

@dylan-shipwell could you share your monkey patch?

z-amini commented 6 months ago

Adding sender=sender to following connect will prevent irrelevant model instances to call this signal handler. https://github.com/jazzband/django-model-utils/blob/46d33923ba03d54cecf651f7e297021a34f3bbaa/model_utils/tracker.py#L346

I mean

        models.signals.post_init.connect(self.initialize_tracker, sender=sender)

Either ways a solution to read-only model queries is still needed in many cases. :bowing_woman:

jazzband / django-model-utils

FieldTracker performance #323