romgar / django-dirtyfields

Tracking dirty fields on a Django model
BSD 3-Clause "New" or "Revised" License
630 stars 111 forks source link

`post_save` signal does not get disconnected when class gets unreferenced #233

Open sjoerdjob opened 1 week ago

sjoerdjob commented 1 week ago

Describe the bug While running Django migrations, it can happen that subclasses of DirtyFieldsMixin get created (and later unreferenced, and thus destroyed). When a class gets unreferenced, the location that class holds in memory gets cleared up again, and a new Python object could be instantiated in that location. That new Python object might be another model class.

To Reproduce

See the attached project for a minimal example. Running python manage.py migrate (after having installed the latest version of Django and django-dirtyfields) triggers the issue most of the time.

minimal.tar.gz

Expected behavior Migrations always run completely pass.

Actual Behavior

Traceback (most recent call last):
  File "/home/sjoerdjobpostmus/dev/minimal/./manage.py", line 22, in <module>
    main()
    ~~~~^^
  File "/home/sjoerdjobpostmus/dev/minimal/./manage.py", line 18, in main
    execute_from_command_line(sys.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
    ~~~~~~~~~~~~~~~^^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/core/management/commands/migrate.py", line 357, in handle
    post_migrate_state = executor.migrate(
        targets,
    ...<3 lines>...
        fake_initial=fake_initial,
    )
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
        state, plan, full_plan, fake=fake, fake_initial=fake_initial
    )
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
        state, migration, fake=fake, fake_initial=fake_initial
    )
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/migrations/executor.py", line 255, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self.app_label, schema_editor, old_state, project_state
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/migrations/operations/special.py", line 196, in database_forwards
    self.code(from_state.apps, schema_editor)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sjoerdjobpostmus/dev/minimal/stress/migrations/0002_auto_20241119_1734.py", line 28, in update_withoutdirty_instance
    inst.save()
    ~~~~~~~~~^^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/models/base.py", line 891, in save
    self.save_base(
    ~~~~~~~~~~~~~~^
        using=using,
        ^^^^^^^^^^^^
    ...<2 lines>...
        update_fields=update_fields,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/db/models/base.py", line 1012, in save_base
    post_save.send(
    ~~~~~~~~~~~~~~^
        sender=origin,
        ^^^^^^^^^^^^^^
    ...<4 lines>...
        using=using,
        ^^^^^^^^^^^^
    )
    ^
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/django/dispatch/dispatcher.py", line 189, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/sjoerdjobpostmus/.virtualenvs/minimal/lib/python3.13/site-packages/dirtyfields/dirtyfields.py", line 172, in reset_state
    new_state = instance._as_dict(check_relationship=True)
                ^^^^^^^^^^^^^^^^^
AttributeError: 'WithoutDirty' object has no attribute '_as_dict'

Environment (please complete the following information):

A related Django issue is https://code.djangoproject.com/ticket/35801#comment:6 . I'm not sure if either of the codebases can fix this bug separately though, or whether it requires changes in both projects.

Suggested improvements:

LincolnPuzey commented 3 days ago

Hi @sjoerdjob Thanks for the bug report. This looks like an interesting issue 😃

I think that django ticket you linked to is the right one relevant to this. It looks like the maintainers there are considering two options

  1. Changing Django to hold a (weak)ref to the sender class
  2. Documenting that the signal should be disconnected before the class is de-referenced.

In the case of (1) I imagine that would fix this issue. For (2) we may have to adjust our code to disconnect the signal at the right time.


Your first suggested improvement is a good idea, but I don't think it will fix this issue.

In the code that is calling post_save.connect, also add 'something'

Do you have any suggestions on what that 'something' should be? I was going to say we would need to implement __del__ on the metaclass, but then I found weakref.finalize which might be the right thing to use.

sjoerdjob commented 3 days ago

Using weakref.finalize indeed does seem like the 'something' we might be looking for.

Do note the following comment though:

Note

It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.

In this case, obj would be the class for which the signal should be disconnected. However, looking at Django's 'disconnect', it seems clear that Django requires the caller to pass the same value for sender as which was used to register the connect call.

So the most pressing question here is: During the callback phase of weakref.finalize, can we still get the class? First of all, we shouldn't keep an actual reference alive, as that would make the class never garbage collected. But we could keep a weakref alive, and hope that when the callback of weakref.finalize gets called, that weakref could be used to resurrect the model class temporarily, just long enough to call post_save.disconnect.

The weakref.finalize is a bit of synthetic sugar around weakref.ref(obj, callback), which has the following comment:

If callback is provided and not None, and the returned weakref object is still alive, the callback will be called when the object is about to be finalized; the weak reference object will be passed as the only parameter to the callback; the referent will no longer be available.

Which makes me think that at the time the callback of weakref.finalize is called, the model class is no longer obtainable, and as such it's too late to call disconnect(). Initial experimentation shows that the weakref pointing to the class would be 'dead' by the time it's called.

As such, without any changes in Django itself, I think the only way forward would be to create a metaclass that registers a __del__, which at least should be called when the class is still 'alive'. Just in time.

However, I myself personally don't have fun experiences with extending metaclasses for Django Model classes. There are quite some constraints there. In particular, Python will raise a TypeError if multiple bases are used which define metaclasses, and there is not a 'most derived' metaclass:

https://docs.python.org/3/reference/datamodel.html#determining-the-appropriate-metaclass

Which makes using metaclasses also a less-than-ideal solution.