netzkolchose / django-computedfields

Provides autogenerated autoupdated database fields for model methods.
MIT License

Many to many dependency triggers too many computations #130

Closed martinlehoux closed 1 year ago

martinlehoux commented 1 year ago

Hi there!

I'm not sure if this is a bug or an expected behaviour, but I needed to report it.

Here is the simple example

class HostingAdvertTypeTag(Model):
    objects: HostingAdvertTypeTagManager = HostingAdvertTypeTagManager()

    name = models.CharField(max_length=100, unique=True)

class HostingAdvert(ComputedFieldsModel):
    hosting_advert_type_tags = models.ManyToManyField(
        HostingAdvertTypeTag, blank=True, related_name="hosting_adverts"
    )

    @computed(
        ArrayField(models.CharField(null=False, blank=True, max_length=100), default=list, blank=True),
        depends=[
            ["hosting_advert_type_tags", ["name"]],
        ],
        prefetch_related=["hosting_advert_type_tags"],
    )
    def all_hosting_advert_type_tag_names(self) -> list[str]:
        ...

The issue is that when I use

hosting_advert.hosting_advert_type_tags.set(HostingAdvertTypeTag.objects.filter(name__contains="POOL"))

there is a computation happening for all hosting adverts in my database, not only this one (as I would expect).

It seems that the dependency graph thinks this is the correct behaviour.

What do you think? Is there something I didn't get about many to many fields?

The issue occurs with version 0.2.3, and also with version 0.2.1.

jerch commented 1 year ago

there is a computation happening for all hosting adverts in my database, not only this one (as I would expect)

Tried to reproduce this with a small example resembling your schema:

class HaTag(models.Model):
    name = models.CharField(max_length=100, unique=True)

class Ha(ComputedFieldsModel):
    tags = models.ManyToManyField(HaTag, blank=True, related_name="ha_s")

    @computed(
        ArrayField(models.CharField(null=False, blank=True, max_length=100), default=list, blank=True),
        depends=[("tags", ["name"])],
    )
    def all_tags(self):
        return [] if not self.pk else list(self.tags.all().values_list('name', flat=True))

and get the query logs for h_instance.tags.set(HaTag.objects.filter(name__contains='a')) (captured with CaptureQueriesContext and printed with json.dumps), which both look correct to me. Note that for a non-empty m2m field the workload is much higher, which is due to the additional pre_remove + post_remove signals, before Django issues the actual post_add to get the right value in place. But I cannot do much about this, it is how m2m fields are abstracted under the hood in Django.
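The remove+add decomposition can be sketched without Django at all. The following is a simplified illustration (not Django's actual implementation) of why a `.set()` on a non-empty m2m field fires more signals than on an empty one:

```python
def m2m_set_plan(current, target):
    """Sketch of the remove+add decomposition that .set() performs.

    Not Django's real code: each batch fires its own pre_/post_
    m2m_changed signal, hence the extra workload jerch describes
    for a non-empty field.
    """
    to_remove = current - target  # -> pre_remove / post_remove
    to_add = target - current     # -> pre_add / post_add
    return to_remove, to_add

# Non-empty field: both a remove and an add batch are issued.
print(m2m_set_plan({1, 2, 3}, {2, 3, 4}))  # ({1}, {4})
# Empty field: only the add batch, so far fewer signals fire.
print(m2m_set_plan(set(), {2, 3}))         # (set(), {2, 3})
```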

I definitely don't see updates for neighboring rows on Ha. This of course can happen if you have additional computed fields defined somewhere that rely on the result of all_tags.

Can you share the query log with the superfluous updates?


Edit: Could it be that your array value calculation is not stable (not re-entrant) for the same tags on the m2m field? That would lead to an update from a [1,2,3] vs. [2,1,3] inequality (this can be fixed by ordering the entries).
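The re-entrancy point is easy to demonstrate outside Django: if the computation returns the same tag names in a different order, the old and new values compare unequal and an update is written even though nothing changed. Sorting makes the result stable (a generic sketch, not this repo's code):

```python
def compute_unordered(tag_names):
    # Order depends on queryset iteration order -> not re-entrant.
    return list(tag_names)

def compute_sorted(tag_names):
    # Sorting yields the same value for the same set of tags.
    return sorted(tag_names)

# Same tag set, delivered in a different order on a second run:
run1, run2 = ["tag2", "tag1"], ["tag1", "tag2"]
print(compute_unordered(run1) == compute_unordered(run2))  # False -> spurious UPDATE
print(compute_sorted(run1) == compute_sorted(run2))        # True  -> no UPDATE
```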

martinlehoux commented 1 year ago

Hello @jerch !

Thanks a lot for the fast feedback. I created an example repo to show you the issue https://github.com/martinlehoux/test_computedfields

After installing deps & migrating, you can run ./manage.py < test.py to see how the script behaves.

Here is what I get:

Advert object (9) setup
self=<Advert: Advert object (9)>, all_tags='tag1, tag2'
Advert object (10) setup
self=<Advert: Advert object (9)>, all_tags='tag1, tag2'
self=<Advert: Advert object (10)>, all_tags='tag1, tag2'
Advert object (9) update
self=<Advert: Advert object (9)>, all_tags='tag1'
self=<Advert: Advert object (10)>, all_tags='tag1, tag2'

And this is the behaviour I don't understand: why would Advert object (9)'s computed field be recomputed when Advert object (10) gets new tags?

Thanks again!

jerch commented 1 year ago

Thanks for the example, I see now what you meant. It is indeed the case that the m2m signal handler loses information when passing things over to resolver._querysets_for_update, which leads to a much broader selection of rows for update than needed.

Will see if I can narrow the selection further down without breaking too much in _querysets_for_update.
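Abstractly, the narrowing amounts to using the primary key the m2m signal actually fired for, rather than every row carrying the dependency. A toy illustration with hypothetical names (not _querysets_for_update's real signature):

```python
def select_for_update(rows_with_dependency, signal_instance_pk):
    """Hypothetical sketch of narrowing the update selection.

    broad:  recompute every row that depends on the m2m relation
            (the reported behaviour).
    narrow: recompute only the instance the m2m_changed signal
            fired for (the intended behaviour).
    """
    broad = set(rows_with_dependency)
    narrow = broad & {signal_instance_pk}
    return broad, narrow

# Adverts 9 and 10 both depend on their tags; only 10 was changed.
print(select_for_update([9, 10], 10))  # ({9, 10}, {10})
```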

martinlehoux commented 1 year ago

Nice to hear! If you don't have time to do this, let me know and I will see if I can help with a fix.

jerch commented 1 year ago

Created #131, if you want to test it.

martinlehoux commented 1 year ago

Yep, it fixes it! Looks good :)