wildfish / django-gdpr-assist

Tools to help manage user data in the age of GDPR

Suggestion: use queryset iterators during bulk model anonymisation #42

Closed ghost closed 2 years ago

ghost commented 3 years ago

During testing, we've seen high memory usage in the library when it is applied to models with a large number of records in the database.

This has been traced to the model.objects.all().anonymise() call in anonymise_db, which uses a queryset but appears to cache a significant amount of query metadata before the first object is anonymised.

Since each model object is used only once in the anonymise_db use case, we could use a Django queryset iterator to reduce memory consumption.
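
A rough sketch of what this might look like, not the library's actual implementation. It assumes each model instance exposes an anonymise() method (as the queryset-level call suggests) and relies on Django's QuerySet.iterator(chunk_size=...). The helper name anonymise_streaming and the FakeRecord / FakeQuerySet classes are hypothetical; the stand-ins exist only so the snippet runs without a Django project.

```python
def anonymise_streaming(queryset, chunk_size=2000):
    """Anonymise every object in `queryset` exactly once.

    Uses QuerySet.iterator() so instances are streamed from the database
    in chunks rather than all being held in the queryset's result cache.
    Returns the number of objects processed.
    """
    count = 0
    for obj in queryset.iterator(chunk_size=chunk_size):
        obj.anonymise()  # assumption: per-instance anonymise() exists
        count += 1
    return count


# --- Hypothetical stand-ins, for demonstration only ---
class FakeRecord:
    """Minimal object with the assumed anonymise() method."""

    def __init__(self):
        self.anonymised = False

    def anonymise(self):
        self.anonymised = True


class FakeQuerySet:
    """Mimics only the iterator() method of a Django QuerySet."""

    def __init__(self, records):
        self._records = records

    def iterator(self, chunk_size=2000):
        # A real QuerySet would fetch rows from the database in chunks here.
        yield from self._records


records = [FakeRecord() for _ in range(5)]
processed = anonymise_streaming(FakeQuerySet(records))
```

In anonymise_db, the equivalent call would presumably be something like anonymise_streaming(model.objects.all()). One trade-off: because the result cache is skipped, iterating the same queryset a second time runs the query again, which is acceptable given the once-only usage described above.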