jazzband / django-auditlog

A Django app that keeps a log of changes made to an object.
MIT License
1.14k stars 412 forks source link

Migration Command Fails to Update All LogEntry Objects Due to Batch Processing Issue #667

Open fabianallendorf opened 1 month ago

fabianallendorf commented 1 month ago

Description:

While following the migration guide to update the django-auditlog package to version 3, I encountered an issue where the migration command did not fully migrate all LogEntry records.

Upon executing the migration command:

$ python manage.py auditlogmigratejson
Updated 101,000 records using Django operations.
There are 100,906 records that need migration.

I noticed that a significant number of LogEntry objects still had null values in the changes field, even though their changes_text fields were populated. This indicated that the migration did not affect these records.

Investigation:

While debugging, I examined the migrate_using_django method in auditlog.management.commands.auditlogmigratejson.Command:

def migrate_using_django(self, batch_size=None):
    logs = self.get_logs()

    if not batch_size:
        return _apply_django_migration(logs)

    total_updated = 0
    for _ in range(ceil(logs.count() / batch_size)):
        total_updated += _apply_django_migration(self.get_logs()[:batch_size])
    return total_updated

The issue appears to stem from this line:

total_updated += _apply_django_migration(self.get_logs()[:batch_size])

When there are more than batch_size DELETE and CREATE LogEntry objects that precede any UPDATE entries, self.get_logs() consistently returns the same initial set of records in each iteration. Since DELETE and CREATE entries do not require changes, the migration process becomes stuck on these records, leaving subsequent UPDATE entries unmigrated.

Possible Solution:

To address this issue, the batching logic should ensure that each batch processes a new subset of LogEntry objects. This can be achieved by implementing an offset mechanism or utilizing Django's Paginator for proper batch iteration. Here is a suggested fix using Paginator:

from django.core.paginator import Paginator

def migrate_using_django(self, batch_size=None):
    logs = self.get_logs()

    if not batch_size:
        return _apply_django_migration(logs)

    paginator = Paginator(logs, batch_size)
    total_updated = 0

    for page_number in paginator.page_range:
        page = paginator.page(page_number)
        total_updated += _apply_django_migration(page.object_list)
    return total_updated

This modification ensures that each batch processes a unique set of LogEntry records, allowing the migration to progress through all entries without repetition.

Environment:

Steps to Reproduce:

  1. Have a database with a large number of LogEntry records, including many DELETE and CREATE entries preceding UPDATE entries.
  2. Follow the migration guide to upgrade to django-auditlog version 3.
  3. Run python manage.py auditlogmigratejson with batching enabled.
  4. Observe that the migration does not update all LogEntry records.

Expected Behavior:

All LogEntry records should have their changes field populated after the migration, with no unmigrated entries remaining.

Actual Behavior:

A significant number of LogEntry records remain unmigrated, with null values in the changes field.

Workaround:

It is possible to avoid the issue by setting batch size to the amount of all existing LogEntry records.


I hope this report helps in identifying and resolving the issue. Please let me know if additional information is required.

hramezani commented 1 month ago

Thanks @fabianallendorf for reporting this issue.

@aqeelat do you have time to take a look?