jazzband / django-simple-history

Store model history and view/revert changes from admin site.
https://django-simple-history.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.11k stars 464 forks

"update_fields" option to "bulk_create" not supported? #1323

Open elserj opened 1 month ago

elserj commented 1 month ago

Problem Statement The "update_fields" option doesn't seem to be supported by "bulk_create_with_history".

Describe the solution you'd like Ideally, I'd like a new option that skips the history record when a model instance is newly created, but adds one when an existing instance is updated.

Describe alternatives you've considered

Additional context The "update_fields" option was added to "bulk_create" in Django 4.1 (https://docs.djangoproject.com/en/4.1/ref/models/querysets/#bulk-create), but I don't see a corresponding change in the "bulk_create_with_history".

I don't actually use "bulk_create_with_history", as I only want history recorded for changes made after the initial load. What I'd like is for "bulk_create_with_history" to handle the "update_conflicts"/"update_fields" options, with an additional option to create history records only for updates, not for new models.

JBrut22 commented 2 weeks ago

I saw your issue while looking for the same feature. I attempted to create a workaround, but realized there might be a reason why this feature has not been implemented...

I did successfully alter the code for bulk_create_with_history to allow updates: you add the parameters from bulk_create to its arguments, then pass them to the bulk_create call (lines 125-127) in simple_history.utils. However, this only creates history records for created records, not for updated ones. Why? Because Django's bulk_create returns all objects, created and updated alike, and does not specify which is which.
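
To illustrate why the two cases are hard to tell apart, here is a minimal sketch using the standard-library sqlite3 module rather than Django. It assumes that update_conflicts=True corresponds to an INSERT ... ON CONFLICT ... DO UPDATE statement on SQLite and PostgreSQL; the book table and its columns are hypothetical.

```python
import sqlite3

# In-memory database with one row that already exists before the upsert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book VALUES ('111', 'Old title')")

# One statement handles both the existing row ('111') and the new row ('222').
rows = [("111", "New title"), ("222", "Another book")]
conn.executemany(
    "INSERT INTO book (isbn, title) VALUES (?, ?) "
    "ON CONFLICT (isbn) DO UPDATE SET title = excluded.title",
    rows,
)

final_rows = conn.execute("SELECT isbn, title FROM book ORDER BY isbn").fetchall()
print(final_rows)
```

Both rows go through the same statement, and the caller gets no per-row flag saying which one was inserted and which one was updated.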

Based on the above, you would need to loop through all the records and check each against the database to determine which already exist, using the pk or, if the pk is an AutoField, another unique field.
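
A rough sketch of that check, assuming the existing rows are fetched once up front instead of once per record. The names split_create_update and existing_pks are hypothetical; in Django the mapping could be built with something like dict(model.objects.filter(**{f"{unique_field}__in": keys}).values_list(unique_field, pk)). Plain SimpleNamespace objects stand in for unsaved model instances so the snippet runs without a database.

```python
from types import SimpleNamespace


def split_create_update(objs, unique_field, existing_pks, pk_field="id"):
    """Partition objs into (to_create, to_update) using a prefetched lookup.

    existing_pks maps each unique value already in the database to its pk.
    """
    to_create, to_update = [], []
    for obj in objs:
        key = getattr(obj, unique_field)
        if key in existing_pks:
            # Copy the stored pk onto the object so a bulk update can target it.
            setattr(obj, pk_field, existing_pks[key])
            to_update.append(obj)
        else:
            to_create.append(obj)
    return to_create, to_update


# Stand-ins for unsaved instances and for the result of a single query.
objs = [SimpleNamespace(id=None, sku="a"), SimpleNamespace(id=None, sku="b")]
existing_pks = {"a": 7}
to_create, to_update = split_create_update(objs, "sku", existing_pks)
```

The dictionary lookup keeps the loop itself free of database queries, which is where most of the per-record cost would otherwise come from.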

Completion time for 500 records running on docker and a docker PostgreSQL db:

As you can see, it can be quite a bit slower using the custom function due to the looping. I imagine it would take a lot longer for the thousands to tens of thousands of records people are often working with.

Anyway, here is the function. I have only tested it with pk_is_autofield = True. Also, if you have an AutoField pk and there is no single unique_field, it will not work. I could have added code to use a list of unique fields, but this is what served my purposes for now.

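from typing import Any

from simple_history.utils import bulk_create_with_history, bulk_update_with_history


def bulk_create_update_with_history(
    obj_list: list,
    model,
    pk: str,
    pk_is_autofield: bool = False,
    unique_field: str = "",  # required when pk_is_autofield is True
    update_fields: list | None = None,  # None avoids a shared mutable default
) -> tuple[list | Any, int]:
    """Create missing objects and update existing ones, recording history for both."""
    update_fields = update_fields or []
    if pk_is_autofield and unique_field == "":
        raise ValueError("unique_field must be provided if pk_is_autofield is True")

    # without an AutoField pk, the pk itself identifies existing rows
    if not pk_is_autofield:
        unique_field = pk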

    # create a list of unique identifiers
    obj_unique_list = [getattr(obj, unique_field) for obj in obj_list]
    # create fields list for values query
    fields = [unique_field]
    if pk != unique_field:
        fields.append(pk)

    # get the objs that already exist (values() yields one dict per row)
    existing_objs = model.objects.filter(
        **{f"{unique_field}__in": obj_unique_list}
    ).values(*fields)
    # a set gives constant-time membership checks in the loop below
    existing_obj_uniques = set(existing_objs.values_list(unique_field, flat=True))

    # separate objs that need update from those to be created
    create_objs = []
    update_objs = []
    for obj in obj_list:
        if not hasattr(obj, unique_field):
            raise ValueError(f"Object does not have unique field: {unique_field}")
        # if the obj exists, add to update list
        if getattr(obj, unique_field) in existing_obj_uniques:
            # add the pk field if it is an autofield for existing objects
            if pk_is_autofield:
                existing_pk = existing_objs.filter(
                    **{unique_field: getattr(obj, unique_field)}
                ).first()[pk]
                setattr(
                    obj,
                    pk,
                    existing_pk,
                )

            update_objs.append(obj)
        else:
            create_objs.append(obj)

    # bulk create the objects that do not exist
    created_objs = []
    if create_objs:
        created_objs = bulk_create_with_history(
            create_objs,
            model,
        )

    # bulk update
    num_updated_objs = 0
    if update_objs:
        num_updated_objs = bulk_update_with_history(
            update_objs,
            model,
            fields=update_fields,
        )

    return created_objs, num_updated_objs
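
The function above matches on a single unique_field. A variant keyed on several fields could compare tuples instead. The sketch below is only one possible approach; composite_key and index_existing are hypothetical helpers, the rows are shaped like the dicts that values() returns, and how the candidate rows are fetched is left open.

```python
from types import SimpleNamespace


def composite_key(obj, unique_fields):
    """Build a hashable key from several attributes of an object."""
    return tuple(getattr(obj, name) for name in unique_fields)


def index_existing(rows, unique_fields, pk_field="id"):
    """Map each composite key found in rows to that row's primary key."""
    return {
        tuple(row[name] for name in unique_fields): row[pk_field] for row in rows
    }


unique_fields = ("author", "title")
# Stand-in for rows already stored in the database.
rows = [{"id": 3, "author": "Ann", "title": "X"}]
lookup = index_existing(rows, unique_fields)

old_obj = SimpleNamespace(author="Ann", title="X")
new_obj = SimpleNamespace(author="Ann", title="Y")
old_pk = lookup.get(composite_key(old_obj, unique_fields))
new_pk = lookup.get(composite_key(new_obj, unique_fields))
```

Here old_pk is the stored primary key and new_pk is None, so the first object would go to the update list and the second to the create list.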