jazzband / django-auditlog

A Django app that keeps a log of changes made to an object.
MIT License
1.14k stars 411 forks source link

v1.0a1 serializes additional_data log entry field as string #266

Open l0b0 opened 4 years ago

l0b0 commented 4 years ago

After upgrading Django from 2.2 to 3.1.1 and django-auditlog from 0.4.7 to v1.0a1 I now have to json.loads(log_entry["additional_data"]) to get the same result as log_entry["additional_data"] before. Is this an intentional change?

Also, django 3.1 has a built-in JSONField which we'd probably want to use when available.

Real-Gecko commented 4 years ago

I created a PR https://github.com/jazzband/django-auditlog/pull/271 which replaces JSONField with TextField while retaining original functionality. It passes all tox tests, try it out.

l0b0 commented 4 years ago

That would be very, very bad for the project I'm on. We already have many millions of audit log entries. Why not leave it as a JSONField and deserialize it as before?

Real-Gecko commented 4 years ago

I don't know what changed in Django 3.1, but it fails tests with JSONField, replacing it with TextField solves the problem.

We already have many millions of audit log entries.

This won't destroy existing data, there're two migrations: first alters field type, second renames it to additional_data_json. In case you're not using PostgreSQL there even won't be type conversion on the column: https://github.com/adamchainz/django-jsonfield/blob/a4a5f339df0da850a1087ae7b3cd20a19ef8c6b8/jsonfield/fields.py#L71-L81 And you can still get log_entry["additional_data"], so I think no changes will be required in code. However if you're tied to the field name of additional_data somewhere outside Django this will cause trouble.

l0b0 commented 4 years ago

JSONField is built into Django 3, you should be able to use that directly now.

We already had to loads(log_entry["additional_data"] when upgrading to 1.0a1, are you saying that's meant to stay that way? Why change it? Is additional_data meant to be freeform text rather than JSON?

The comment about millions of records wasn't about losing them, but rather the migration taking possibly a very long time on a production system. I still don't quite understand why this shouldn't just stay a JSONField, simplifying the code, avoiding any JSON encoding/decoding incompatibilities, and saving time when upgrading.

Real-Gecko commented 4 years ago

We already had to loads(log_entry["additional_data"] when upgrading to 1.0a1

Nope, it's exactly what your issue about, log_entry.additional_data is Python dict.

This two methods ensure that it'll be valid:


    @property
    def additional_data(self):
        data = None
        try:
            data = json.loads(self.additional_data_json)
        except:
            pass
        return data

    @additional_data.setter
    def additional_data(self, var):
        self.additional_data_json = json.dumps(var)

shouldn't just stay a JSONField

changes field stores valid JSON to, but it's TextField, dunno why, so I decided to turn additional_data to TextField too.

rather the migration taking possibly a very long time on a production system

If field is only renamed it won't take much time :)

ndwhelan commented 4 years ago

I'm going to second not changing this. If anything, I'd rather see changes made in to a JSONField, though I would also be concerned about migrating that data.

Perhaps modifications could be made to support your use case without removing the power of a JSONField?

jezdez commented 3 years ago

I've left a few comments in #271 since I think it's a good compromise as well, in short: JSONField was always not really a JSONField but a glorified TextField with automatic JSON serialization/deserialization. Since django-jsonfield is not in maintenance mode and the author has recommended to move away from it, I think we get the same functionality by just treating the additional_data as the dumb text field it always has been and move the serialization/deserializaation into django-auditlog.

Some time in the future we can either migrate to django-jsonfield-backport (a backport of Django 3.1's proper JSONField) if we really see the need (e.g. smaller storage size, faster queries) which would imply a one-time data migration (which is always a risk). Or -- if enough time has passed and Django 3.1 is the community default for most projects -- just migrate to Django's JSONField in the same way.

In the meantime, I think #271 is a good step in between.

chrisittner commented 3 years ago

What is the status on this and https://github.com/jazzband/django-auditlog/pull/271? I looked through the issues and this appears to me to be the only blocker for releasing a stable 1.0 version on pypi? While this issue is important, so is keeping up to date with the mainstream Django version and the latest stable release of django-auditlog supports neither 3.0 nor 3.1. It would be great to resolve this quickly and to get django-auditlog compatible with mainstream Django 3.1. I'd be happy to contribute a patch if you point me in the right direction and there is a chance for it to get merged.

IMO it's unavoidable to move away from django-jsonfield (as that one won't get 3.1 compatibility) and, as stated above, there are two options:

Either route is fine, but since additional_data is not a central feature of auditlog, it seems sensible to first change it to a TextField with manual json handling, just to kill the django-jsonfield dependency and to get a released django-auditlog that is compatible with both mainstream (3.1) and LTS Django (2.2).

In a later release, the LogEntry model can be migrated to benefit from the new models.JSONField on all relevant fields (changes, additional_data,..), along with dropping Django 2.2./3.0 support for that new release.

eprikazc commented 2 years ago

@jezdez @chrisittner are you guys aware that for Postgres db django-jsonfield creates jsonb field ? https://github.com/adamchainz/django-jsonfield/blame/master/jsonfield/fields.py#L75 i.e. no matter if we change additional_data to TextField or to django JSONField, some databases will be affected and db migration will be needed anyway.

To me switching to JSONField looks like a natural choice going forward. Most databases and Django now support JSONField out of the box and it is handy for querying data inside of it. I understand the concern of having to do potentially lengthy migration, though. I am wondering if it would make sense to avoid it. For example, we could try to make a migration with 2 actions:

additional_data_old would be only in migration, not in LogEntry model. I assume Django allows to craft such migration. This way no data migration will be performed, yet all old data would still be available, and people could decide by themselves if they want to copy data from additional_data_old to additional_data column, or just drop additional_data_old. Or they can just stick to using older version of django-auditlog. Seems like a good compromise to move the project forward. If you guys ok with the idea, I am happy to prepare a PR.