paper-trail-gem / paper_trail

Track changes to your rails models
MIT License
6.79k stars 896 forks source link

Fix serializing of time/date columns using yaml serializer #1458

Open fatkodima opened 9 months ago

fatkodima commented 9 months ago

We are currently using paper_trail and have billions of items in the versions table and the table is huge.

I noticed, that for yaml serializer date/time objects are serialized as ruby objects. Something like

--- !ruby/object:ActiveSupport::TimeWithZone
utc: 2024-01-27 18:01:54.627764000 Z
zone: !ruby/object:ActiveSupport::TimeZone
  name: Etc/UTC
time: 2024-01-27 18:01:54.627764000 Z

This generates 179 bytes per field.

But that should be serialized into the string. Something like

--- 2024-01-27 18:03:07 UTC

That is 28 bytes long, so 150 bytes difference.

Considering that most people have at least 2 datetime columns (created_at and updated_at) for each table, that saves 300 bytes per row in the table. For example, if we have a table with 4 billion rows, that is 4 * 10^9 * 300 / 10^9 = 1200 GB 😱 saved.

fatkodima commented 9 months ago

Ok, this is not working for MySQL (type_cast still returns a Date/Time object instead of String). But, json serializer in this gem already works as expected and converts it using .as_json. Maybe we should use this too instead of type_cast?

viral810 commented 7 months ago

Ok, this is not working for MySQL (type_cast still returns a Date/Time object instead of String). But, json serializer in this gem already works as expected and converts it using .as_json. Maybe we should use this too instead of type_cast?

@fatkodima I think that makes sense to me. I tested out with MySQL and using .as_json returns a string and works as expected.

jaredbeck commented 5 months ago

Ok, this is not working for MySQL ..

It sounds like we have insufficient test coverage, then?

But, json serializer in this gem already works as expected ...

We've had a few issues with YAML serialization over the years, and I've often thought of changing the default from YAML to JSON. In fact, I think the best choice is a jsonb column, which obviates PT-serialization, and enables performant db queries.

You can also write a custom serializer. In fact, it would be great if you could try that first in your own app, and report back after a few months in production.

Please see https://github.com/paper-trail-gem/paper_trail?tab=readme-ov-file#6b-custom-serializer

fatkodima commented 5 months ago

You can also write a custom serializer. In fact, it would be great if you could try that first in your own app, and report back after a few months in production.

We are currently running this patch in production for postgres for 4 months already.

jaredbeck commented 5 months ago

We are currently running this patch in production for postgres for 4 months already.

Oh, great! Which RDBMS are you using in production? I'm trying to understand the above statement "this is not working for MySQL"

fatkodima commented 5 months ago

Updated with the suggestion, please take a look.

github-actions[bot] commented 2 months ago

This PR has been automatically marked as stale due to inactivity. The resources of our volunteers are limited. If this is something you are committed to continue working on, please address any concerns raised by review and/or ping us again. Thank you for all your contributions.

fatkodima commented 2 months ago

Ping.