Store dumping options within entities

sorentwo commented 9 years ago

This introduces "persistent" options stored along with entities. That means when an entity was stored using JSON for marshaling and compression was enabled that entity can be loaded later, regardless of what the options currently are. Additionally, options can be specified per-call, so even when the instance is configured to use JSON for marshaling you can pass through marshal: Readthis::Passthrough for particular entities. There isn't any need to remember in your code which entities were written with which marshal, the same marshaller will be used for dumping. This grants a lot of flexibility and prevents unexpected errors when switching instance options.

The implementation relies on a fixed size header prepended to the entity body. There are benchmarks in the repository explaining the decision to use fixed size headers. Essentially they are quite a bit faster and more predictable.

dblandin commented 9 years ago

This looks very useful. Nice spec coverage as well! :+1:

fabn commented 9 years ago

Awesome :clap: :clap: :clap: thanks.

Just a curiosity, why you choose an ascii format with a such long prefix for flags? Wouldn't be better to use a single byte to store those info? It would save some space (23 bytes for each value) and it's probably faster.

Since you (like me) are benchmark addicted here's a quick proof of concept of using binary flags vs strings. I added Kernel.const_get in parse_a (the fastest) to make comparison fair because parse_c returns the class object instead of a string.

In my test parsing is faster and composing has the same performances.

sorentwo commented 9 years ago

@fabn: Awesome proof of concept. Initially I started with a concept that used a few bytes to store the flags, but got caught up on the idea that users can specify their own marshal library. Removing that constraint and limiting it to 8 possible marshals makes a lot of sense and gets to a smaller single byte.

Adapting your proof now and seeing how it works out.

fabn commented 9 years ago

Glad you liked it.

For the records I made a typo in composing bench. The difference is minimal, reporting this for completeness.

sorentwo commented 9 years ago

I pushed up some major changes with credit given to you. The primary downside is that this path partially breaks backward compatibility. Some marshaled objects, such as a hash, will be prefixed with 0x4 or other bytes between 1 and 15, so it isn't possible to accurately detect whether this is an "un-prefixed" old entity or a current entity.

fabn commented 9 years ago

I pushed up some major changes with credit given to you.

Wow, you have the fastest reaction times on the whole github :smile:

The primary downside is that this path partially breaks backward compatibility. Some marshaled objects, such as a hash, will be prefixed with 0x4 or other bytes between 1 and 15, so it isn't possible to accurately detect whether this is an "un-prefixed" old entity or a current entity.

Time for a major version bump? Otherwise a cache clear will be mandatory for running applications.

I see only a drawback with the latest commit, I'll open another issue to continue the discussion.

jhilden commented 8 years ago

Could you elaborate a bit on what one needs to be aware of when upgrading a production app from 0.x (e.g. 0.8.0) to 1.0?

sorentwo commented 8 years ago

@jhilden You need to be aware of how custom serializers work, but only if you were using something other than Marshal. New entities will be written with a prefix, so you'll probably want to do a cache clear when you upgrade.

jhilden commented 8 years ago

thanks for the information @sorentwo

Actually, we weren't using any custom serializers but still had exceptions during our first deployment attempt. Here is our complete readthis production config:

config.cache_store = :readthis_store, {
    redis: { url: $readthis_redis_url },
    expires_in: 1.day,
    namespace: 'cache'
  }

We got exceptions such as TypeError: no implicit conversion of Symbol into Integer when retrieving a Ruby object from the cache.

Our deployment issues were amplified, because we had different app servers deploying at different points in time. So even after clearing the cache, app servers with the old code would write new broken cache entries. For the second attempt we deactivated all but one application server and then everything went well with a single cache clear.

javierjulio commented 8 years ago

Adding to what @jhilden asked, @sorentwo how does this breaking change affect earlier users of the gem? I'm still rather confused on what to expect and what I need to do on updating. I'm in the same boat as @jhilden where we aren't using serializers, we started using readthis around v0.6.

sorentwo commented 8 years ago

@javierjulio If you didn't previously set a serializer it would use Marshal by default. The latest version will try to deduce the serialization and compression options, but may fail to do so correctly for some values. It really depends on what has been cached.

I'd recommend updating and clearing the cache, as @jhilden had to do. It's 1.0 for a reason, I'm sorry for the inconvenience!

javierjulio commented 8 years ago

@sorentwo thanks, I just got around to it and it went smoothly. No worries, its no inconvenience at all, I was just rather confused with what I was reading on what changed since I just plugged it in for Rails cache store. Thanks for the help! We're on readthis 1.2.0 now.

sorentwo / readthis

Store dumping options within entities #17