taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

How to deal with migrating records/types between namespaces #152

Closed: Outrovurt closed this issue 1 year ago

Outrovurt commented 1 year ago

I have a namespace which defines a record A:

my.namespace.A

I have not created a custom freezer/thawer for A.

I then move the defrecord code to my.other-namespace.

If I now try to thaw it, I get a :nippy/unthawable exception, with e.g. :class-name "my.namespace.A", which is completely understandable and expected.

My question is, is it possible to somehow tell nippy that when I thaw the frozen data, it should now attempt to thaw it to my.other-namespace.A instead of the now no-longer-existent my.namespace.A?

This is a simplified example for a much more general problem, that of moving types across namespaces.

ptaoussanis commented 1 year ago

Hi there! Hmm, that's an interesting problem.

So if asked to thaw something it can't thaw, Nippy will generally try to return the underlying unthawable data.

For example, if Nippy can't thaw a record type, it'll return:

```clojure
{:nippy/unthawable
  {:type       :record
   :cause      :exception
   :class-name class-name ; e.g. "my.namespace.A"
   :content    content ; bytes
   :exception  e}}
```

If the fields of my.namespace.A are identical to those of my.other-namespace.A, you should be able to manually construct a my.other-namespace.A record from the byte content above.

For an example of how to do this, see the read-record source.
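As a rough sketch of that manual construction (not Nippy's actual read-record code): every defrecord class gets a static `create` method that takes a map of field values, so a record can be built from just its class name and its fields. This assumes the fields are already available as a Clojure map; if `:content` is raw bytes, as the map above suggests, they'd first need to be decoded into such a map. `record-from-fields` is a hypothetical helper name.

```clojure
;; Stand-in for the record as now defined in its new namespace.
(defrecord A [x y])

(defn record-from-fields
  "Builds an instance of the record class named `class-name` from a map of
  field values, by calling the static `create` method that defrecord generates."
  [^String class-name fields]
  (let [klass  (clojure.lang.RT/classForName class-name)
        method (.getMethod klass "create"
                           (into-array Class [clojure.lang.IPersistentMap]))]
    ;; `create` is static, so the target-object argument is ignored (nil).
    (.invoke method nil (object-array [fields]))))
```

For example, `(record-from-fields (.getName A) {:x 1 :y 2})` should yield the same value as `(->A 1 2)`.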

I.e. when you encounter a :nippy/unthawable value, you can check for {:type :record} and keep some kind of class migration mapping {<old-classname> <new-classname>}.

If the unthawable (old) classname maps to a new classname, you could take that as your signal to attempt a manual thaw using the new classname.
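A minimal sketch of that approach, assuming plain clojure.walk is fast enough for your data: walk the thaw result at every nesting level, and whenever an unthawable record's old class name appears in a migration table, rebuild it with the new record's constructor. The names `record-migrations`, `unthawable-record`, `migrate-unthawables`, and the `content->fields` argument are all hypothetical; pass `identity` for `content->fields` if `:content` is already a field map.

```clojure
(require '[clojure.walk :as walk])

;; Stand-in for the record as now defined in my.other-namespace.
(defrecord A [x y])

;; Hypothetical migration table: old class name -> constructor for the new type.
(def record-migrations
  {"my.namespace.A" map->A})

(defn- unthawable-record
  "Returns the :nippy/unthawable info map if `v` is an unthawable record, else nil."
  [v]
  (when (map? v)
    (let [info (:nippy/unthawable v)]
      (when (= :record (:type info)) info))))

(defn migrate-unthawables
  "Walks thawed `data` at every nesting level, rebuilding any unthawable record
  whose old class name appears in `migrations`. `content->fields` turns the
  :content payload into a field map. Unmapped unthawables are left untouched."
  [migrations content->fields data]
  (walk/postwalk
    (fn [v]
      (if-let [{:keys [class-name content]} (unthawable-record v)]
        (if-let [ctor (get migrations class-name)]
          (ctor (content->fields content))
          v)
        v))
    data))
```

Since postwalk visits every value, this also covers unthawables nested deep inside other collections, at the cost of a full extra pass over the data.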

One potential problem: depending on the specifics of your case, it may or may not be easy to actually identify :nippy/unthawable values.

For example if all the following hold:

Then it would be tough for you to implement something robust and fast yourself. In that case, it'd probably be ideal if Nippy exposed some kind of functionality (e.g. via a dynamic binding) to transform (reduce) each output value. So basically: add support for a transducing xform on thaw.

That'd make even cases like the above easy, since you could tell Nippy to use your manual thaw logic as it encounters these values at any nesting level.

That could be generally useful, so I'd consider adding the functionality to Nippy if it'd help. Again, some details could depend on the specifics of your case though.

Let me know what you think. Cheers!

Outrovurt commented 1 year ago

Wow, thanks for the detailed response, unexpectedly long but very useful!

Given that the library has been live for so many years and no one seems to have run into this issue to date, I suspect mine is a very limited case. If your proposal is easy/quick to implement, then sure, go ahead.

Beyond the comment in nippy.clj explaining the 4-byte header and 1-byte type, I don't know how the data is encoded, but it seems that the record name is encoded somewhere within the byte array. If I understand correctly, the default behaviour when no custom freezer/thawer is found is to fall back to pr-str / the Clojure reader, respectively. So somewhere in the original encoded byte array is the string "my.namespace.A", and this is then used to look up the record my.namespace.A on thaw, which is what is failing.

One question that arises: is there anywhere to "redirect" this lookup process so that a custom thawer can be used instead of the default? I believe this is what you are proposing in your enhancement, but I wondered if there is a quick way I can do it without you having to change the code. I tried setting up a custom thawer with an id of :my.namespace.A, but this didn't work. I expect that that's because the original frozen record is encoded with a type id for a record.

In answer to your above bullet "questions", I am currently in the development phase, so this is not something I expect to happen too often. The item in question that is failing is embedded within a nested structure, but I've only lost two small test files as a result, which amounts to nothing whatsoever. And no, I'm not sensitive to thaw performance at all, but considering where I am on the first two items, this isn't something that should become an issue in the future. It's just good to know whether there is any way to deal with it or not.

As an alternative, would I be correct in thinking that if I wanted to make this more future-proof, i.e. allow migration of types between namespaces, I should consider creating custom thawers and updating them as affected types are migrated? If so, this is no problem at all, and is something I (and probably others) should bear in mind. That there is a reader fallback is brilliant, but it's probably also worth being aware of the issue I have highlighted here and designing upfront to avoid it.

ptaoussanis commented 1 year ago

I don't know how the data is encoded

So we need to be careful about distinguishing between freeze and thaw stages.

Basically:

What's happening in your case:

I.e. the problem is that thaw is trying to create an instance of the class with the old classname, since that's what was written into the data.

As you can see from the write and read implementations for records, the class name and field data are stored separately.

I.e. all you really need to do in your case is use the old class ("my.namespace.A") field data to create a record of the new class ("my.other-namespace.A"). So long as the fields are identical, the field data will be identical (i.e. compatible).

One question that arises: is there anywhere to "redirect" this lookup process so that a custom thawer can be used instead of the default?

Not currently, not without adding something like xform support.

The only way you can currently migrate from "my.namespace.A" to "my.other-namespace.A" is by scanning through the thaw result and searching for any :nippy/unthawable values like I mentioned in my previous comment.

I tried setting up a custom thawer with an id of :my.namespace.A, but this didn't work

Correct, this won't work. Nippy doesn't dispatch behaviour based on the classname; it only dispatches based on the high-level type (record in this case). The class name is essentially payload, not envelope data.

It may help to consider what the ideal customized solution in your case would be. That would be a 1-line addition in the let before this line:

```clojure
class-name (get migrated-class-names class-name class-name) ; Should the class name be remapped?
```

With some sort of migrated-class-names state like {"my.namespace.A" "my.other-namespace.A"}.
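That one-line change boils down to a `get` with a default: migrated names are swapped, everything else passes through unchanged. A small sketch follows; `migrated-class-names` and `remap-class-name` are hypothetical names. One detail worth knowing: JVM class names munge hyphens to underscores, so the class for a record defined in `my.other-namespace` is actually named `my.other_namespace.A`, and that is the string the mapping needs.

```clojure
;; Hypothetical migration table. Keys and values are JVM class names, in which
;; hyphens are munged to underscores (my.other-namespace -> my.other_namespace).
(def migrated-class-names
  {"my.namespace.A" "my.other_namespace.A"})

(defn remap-class-name
  "Returns the replacement class name for a migrated class, otherwise
  `class-name` unchanged (the third argument to `get` is the default)."
  [class-name]
  (get migrated-class-names class-name class-name))
```

`clojure.core/munge` performs the same name translation, e.g. `(munge "my.other-namespace.A")` gives `"my.other_namespace.A"`.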

If you were going to fork Nippy for example, that's probably the most direct change you could make.

That's probably overly specific for a general Nippy feature though, which is where the xform might come in. It's a bit of a middle ground: it would let you easily accomplish what you want, but it would also potentially have general utility for other uses/users.

not sensitive to thaw performance at all [...] It's just good to know whether is any way to deal with it or not.

There are definitely a few ways to deal with this, especially if you're not performance sensitive.

Options include:

These offer different tradeoffs. I'd probably lean towards the custom freeze+thaw pair as a good default choice, but that does require you to write the data differently in the first place (i.e. this wouldn't be applicable to previously frozen data).
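A sketch of such a custom freeze+thaw pair, using Nippy's `extend-freeze` / `extend-thaw` and assuming com.taoensso/nippy is on the classpath. The type id `:my-app/A` is hypothetical. The record is written as a plain field map under that stable id, and since thaw dispatches on the id rather than on a class name, moving the record to another namespace would only mean pointing the thawer at the new constructor.

```clojure
(require '[taoensso.nippy :as nippy])

;; The record, wherever it currently lives.
(defrecord A [x y])

;; Freeze A under a stable, namespaced custom type id (:my-app/A is a
;; hypothetical id), writing only its fields as a plain map.
(nippy/extend-freeze A :my-app/A
  [x out]
  (nippy/freeze-to-out! out (into {} x)))

;; Thaw dispatches on the id, not on a class name, so if A moves to another
;; namespace only this constructor reference needs updating.
(nippy/extend-thaw :my-app/A
  [in]
  (map->A (nippy/thaw-from-in! in)))
```

A round trip such as `(nippy/thaw (nippy/freeze (->A 1 2)))` should then return `(->A 1 2)`.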

Outrovurt commented 1 year ago

Thanks a tonne, that is all tremendously helpful.

As a result I now realize that what I need to do is very straightforward, and doesn't actually require any changes to nippy, not even the fork you mentioned.

Writing custom freeze+thaw is definitely what is required in this situation, and it turns out I can even save the existing frozen data for my.namespace.A which I claimed was lost. Not much work is involved, but it's well worth keeping in mind as a general workflow: there are several options available when you want to serialize your data with nippy, and it really is worth learning the pros and cons of each upfront to avoid getting into tricky situations.

I'll write something up in the next couple of days once I've tested that what I think works will work.

Thanks again, this was very educational.

ptaoussanis commented 1 year ago

👍 You're very welcome.

BTW for completeness, I should add: another alternative is to avoid the record type altogether. A record is just a map with a type name. If you coerce your record to a map before freezing ((into {} <record>)), it'll freeze as a standard map and won't be subject to any issues with changes to the record name, ns, or even fields.

And you can always transform a map back to a record if/when you need a concrete record type for some reason.
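A minimal sketch of that round trip in plain Clojure (the Nippy freeze/thaw calls are omitted): the record becomes an ordinary map before freezing, and a record of whatever type is current can be rebuilt from it afterwards.

```clojure
(defrecord A [x y])

(def rec (->A 1 2))

;; Before freezing: a plain map carries no record class name, so it is
;; unaffected by later renames or moves of the record type.
(def plain (into {} rec))

;; After thawing: rebuild whichever record type is current, if you need one.
(def rebuilt (map->A plain))
```

Here `plain` is just `{:x 1 :y 2}`, and `rebuilt` is equal to the original `rec`.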

Outrovurt commented 1 year ago

Thanks a lot, great to know, and that really will simplify things considerably.