Closed lowecg closed 5 months ago
@lowecg Hi there, thanks for pinging about this - and for the clear info! 👍
There were 2 changes in v3.3 likely to affect your case: here and here.
Nippy hasn't generally promised stability of its output format, but probably has been pretty stable in practice - and should continue to be.
The above changes were a bit of an exception since I was looking at a particular use case where even such micro-optimizations meant a difference of gigabytes.
Apologies for the trouble!
I've just updated the v3.3.0 changelog to include a warning:
Due to micro-optimizations of some elementary types, Nippy v3.3 may produce different serialized output to earlier versions of Nippy. Most users won't care about this, but you could be affected if you depend on specific serialized byte values (for example by comparing serialized output between different versions of Nippy).
--
It would be good to add a nippy test to detect such changes, and at least serialised output changes could be reported in the release notes to understand the upgrade implications better.
Absolutely agreed, this is a great idea 👍
PR very welcome if you feel like adding such a test, otherwise I'll take care of it myself when preparing the next v3.4 pre-release.
I believe a test should be straightforward - we can just capture + store a hash for each value in here: https://github.com/taoensso/nippy/blob/79f0212431d3e387ff39a48a6549318ae11167f2/src/taoensso/nippy.clj#L2025
We can then monitor the stored hashes to detect any unexpected changes. I'll try minimize such changes in future, and will mention them in release notes when they do occur (batched, and for good reasons).
The checksum between 3.3.0 -> 3.4.0-beta1 seems stable. Are there any plans that would affect that?
No other changes are specifically planned, but if you're not in a hurry - I'd recommend maybe holding off till the next pre-release (ETA ~Mar). Now that this topic's on my radar, I'll want to properly review it when preparing the next pre-release. That'd be the ideal time to start including the above-mentioned test.
Thank you, Peter. That's all very reasonable.
I'll create a PR for the test - my first thoughts were also to take a checksum based on the stress test. That'll serve nicely as a checksum canary.
👍 It's possible that you may need to dissoc a few more types from nippy/stress-data-comparable
. Most of the elementary types should be binary stable, but I'm not sure offhand about about types using Java's Serializable, etc.
Tracking of serialized output will be included in v3.4.0-beta2, thanks again for bringing this to my attention! 🙏
Thank you for your feedback and support on this, Peter. Very much appreciated!
My application relies on nippy serialised output to compute a checksum for change detection.
The output format has changed between 3.2.0 and 3.3.0, altering checksums, even for elementary data.
With nippy 3.3.0
I appreciate that you need to evolve the format from time to time. It would be good to add a nippy test to detect such changes, and at least serialised output changes could be reported in the release notes to understand the upgrade implications better. I suppose I'm asking if you could consider output format changes a breaking change.
I'm torn between reaping the benefits of improvements you're bringing to Nippy and the stability of the output format. Yes, please, to better speed and efficiency! However, in my case, checksum recalcs are expensive, so I'd have to cast my vote towards stability on the output format between minor release versions.
The checksum between 3.3.0 -> 3.4.0-beta1 seems stable. Are there any plans that would affect that? For example, could I move to 3.3.0 now and take the re-calc hit, knowing that when 3.4.0 is released, the output format will be stable?