Closed fwbrasil closed 7 years ago
Seems like a worthwhile fix to me.
👍
Ideally, we'd probably sort these by something like thrift field ID, but I guess it doesn't really matter, I don't think OrderedSerialization makes any promises about schema evolution right?
I don't think OrderedSerialization makes any promises about schema evolution right?
@isnotinvain I'm not sure, looking at the code I don't see why this could be a problem to the user who reported the issue since it's stable for a single compilation. Maybe there's a path somewhere that reuses serialized data from previous executions/versions? I haven't heard back from the user on this, and the problem was affecting only an integration test, so maybe we can drop the backward compatibility if we consider that OrderedSerialization shouldn't support this scenario? cc/ @johnynek
I think the use case here is for ephemeral serialization (between tasks deployed from the same jar). I don't think we have contemplated this as a replacement for any long term storage.
👍
Problem
Ordered serialization for thrift unions is not stable across scalac executions. This issue happens because
knownDirectSubclasses
returns an unorderedSet
and theType
concrete classes don't implement stablehashCode
s.Solution
Sort the result of the
knownDirectSubclasses
method by the type name.Notes
knownDirectSubclasses
is actually stable across compilations for hierarchies with up to four elements becauseSet
preserves the insertion order if the set has <= 4 elements. It's a side effect of the performance optimization that usesSet1
,Set2
, etc. for small sets. I've kept this behavior, but I'm not sure if it's necessary and would like to hear your thoughts on it.This issue doesn't seem to introduce runtime issues since I think users typically compile once and then execute the job, but it seems worth fixing it to avoid confusion.
It's hard to test this fix since it depends on multiple scalac executions. We could try to invoke the compiler programmatically but I'm not sure it's worth it.