Closed hildjj closed 7 years ago
I implemented this two-pass Symbol approach in my value-sharing fork of borc today. The Symbol stuff is at: https://github.com/sandhawke/borc-refs/blob/master/src/encoder.js#L417 but it's not clear when you'd want to have it turned on.
See the readme for more details on what I did. It doesn't just detect cycles, but actually serializes and deserializes them.
My first attempt was on a fork of node-cbor, but I got a few things wrong, and then decided to use borc instead. For one thing, I wasn't using Symbols. See node-cbor-refs if you're curious.
Did a bit more work. Benchmarking showed taking two passes was quite expensive (basically, encoding took 2x times as long, even when I skipped the low-level serialization on the first pass), so I'm wondering if it's okay to by default leave objects 'colored' with a symbol property. I expect it's harmless. Also, node coloring works for detecting all kinds of sharing, not just cycles. Sharing might be desirable, even if it's transmitted as copying. So, I do node coloring to detect sharing, then if there's sharing (and if caller wants) we'll maintain a path to this node to see if there's a cycle. (That is, people might want to allow sharing, but cycles are never okay. Node coloring can't tell the difference, as far as I can figure out.)
Basic code is here: https://github.com/sandhawke/borc-refs/blob/2e7a3f68442b982ae85882ff800d5f840921849a/src/encoder.js#L445-L473 with of course various surrounding bits. Again, I know this is in borc not cbor, but the basic issues should be about the same.
This approach makes sense to me, particularly since the encode path is all synchronous already, so we don't have to worry about two happening at once. I wonder what would happen if we used a CBOR tag to encode what amounts to a weak reference to another part of the tree, so that you could reconstitute the loops on decode. I think that might require two passes in order to put an identifier on the target, however. I think I'll start with just detecting loops and aborting, which is better than running out of memory.
Fixed in 89a3b931f618ccf42d40200eed9503cc18208c96.
Mark objects as visited with a symbol, and delete the symbol on a second pass? Throw an error if the symbol exists on encode. Alternately: replace value with NULL or other symbol. See what other libraries do.