hildjj / node-cbor

Encode and decode CBOR documents, with both easy mode, streaming mode, and SAX-style evented mode.
MIT License
362 stars 74 forks source link

Encoding circular objects #35

Closed hildjj closed 7 years ago

hildjj commented 8 years ago
a = {}
a.a = a
cbor.encode(a) // RangeError: Maximum call stack size exceeded

Mark objects as visited with a symbol, and delete the symbol on a second pass? Throw an error if the symbol exists on encode. Alternately: replace value with NULL or other symbol. See what other libraries do.

sandhawke commented 7 years ago

I implemented this two-pass Symbol approach in my value-sharing fork of borc today. The Symbol stuff is at: https://github.com/sandhawke/borc-refs/blob/master/src/encoder.js#L417 but it's not clear when you'd want to have it turned on.

See the readme for more details on what I did. It doesn't just detect cycles, but actually serializes and deserializes them.

My first attempt was on a fork of node-cbor, but I got a few things wrong, and then decided to use borc instead. For one thing, I wasn't using Symbols. See node-cbor-refs if you're curious.

sandhawke commented 7 years ago

Did a bit more work. Benchmarking showed taking two passes was quite expensive (basically, encoding took 2x times as long, even when I skipped the low-level serialization on the first pass), so I'm wondering if it's okay to by default leave objects 'colored' with a symbol property. I expect it's harmless. Also, node coloring works for detecting all kinds of sharing, not just cycles. Sharing might be desirable, even if it's transmitted as copying. So, I do node coloring to detect sharing, then if there's sharing (and if caller wants) we'll maintain a path to this node to see if there's a cycle. (That is, people might want to allow sharing, but cycles are never okay. Node coloring can't tell the difference, as far as I can figure out.)

Basic code is here: https://github.com/sandhawke/borc-refs/blob/2e7a3f68442b982ae85882ff800d5f840921849a/src/encoder.js#L445-L473 with of course various surrounding bits. Again, I know this is in borc not cbor, but the basic issues should be about the same.

hildjj commented 7 years ago

This approach makes sense to me, particularly since the encode path is all synchronous already, so we don't have to worry about two happening at once. I wonder what would happen if we used a CBOR tag to encode what amounts to a weak reference to another part of the tree, so that you could reconstitute the loops on decode. I think that might require two passes in order to put an identifier on the target, however. I think I'll start with just detecting loops and aborting, which is better than running out of memory.

hildjj commented 7 years ago

Fixed in 89a3b931f618ccf42d40200eed9503cc18208c96.