ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.
MIT License
133 stars 50 forks source link

node/bindnode: support binding a plain Go map to an IPLD map #343

Open mvdan opened 2 years ago

mvdan commented 2 years ago

Right now, we implement ordered maps in the form of:

struct {
    Keys   []K
    Values map[K]V
}

This is because maps in IPLD must have a stable order:

Iteration order is defined to be stable: two separate MapIterator created to iterate the same Node will yield the same key-value pairs in the same order. The order itself may be defined by the Node implementation: some Nodes may retain insertion order, and some may return iterators which always yield data in sorted order, for example. 

What we implement now with the struct above is an ordered map that keeps insertion order. However, in many common use cases like graphsync, that's unnecessary; they just want a map to encode and decode as dag-cbor. In that case, a plain Go map that sorts its keys when MapIterator gets called should be enough.

Sorting the keys any time MapIterator gets called is a certain amount of overhead, both in copying the keys and in sorting them, but:

cc @warpfork @rvagg @hannahhoward

warpfork commented 2 years ago

Hm. One non issue: allocation amortization. It's a noteworthy issue where this pattern appears in the gogen code generator and in the basicnode implementation, but I don't think it applies here.

(In detail: if K is going to be returned as an ipld.Node, having it in one big slab of memory -- the Keys slice -- means the whole thing can escape to heap one time, and that can save on a lot of allocs that may otherwise occur. If those values needed to be boxed into an ipld.Node interface, one each, and didn't all come from one contiguous memory like that, it would mean iteration costs an alloc per step, which is not good.)

But this doesn't apply to bindnode: bindnode always has to return a new object for wrapping the key value, regardless, because the K type doesn't implement ipld.Node itself.

In fact, I guess making one []bindnode._node and wiring the contents up to contain all the keys may end up doing this amortization nicely, whereas (I think) currently there is no amortization, so this should... end up being a net win, often? (The wrapper allocs are O(n) vs sorting being O(nlogn), but I think the constant factor on allocs is quite a bit higher than the constant factors for comparison and swapping in the sorting.)