ipfs / go-cid

Content ID v1 implemented in go
MIT License
157 stars 47 forks source link

WIP: Make Cid an interface and change default representation to a String #64

Closed kevina closed 6 years ago

kevina commented 6 years ago

This builds on #47 but instead makes the Cid type an interface. I was initially against the idea but then realized that it will help with the switch to base32 by allowing a base to be stored with a Cid, so that when it converted back to a string the same base can be used.

This creates two new types that can be used directly: (1) CidString a string that is the binary representation of a Cid. This type can be used directly in Maps and when the additional overhead of an interface creates performance problems. (2) CidWithBase a Cid with a multibase associated with it. Calling String() will reencode CidV1 using that base.

This will still break code as *cid.Cid will need to change to cid.Cid but it should be less painful (as nil can still be used) and only needs to be done once.

warpfork commented 6 years ago

Is it possible to get some rough benchmarking of how using an interface will perform compared to doing a direct typedef of string as in https://github.com/ipfs/go-cid/pull/47 ?

I like interface flexibility, but suspect interfaces here actually always compel some amount of performance overhead, because an interface will kick things back into effectively being pointers (because an interface itself is always two words wide: one for the concrete type info, and one for the pointer to the rest of the data -- even if the concrete type is otherwise not a pointer). As far as I understand, this would probably thus still cause lots of pointer deref overhead and cause lots and lots of casts -- stuff that will show up on profiler info and call graphs as runtime.convT2E -- even when the concrete implementation would be CidString. The performance of that cast itself isn't a huge issue... except because we're in pointer land, it tends to force an allocation, and that then can become pretty costly via the eventual GC pressure if its on any sort of hot path. It's possible this gets optimized out by the compiler in some cases, but I don't know much about this, and if there are any such optimizations, I'd assume they're pretty narrow.

warpfork commented 6 years ago

@Stebalien and I were also meditating a little bit on friday about typedef-string-vs-struct and had an interesting thought: so long as we make the essential *Cid -> Cid migration everywhere, syntactically... and of course limit all accesses of fields to via functions, use consts for == cid.Zero checks, etc... we could actually make future refactors between a concrete type that's a string vs a concrete type of a struct syntactically non-breaking to consumes. Sort of a poor-(gc-concerned)-man's interface, without actually being an interface, if you will. Not sure if that's useful feedback to this PR in particular, but just wanted to throw that thought out there.

kevina commented 6 years ago

@warpfork The idea behind this implementation is when it matters, you can just use the CidString type directly. In most uses of a Cid the I image the extra overhead will be negligible.

Stebalien commented 6 years ago

So, I'm less worried about the performance and more worried about mixing formatting/display information with actual data. Is there a meta issue for discussing the problem we're trying to solve here?

kevina commented 6 years ago

So, I'm less worried about the performance and more worried about mixing formatting/display information with actual data. Is there a meta issue for discussing the problem we're trying to solve here?

I will create it later today.

kevina commented 6 years ago

I will create it later today.

See ipfs/go-ipfs#5349

kevina commented 6 years ago

Closing, as we are going with a pure string representation. Please don't delete the branch.