Closed kevina closed 6 years ago
@kevina Yeah, this is a pretty good point. I'm not opposed to this.
Okay, this is something I will measure then. I will also measure the benefits (and cons) of using a string for a Multihash
.
It may be a while before I can get to it as I can't just change the Cid form a pointer to a value type without lots of code change so I will need to isolate the benchmark code to have a few dependencies as possible before I can measure anything.
I did some informal test and determined that the overhead for the benchmark in ipfs/go-ipfs#3376 was around 17ms. Most of the time for that benchmark was in the base32 decoding (see whyrusleeping/base32#1). This might still be useful but nothing urgent.
So, I opened a competing issue (#25) because having to call KeyString()
every time one wants to use a CID in a map is a pain (and slow). However, making multihash a string would also fix this issue.
@Stebalien as I am updating GC code that relies heavily on the indexing by Cid having the Cid stored as string possibly could improve it quite a bit.
Taking a look at my heap profile, it looks like the KeyString
function is now responsible for ~10% of my allocated memory.
@Stebalien making multihash a string is actually a different issue in a different repo, https://github.com/multiformats/go-multihash/issues/29. @whyrusleeping was concerned about the cost of converting the string to a []byte
but maybe now its the cost of converting a []byte
to a string that is really the problem.
Ah. I see. Either that or just storing CIDs in as byte strings (fully serialized) would work.
@Stebalien so if I understand you correctly, what you want is for the official CID representation to be a serialized string. That would be the most efficient in memory representative however they may be a non-trivial cost then if we need information on the Cid itself.
If there is any interest this is something I could look into.
My main concern is that I want it to be usable as a key in a map. We can achieve this by:
struct { version uint64, codec uint64, cidStr string }
). KeyString would return this (slightly more space).@Stebalien ah, now I see what you are after. In most of these cases we should stop passing around CIDs as pointers (see #38). I am not sure if the Cid type necessary should be directly storable as map keys, but we should definitely aim to reduce the cost of KeyString.
We are going with a string representation.
The conversion from plain
Key
toCid
introduced an extra layer of indirection that can likely be avoided.Currently Cid takes up the width of 5*64 bits. With it's current definition of:
May I suggest
As 32 bits should be more than enough for version and codec which I don't think will ever get very large (even 16 bits should be enough). If combined with multiformats/go-multihash#29 the size will now be 3*64 bits, which is the same size an array slice.
As the same size of a slice it becomes practical to pass it around by value which avoids an extra layer of indirection. Even though it is passed by value, the internals can still be kept hidden, as is done with many other go datatypes.
Just a suggestion. It might make a difference when you have an array of 1000's of Cids (which I might in some of my maintenance commands for the filestore, ipfs/go-ipfs#2634).