mafintosh / hyperdb

Distributed scalable database
MIT License
753 stars 75 forks source link

Questions: inflate field, InflatedEntry, trie encoding #79

Closed bnewbold closed 6 years ago

bnewbold commented 6 years ago

I've been updating the hyperdb and multi-writer specs to track changes from the past week in hyperdb v3. A few questions that I haven't been able to glean from the source:

  1. Why bother with a separate InflatedEntry protobuf message type, and how does the reader know which message type a given hypercore entry is? Eg, why not just always use InflatedEntry and leave the feeds and contentFeed fields blank if they aren't needed?

  2. Perhaps related to the above, what does the inflated field (type varint) indicate, and how is it used?

  3. How does the trie binary encoding work? I expected a repeated array of {feed, index} messages, but it's byte-packed. This one comment in trie-encoding.js is extremely terse and should be expanded upon:

    // encoding: i+bitfield+vals+...
    // val = (feed << 1)+more?,seq

My guess is that i is a varint of the length of the trie array, bitfield is bytes (or a varint of bits?) indicating whether the given array index has an element (val) or is empty (is this bits or bytes? left-packed or right-packed?). There are N val entries following, one for each high-bit in bitfield. The format is two varints: the first is the feed index, right-shifted one bit (with the low bit indicating... "more"? more what?) and the second varint being the sequence.

  1. contentField is not described. I assume it's used for hyperdrive-like applications. Is there more to it than just including in some early (first?) InflatedEntry message? Can subsequent InflatedEntry messages update or override? That seems complicated; a peer would need to sync the entire feed to find what the contentFeed key is to replicate it... maybe inflated is a pointer back to the most recent inflated entry index?

  2. Is the use of contentFeed to indicated that a special metadata entry should not be included as the first message in a hyperdrive+hyperdb feed, containing the public key of the "content" feed?

mafintosh commented 6 years ago

Ugh, I made a super detailed response to this that got lost somehow (prob my fault). Here we go again:

  1. The InflatedEntry is an internal implementation detail, to work around an issue where the optional values of the Entry message would be encoded in the payload. This is small optimization to work around that. I wouldn't put that in the spec.
  2. The inflated field is a seq pointer to the last message where the optional InflatedEntry fields are populated.
  3. Ya that's a bit terse hehe. See the below explaination
  4. contentFeed is an optional field containing the public key of a feed you want to attach to store big values in. The idea being you want to store lots of data in the db, but don't want to read that data out when walking the trie, and potentially want to chunk the data. The usecase comes from hyperdrive but plenty of applications outside which is why it's here. You should only set this once and keep copying it in the InflatedEntry message so it can be found quickly using the inflated pointer.
  5. The idea is to move the contentFeed out of the header message and just have it in the latest InflatedFeed instead. This makes the header more static and only contain things like version, protocol etc. The motivation for moving it into the InflatedEntry was that then you only need to read that one message.

Trie encoding

Yes, so we wanna encode pairs of (index, value, feed, seq) efficiently. Index is the trie index , i above, value is the bit value (0-4). All unsigned integers.

The way we do it is this. First encode the index as a varint. Then write a bitfield where the bits reflect with values are present for this index (right to left). For example bitfield 0b1 means the value 0 is present. 0b11 that 0 and 1 are present. 0b10 that only ` is present, etc.

After the bitfield comes the (feed, seq) pairs. So for each value there can be more than one pair (forks, hash collisions). Usually it's only 1 though. To pack this we add one bit (the more one) to donate whether the next feed,seq pair corresponds to the same value. The more bit is packed into the feed seq using feed << 1 | more because the feed value is usually a very small number and therefore has bits to spare.

mafintosh commented 6 years ago

@bnewbold reopen if more questions

bnewbold commented 6 years ago

@mafintosh sorry for the super long delay in coming back to this, but i'm still confused about trie encoding.

My assumption (and what I have written in the DEP) is that the trie field for any given Entry (protobuf field) in the field has only the trie entries for it's "branch of the trie", based on that entry's path array; it doesn't have a copy of the entire trie. Each bucket in the trie might have pointers for up to 4 branches/children total (because there are 4 possible values at each node of the trie), unless there are hash collisions (handled by the "more" bit).

Given that, i'm confused what the "trie index" is that you mention (index into the trie array? why would this be prefixed to the encoding? the length of the array? the index of the entry containing this trie array?). Also confused by "pairs of (index, value, feed, seq)"; did you mean "pairs of (feed, seq)"?

I'm going to write up my best guess in DEP form and send to you for review.

bnewbold commented 6 years ago

I had some coffee and thought about it more, and I think I understand now:

A "bucket" is an entry in the trie array, corresponding to a node in the trie. Each node can have up to 4 "children" (2-bit value), so a bucket can have up to 4 pointer groups (though in practice I think only 3 would be encoded, because we don't encode for our own "value"/branch. The i is the index of the bucket; there might be multiple bucket entries. The bitfield is short, at most 4 bits (eg, 0b10 could be written 0b0010), and is referencing the possible 2-bit values.

I was confused earlier because I thought the bitfield was referencing buckets (aka, could have length up to the length of the trie array). I also hadn't really understood that the trie is indeed a trie and has up to 4 children at a given node (not just a single (feed, entry) pair in a given bucket).

bnewbold commented 6 years ago

IRC follow up:

03:30 <mafintosh> It's actually 5 values in trie
03:31 <mafintosh> 4 (last one) is special                                       
03:31 <mafintosh> And means END_OF_ENTRY                                        
03:32 <mafintosh> Has to be the last entry in your path                         
[snip]
03:50 <bnewbold> right, I guess it's possible for the trie to branch on a slot
                 that has the special END_OF_ENTRY value
03:50 <bnewbold> should probably add an example with that
03:51 <bnewbold> (I like to demonstrate edge cases, not just mention them)
04:01 <mafintosh> It would branch on EOE on hash conlissions or some conflict
                  scenarios
04:04 <bnewbold> I guess hash collisions always/only are on EOE
04:04 <bnewbold> are there ever multiple ('more') pointers/branches where there
                 *isn't* a hash collision?
04:05 <bnewbold> I guess I should mention in there that even if the full
                 trie/path matches it's important to check in the last bucket
                 to see if there are collision pointers
04:05 <bnewbold> i'm glad somebody generated a hash collision that we can
                 include in tests