timshannon / bolthold

BoltHold is an embeddable NoSQL store for Go types built on BoltDB
MIT License

Handle large indexes better #106

Open timshannon opened 4 years ago

timshannon commented 4 years ago

See #105

The cost of inserting into an index grows as the number of values an index key refers to grows. This shouldn't be the case: the cost should be the same whether you're inserting your first value or your millionth.

Indexes will need to be able to split into separate key ranges as they grow.

This will need a new index storage format, so I'll need to continue to support the old format and use the new one going forward if possible.

baryluk commented 2 years ago

In my own implementation of indexing (which also supports multi-column indexing), for unique indexes I use this format:

```
key:   table\x00s<INDEXINDEX>\x00<COL1>\x00<COL2>
value: primary_id_value
```

where primary_id_value points to table\x00p\x00<ID>, whose value is the actual row / struct.

PS. The COLs are serialized in a fixed-width format so everything sorts nicely lexicographically, mostly as raw []bytes or hex strings. Strings require some extra care due to their variable length.

I use this form because it is trivial to detect unique constraint collisions without doing a range scan, and badger will automatically detect on txn.Commit that two concurrent transactions tried to insert the same secondary unique key. Encoding the primary key id value in the key itself (similar to the format above) would actually be more efficient on lookups when using badger (there should be no difference when using boltdb), but would require far more elaborate uniqueness detection (possibly even a two-phase commit).
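A sketch of the collision check this key layout enables, with an in-memory map standing in for badger/bolt. All names here are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// kv is a minimal in-memory stand-in for the KV store.
type kv map[string][]byte

var errUniqueViolation = errors.New("unique constraint violation")

// uniqueIndexKey builds table\x00s<INDEXINDEX>\x00<COL1>\x00<COL2>...
// as described above. Because the key encodes only the column values,
// not the primary id, a duplicate is a key collision.
func uniqueIndexKey(table string, indexIndex byte, cols ...[]byte) []byte {
	k := append([]byte(table), 0x00, 's', indexIndex)
	for _, c := range cols {
		k = append(k, 0x00)
		k = append(k, c...)
	}
	return k
}

// insertUnique rejects a duplicate secondary key with a single point
// lookup; no range scan is needed.
func insertUnique(store kv, key, primaryID []byte) error {
	if _, exists := store[string(key)]; exists {
		return errUniqueViolation
	}
	store[string(key)] = primaryID
	return nil
}

func main() {
	store := kv{}
	k := uniqueIndexKey("users", 1, []byte("alice@example.com"))
	fmt.Println(insertUnique(store, k, []byte("id1"))) // <nil>
	fmt.Println(insertUnique(store, k, []byte("id2"))) // unique constraint violation
}
```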

For non-unique indices I use this format:

```
key:   table\x00s<INDEXINDEX>\x00<COL1>\x00<COL2>\x00<PrimaryID>
value: ""
```

(For non-unique indices I even disable value fetching in badger iterators, since the value is irrelevant.)

When I do a point scan, I iterate from table\x00s<INDEXINDEX>\x00<COL1>\x00<COL2>\x000000000000000000 to table\x00s<INDEXINDEX>\x00<COL1>\x00<COL2>\x00ffffffffffffffff and decode the keys on the fly.

This also allows me to utilize prefix scans to do range scans using partial column information over the index (i.e. leading equalities, plus optionally one trailing range).

And there is no limit on how many items exist for a given secondary index key: it scales, and adding or removing items is efficient. The long prefix is also not an issue, thanks to key prefix compression in badger (and most other LSM KV stores).

INDEXINDEX is an internal number for the index. Using a hash of the index specification could be a bit safer with regard to upgrades. (Or store the mapping as metadata in the DB itself and cache it in memory on startup.)