Flexible secondary keys

theoreticalbts commented 4 years ago

We've already sort-of answered these questions:

(1) Should secondary keys be supported?
(2) Should secondary keys be implemented as objects in a separate table, or be part of a composite_key declaration in the same table?
(3) What is the highest supported number of keys per object type?

EOS answers "(1) Yes, (2) Separate native table, (3) Two."

So far we've decided on answers of "(1) Yes, (2) composite_key declaration, (3) At least two."

User level secondary keys

If secondary keys are implemented by a second table, this can be done entirely at the user level. That is, if there is no native secondary key support, if primary keys are long enough a user smart contract that wants objects indexed by, say, ID and date may use the following strategy:

Have a primary table where the primary key is the ID, and the date is kept in a data field.
Have a secondary table where the primary key is (date, ID).

The two tables are kept in sync, that is, any CRUD on the object updates the secondary object as well. Creating the secondary tables and updating the secondary objects in sync is largely boilerplate that could be simplified by a library and/or code generator.

Native secondary keys

An alternative approach to secondary keys is to offload the work of multiple indexing to BMIC / MIRA. I.e. the composite_key declaration looks like this:

composite_key< key_value_object,
   member< key_value_object, table_id, &key_value_object::t_id >,
   member< key_value_object, uint64_t, &key_value_object::primary_key >,
   member< key_value_object, uint64_t, &key_value_object::secondary_key >
>

In our discussion so far, we've discussed both approaches. We've decided to use native secondary keys, as this should have performance benefits, and reduce the amount of boilerplate that needs to go into smart contracts.

Flexibility

There are still a few minor problems with secondary keys:

(1) Any code which doesn't use secondary keys still uses overhead for storing / comparing them
(2) Any code which uses less than all the bits of a key still uses overhead for storing / comparing them
(3) Any code which needs tertiary keys (or quaternary keys, or n-ary keys) must still emulate them in user space

Fortunately there is a simple, performant solution to these problems:

Create BMIC / MIRA tables for various numbers / sizes of keys
Use some of the table ID bits to explain how long each key is (0 for unused keys)
Route table CRUD to the proper implementation table

So for example we might have these implementation tables:

(a) 64-bit primary key only
(b) 256-bit primary key only
(c) 64-bit primary key, 64-bit secondary key
(d) 256-bit primary key, 64-bit secondary key
(e) 64-bit primary key, 256-bit secondary key
(f) 256-bit primary key, 256-bit secondary key
(g) 64-bit primary key, 64-bit secondary key, 64-bit tertiary key
(h) 128-bit primary key, 128-bit secondary key, 128-bit tertiary key
(i) 256-bit primary key, 256-bit secondary key, 256-bit tertiary key

We'd specify 2 bits per key, 00 for unused, 01 for 64 bits, 10 for 128 bits, and 11 for 256 bits. So for example 011100 would be 64-bit primary key, 256-bit secondary key, no tertiary key, routed to table (e). A sequence like 100100 would be a 128-bit primary key and 64-bit secondary key, but there is no corresponding implementation table. So it would need to be routed to a table that can contain all key sizes, such as table (d).

We can generate a variety of implementation table sizes with boilerplate. Moreover, since the available implementation table sizes is a non-consensus implementation detail, we can always change it later to optimize in response to actual usage.

sgerbino commented 4 years ago

Should be moved to an internal document.

mvandeberg commented 3 years ago

Secondary keys, if they are implemented, should be implemented in the CDT.

koinos / koinos-chain