`getAll()` with both key and value (or index key and primary key)

inexorabletash commented 7 years ago

As a cursor replacement, getAll()/getAllKeys() has the issue that you need to pick - you can't get both the key and the value. For object stores this isn't so bad if inline keys are used, but still awkward. For indexes you can't even get the primary key.

Here's an example where getAll() could help but needs index key and primary key

https://stackoverflow.com/questions/44349168/speeding-up-indexeddb-search-with-multiple-workers

inexorabletash commented 5 years ago

A good name surfaced over in https://github.com/WICG/kv-storage/issues/63 -

getAllEntries()

... which sort-of aligns with ECMAScript's array iterators: values(), keys(), and entries()

It would produce an array of two-element arrays.

brettz9 commented 5 years ago

Which could nicely be fed into Object.fromEntries, passed into a Map, etc.

inexorabletash commented 5 years ago

Object stores are easy:

getAllEntries() → [[primaryKey, value], ...]

Indexes are less straightforward:

??? → [[primaryKey, value], ...]
??? → [[indexKey, primaryKey], ...]
??? → [[indexKey, primaryKey, value], ...]

Having getAllEntries() on an Index produce [[primaryKey, value], ...] is closest to how getAllXXX() work on Indexes today, but there are use cases for the other permutations as well.

inexorabletash commented 5 years ago

I put a PR up for a straightforward version of getAllEntries() but see above re: index use cases. No consensus on this yet, I was mostly playing with refactoring the methods.

inexorabletash commented 5 years ago

TPAC 2019 Web Apps breakout:

Consensus that we should solve this; calling getAll() and getAllKeys() and correlating is unfortunately common
Object stores: getAllEntries() looks good
Indexes: no good intuition

Follow-up: refine proposal for indexes

inexorabletash commented 5 years ago

Staring at this a bit more, I'd be tempted to have indexes produce:

[[primaryKey, value, indexKey], ... ]

i.e. if you only pay attention to the first two elements, this looks just like getAllEntries() and can be used with Map constructor. If you need the index key it's there.

inexorabletash commented 4 years ago

Sketching this out a bit more:

store.getAllEntries({
   query: key_or_range,
   count: limit,
   direction: next_or_prev
});
// yields: [ [ key1, value1 ], [ key2, value2 ], ... ]

index.getAllEntries({
   query: key_or_range,
   count: limit,
   direction: next_or_prev
}); 
// yields: [ [ primaryKey1, value1, indexKey1 ], [ primaryKey2, value2, indexKey2 ], ... ]

Note that this still doesn't allow trivial paging of results i.e. with a "skip".

All options are optional
The direction option would be IDBCursorDirection limited to "prev" or "next" (throwing otherwise).
Open question: when direction is "prev" are the results in forward or reverse order?

masterkidan commented 4 years ago

By default IndexedDb seems to store by ascending order, so I guess prev would be descending or reverse order. This should be fairly straightforward to do with leveldb, I have the changes locally (albeit against an earlier version of chromium) if this change has traction, then am willing to go ahead and open a PR for chromium to add it in, need to see what should be done for FF/Webkit browsers though.

On a related note, this is something that would be extremely useful with messaging apps, by default we store messages in ascending time order (next cursor) but we often need to show them in descending time order in the UI (latest message first or by using the prev cursor).

stefanhaustein commented 3 years ago

Indexes are less straightforward:

??? → [[primaryKey, value], ...]

??? → [[indexKey, primaryKey], ...]

??? → [[indexKey, primaryKey, value], ...]

For the index case, in my opinion, option 2 (indexKey, primaryKey) would be best, as this option would allow us to obtain the index keys for a range without the overhead of fetching full values.

I am not sure if this is covered by "key" value for the "query" option? Would be great if this could be clarified!

SteveBeckerMSFT commented 1 month ago

TPAC 2024: We discussed the getAllEntries() proposal referenced above, which we agreed is a good addition that should not be complex for browsers to implement. We discussed how this proposal addresses some of the limitations reported by developers in this issue, including:

Getting primary keys, values and indexes at the same time.
Adding a direction option to support reverse iteration.
Can reduce event loop overhead to potentially improve performance.

I've published an explainer that goes into more detail.

evanstade commented 1 month ago

the spec consistently uses the term "record" rather than "entry". @inexorabletash wdyt of getAllRecords()?

evanstade commented 1 month ago

Also, why: getAllEntries() → [[primaryKey, value], ...] instead of getAllEntries() → [{primaryKey: value}, ...]?

inexorabletash commented 1 month ago

For both of the above comments (naming, structure) I was heavily influenced by ECMAScript's Map.prototype.entries(), which was designed to play well with the (at the time) newly introduced destructuring syntax i.e.: for (const [k, v] of map.entries()) { console.log(k, v); }

... and if I recall correctly, this was before JS supported destructuring of objects, i.e. for (const {k, v} of ...) ...

Revisiting the proposed API to improve ergonomics is fine! Consistency with JS should be a factor, but not foolish consistency.

abhishek-shanthkumar commented 1 month ago

+1 about "record" being a better term for getAll...() when applied to regular object stores. But for indexes, IIUC, "record" => [indexKey, primaryKey]. If we're agreed on returning [primaryKey, value, indexKey] from IDBIndex.getAll...(), then the contract of the new API isn't exactly about returning the records as-is from the underlying (regular/index) object stores, but about returning a collection of attributes that we hope to be useful for both types of stores, and it just so happens to be the same as the record for regular object stores. From that perspective, it maybe better to use a new term, and "entry" could serve that purpose.

evanstade commented 1 month ago

Fair points by @abhishek-shanthkumar

On naming: IDBIndex's getAllKeys() already uses "key" to refer to the object store record's key (i.e. primary key), not the index's key. Likewise getAll() returns referenced values i.e. the value half of an object store record. So the established getAllXXX terminology is relative to the referenced object store. (Well technically getAll() should probably be called getAllValues() and then we could just call the new thing getAll(), but anyway...)

On form:

If we're agreed on returning [primaryKey, value, indexKey]

I think we are aligned on that function but perhaps not the form. It seems to me a bit unclean thing to return an array of a particular length where you just have to know which element is which thing, as opposed to an object such as

{`primaryKey`: primaryKey, `referencedValue`: value, `indexKey`: indexKey}

or

{`indexKey`: indexKey, 'record': {key: value}}

SteveBeckerMSFT commented 1 month ago

A few comments:

1) About the form { key: value }, I don't think this will work for all types of IndexedDB keys. For example, binary keys and array keys may cause issues.

2) For consistency, should we add a record property to IDBCursorWithValue? This property could return the same type as the equivalent value in the getAllRecords() result array?

Here are a few options we could consider for the form of the results:

1) Continuing with getAllEntries(), update IDBIndex::getAllEntries() to return an array of nested entries:

[[ indexKey1, [ primaryKey1, value1]], [ indexKey2, [ primaryKey2, value2]], ... ]

Advantages:

Consistency: getAllEntries() returns an array of entries for both IDBObjectStore and IDBIndex.
Compatible with existing ECMAScript entries. Like @inexorabletash pointed out above, developers may directly use the results to construct a Map or Object.

Disadvantages:

Duplicate keys exist in the entries array for indexes with non-unique keys.
Unusual ergonomics. Like @evanstade pointed out above; to access the value of the second record, developers would use results[1][1][1], which is not very intuitive.

2) Synthesizing the comments above with the existing IDBCursor interface, I propose the following form:

[  
{  'key': key1, 'primaryKey': primaryKey1, 'value': value1 },
{  'key': key2, 'primaryKey': primaryKey2, 'value': value2}, 
... 
]

Like an IDBCursor, for IDBObjectStore, key and primaryKey return the same value, which is the record's primary key. For IDBIndex, key returns the index key while primaryKey returns the record's primary key.

Continuing the example above, developers would use the following to access the value of the second record: results[1].value.

Like @inexorabletash pointed out above, developers have a lot of flexibility when using Array.map() with destructuring. For example, developers can construct a Map:

// Transform the results from `getAllRecords()` into an array of entries: 
// [ [ primaryKey1, { 'value': value1, 'indexKey': indexKey1 }], [ primaryKey2, { 'value': value2, ... ], ... ]
const map = new Map(get_all_records_results.map(
    ({primaryKey, value, key}) => [primaryKey, { value, 'indexKey': key}]));

// Use the map to access record values and index keys.
const value = map.get(primaryKey).value; 
const indexKey = map.get(primaryKey).indexKey;

I plan to update the explainer and prototype to use this format, but feedback is very welcome! I'm happy to continue to iterate.

evanstade commented 1 month ago

Conclusions above SGTM.

For consistency, should we add a record property to IDBCursorWithValue? This property could return the same type as the equivalent value in the getAllRecords() result array?

Isn't the IDBCursorWithValue already more or less the record (or a superset of the record)? We could maybe add an asRecord() convenience method, not sure how much demand there is though.

jyasskin commented 3 weeks ago

Is there something that distinguishes an IDB "record" from the "entries" in Object.entries and Map.entries? If there's not, I suggest using the name the rest of the platform has already used.

inexorabletash commented 3 weeks ago

Also, while "record" is a spec term, I don't think it has made it into the API yet so there isn't a consistency argument.

evanstade commented 3 weeks ago

while "record" is a spec term, I don't think it has made it into the API yet so there isn't a consistency argument.

My main argument here is that the spec and MDN (e.g.) both consistently use the term "record". Other spec terms tend to map directly into the API symbol names. To do otherwise would make it harder to reference the spec or MDN as documentation. But you are correct that if only looking at API symbol names, "record" hasn't been used yet.

Is there something that distinguishes an IDB "record" from the "entries" in Object.entries and Map.entries?

Here are a few:

Conceptually, records are immutable. You can only read them, delete them, or replace them.
Concretely
- IDB records cannot be accessed synchronously
- so far we're planning to use a different format for the data returned by getAllRecords as compared to Map.entries or Object.entries.

An IDB record is still conceptually a key/value pair, but "record" I think more strongly evokes exactly what it is, i.e. a "row in a database", than does "entry". TBH, either term work, so it feels like a bit of a bike shed and it's probably not possible to make a wrong decision here, but what is the concrete advantage of using "entry"?

w3c / IndexedDB

`getAll()` with both key and value (or index key and primary key) #206