sourcenetwork / defradb

DefraDB is a Peer-to-Peer Edge Database. It's the core data storage system for the Source Network Ecosystem, built with IPLD, LibP2P, CRDTs, and Semantic open web properties.

[EPIC] Defra Views #2071

Open AndrewSisley opened 10 months ago

AndrewSisley commented 10 months ago

See SIP https://github.com/sourcenetwork/SIPs/blob/sisley/views/x-views/README.md for end-goal

### Tasks
- [ ] https://github.com/lens-vm/lens/issues/5
- [ ] https://github.com/sourcenetwork/defradb/issues/2070
- [ ] https://github.com/sourcenetwork/defradb/issues/2073
- [ ] https://github.com/sourcenetwork/defradb/issues/2074
- [ ] https://github.com/sourcenetwork/defradb/issues/2147
- [ ] https://github.com/sourcenetwork/defradb/issues/2951
- [ ] https://github.com/sourcenetwork/defradb/issues/3024

AndrewSisley commented 3 weeks ago

Personal long-term notes stashed from branch (just posting here to save them, not for consumption by others):

/*
view cache item should pair with dag items
local storage field ids? (can be skipped for now and just store complex item under one key (json/cbor/etc)) - store as fixed-size arrays
of field values (with nil elements for nil fields) - don't store field names in the blob :)

item id can be the cid of the dag item (source doc cids + view cid) (could make it into docID looking like thing if we want,
but that complicates the logic a lot). - nope, original query order should be preserved somehow (update problem) (linked list perhaps?
ll doesn't actually need to affect fetch order, (unless limit/offset given))

dag can be spoofed in short term - we can calc the ids and stuff but not bother storing it initially (will hopefully mean
the ids don't change when we do it properly - this might be spoiled by the view id problem, could perhaps just use item index
for now?).
view cid is a little problematic as global colID does not yet exist, need some temporary thing here.
conclusion: just use index for now - this allows us to skip the linked list for now, and it is easy to build it in a way that doesn't
make linked-listing it in the future unnecessarily painful

collections become cached if the cache exists by calling views/refresh - need a second func to clear the cache, allowing the col to
become cacheless again (note, long term this can also work for normal cols)

/colID/itemID => blob
note: below is not desirable now, but may be handy when introducing the DAG/P2P for views, note this in a comment somewhere
// This can be made to fit the DataStoreKey format (new instance type, fieldID omitted for now (may be introduced later)):
/colID/instanceType/itemID => blob

// order-index (not required if explicit order given) (can be iterated in parallel to docs, allowing docs to be yielded once they are
// known to be next, this would also solve the limit/offset problem but would be wasteful depending on limit/offset sizes) - the linked
// list would actually remove the need for secondary indexes of order-by clauses
// problem: This set itself is not sorted, and would either need full iteration or point lookups (I benched point lookups, this is fine)
// note: colID prefix is handy for refreshes, not just for cosmetic reasons
/colID/current-itemID/next-itemID => nil (next item id optional)
Also needs the start item flagged:
/colID/s/itemID

//note: indexes can be done in later PR, ~~but are still needed for David (4weeks)~~ (solved by linked list)
// index auto created for order:
colID/indexID/fieldValue/itemID

Updates:
- refresh on-demand (there is some benefit to keeping this long term, so maybe we want it, especially for testing of the fancy dynamic updates)
  this could be triggered via directive, or via client command
- time based expiry? (less useful for testing, but might be easier to implement (testing might be slightly harder (could just set really low expiry)))
- update as source updated (complex logic for figuring out which doc updates which item)

note: update methods are not actually mutually exclusive, and they can all be used together. Dynamic updates should be defined on the view
(and perhaps default to off initially), and a timer, but the others do not need to be.
*/