sourcenetwork / defradb

DefraDB is a Peer-to-Peer Edge Database. It's the core data storage system for the Source Network Ecosystem, built with IPLD, LibP2P, CRDTs, and Semantic open web properties.
399 stars 40 forks source link

Allow reconstruction of the document from DAG in Lens fetcher #1959

Open AndrewSisley opened 11 months ago

AndrewSisley commented 11 months ago

The problem occurs when a Lens has an imperfect transform/inverse. For example, if a Lens sets verified to true in the transform, and then sets it to nil in the inverse:

sv=schema version 
Create doc with verified=false at sv=A
Set collection sv=B
Migrate doc to sv=B, verified=true
Set collection sv=A
Migrate doc to sv=A, verified=nil

At this point the doc will be rendered with verified=nil, even though it is being rendered at the schema version that it was created at, and the DAG says it was version=false.

This problem occurs whenever migrating back to a previously visited version, for example if going from A=>B=>C=>B, or C=>B=>C, etc.

If we wish to solve this, the best solution I have is to do the following:

store the last migration (either direction, or schema version) against the dockey in the datastore
- using special fieldID similar to DATASTORE_DOC_VERSION_FIELD_ID
- key/value should be optional, and only present if migrations have run
- Read into prop on EncodedDocument by fetcher(s), like schemaVersionID is

If the Lens defra code sees that it is returning a document to a previously existing state, instead of dumbly migrating like it does now, it must rebuild from the DAG using the versionFetcher and a linear migration set (A=>B=>C).

The problem is also visible without an inverse defined. For example, A=>B sets verified=true, doc migrated, then downgraded (no migration run), verified will however remain true regardless of it's committed/DAG value.

This would be quite expensive, and is really a work around for Lenses that are not really bi-directional. At the moment Lens is not DAG-aware, so such situations may be unavoidable quite frequently. In the future users may still prefer simple Lens transforms over read speed. As such I think it should be something to opt in to, at the collection level (we could maybe add a DB level config param later).

This is actually documented by TestSchemaMigrationQuery_WithSetDefaultToOriginal_AppliesInverseMigration as a means to an end - I don't think the implications of this were realised at the time of author. (verified should equal true in the last query, but it has been cleared by the inverse).

We can detect and fetch the originally commit value, but there may be a significant performance cost in doing so.

This will also affect docs synced via P2P (e.g. newer version synced (inversed), then migrated once schema updated locally).

TestSchemaMigrationQuery_WithSetDefaultToOriginal_AppliesInverseMigration may need to be replaced so that what it is currently testing remains tested once this issue is fixed.

Discussed in https://discord.com/channels/427944769851752448/1164241767235866775 (dev-db.Imperfect Lens Inverses)

AndrewSisley commented 10 months ago

What gets done here depends heavily on https://github.com/sourcenetwork/SIPs/pull/19