p2panda / aquadoggo

Node for the p2panda network handling validation, storage, aggregation and replication
GNU Affero General Public License v3.0
68 stars 5 forks source link

Simplify/improve how we store and retrieve document views #267

Open sandreae opened 1 year ago

sandreae commented 1 year ago

Currently we insert a new document view every time a document is materialised to a new state. The row for a document view contains references to the actual values for each field, so this isn't terrible in terms of storage space and if we want to be able to seek back in a documents history it's very cool. But there are some overheads that come with this approach and maybe there is a different approach which would suit our current needs just as well.

The problems I see with the current setup:

Suggestions:

1) Don't store document views at all

pros

cons

2) Pin views we are interested in

pros

cons

adzialocha commented 1 year ago

I believe this is a missing feature we haven't implemented yet: only document views should be stored whenever there's a pinned relation to them or if they're the latest - aka the second scenario.

Pinning is an important feature and caching materialized views crucial so I think there's no way we can remove it. Surely there's ways to make it better but I don't see yet how.

Slightly related idea on complexity: from an angle of the p2panda specification and with a less column-based database thinking, maybe more graph-based database thinking it's not tooooo bad. It's actually quite intuitive as soon as one understands how all p2panda data relates to each other. Documentation, diagrams and architecture overviews could help as well, either in the code or an ARCHITECTURE.md.

sandreae commented 1 year ago

Documentation, diagrams and architecture overviews could help as well

Something like this? :heart_eyes:

erDiagram
    ENTRY |o--|| OPERATION : ""
    ENTRY ||--|{ LOG : ""
    ENTRY {
        string public_key
        string entry_bytes
        string entry_hash
        string log_id
        string payload_bytes
        string payload_hash
        string seq_num
    }
    LOG {
        string public_key
        string document
        string log_id
        string schema
    }
    OPERATION ||--|{ DOCUMENT : ""
    OPERATION {
        string public_key
        string document_id
        string operation_id
        string log_id
        string schema_id
        string previous
    }
    OPERATION_FIELD ||--|{ OPERATION : ""
    OPERATION_FIELD {
        string operation_id
        string name
        string field_type
        string value
        numeric list_index
    }
    DOCUMENT ||--|o DOCUMENT_VIEW : ""
    DOCUMENT {
        string document_id
        string document_view_id
        string schema_id
        string is_deleted
    }
    DOCUMENT_VIEW  {
        string document_view_id
        string schema_id
    }
    DOCUMENT_VIEW_FIELD ||--|{ DOCUMENT_VIEW : ""
    DOCUMENT_VIEW_FIELD ||--|{ OPERATION_FIELD : ""
    DOCUMENT_VIEW_FIELD {
        string document_view
        string operation_id
        string name
    }