h-REA / hREA

A ValueFlows / REA economic network coordination system implemented on Holochain and with supplied Javascript GraphQL libraries
https://docs.hrea.io

Retrieve full revision history in all record responses #40

Open pospi opened 5 years ago

pospi commented 5 years ago

Should be aware of #9.

fosterlynn commented 5 years ago

@fosterlynn this will need some discussion in vf-graphql as to what to call & how to organise these fields.

OK. I'm good with adding a creating user and date-time to all records. I think those will be different from the small number of created fields that are defined where we know we need them functionally. For one thing, created will refer to an actual agent; what you are talking about would be a user, more for debugging or tracking back what happened when there is a problem. Or do I misunderstand?

If you are also thinking it will be a user, then yes we need to figure out how to make that more universal in terms of the graphql, let's discuss.

pospi commented 5 years ago

I'm thinking it's either a user or a group agent- basically some (REA) agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it. They might have edit permissions by default, for example (at least for a short duration).

I concede that this is somewhat of an antiquated notion in distributed systems design. Does inScopeOf cover this / could it be used for the same purpose? Or should there be something synonymous with stewardingAgents as an additional field with the additional implication?

And all of this is getting into permissions & notifications territory...

fosterlynn commented 5 years ago

To try to sort out the agents involved:

I'm thinking it's either a user or a group agent- basically some (REA) agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it

I'd like to keep "user" and "agent" separated conceptually. Group agents don't have user credentials, and some person agents tend to have many user credentials, even within one technical infrastructure or application. But I think you probably do mean agents in this case?

agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it

I was thinking of this more as just informational: if you need to know who created something to track something down, this was standard database practice back in the day. Not sure how that relates to distributed stuff, but it seems like a generally good idea, irrespective of the functional requirements of HoloREA and agents.

Or should there be something synonymous with stewardingAgents as an additional field with the additional implication?

Is there something not covered in the list at the top of this comment? If so, let's take a look at it.

pospi commented 5 years ago

Agents have relationships with each other in roles, which should give some basis for at least some of the authorization needs, as well as default visibility

Check. What's unknown there is which data to anchor such roles & access control to. Perhaps it's the network ID, and anyone with an edit_records capability on that network is free to do what they want. But I think it's more fine-grained, and that the context of who originally authored the record is probably an important anchor. edit_own_records (or _commitments, or whatever) sounds like a desirable capability to me.

Maybe we just don't know yet until we start getting user-level requirements coming in.

We certainly don't have to explicitly store the authoring agent address, since Holochain does that by itself. But it might be worth having it injected anyway, as this prevents other agents from being able to mess with things that have been entered by others (they can't pass someone else's agent key).

fosterlynn commented 5 years ago

I think it's more fine-grained

Agree.

the context of who originally authored the record is probably an important anchor

Agree also.

BTW, I'm not arguing we shouldn't have created date and created by agent or user (let's figure that out), I think that is a good idea, although I'd also like to understand what HC gives us for free. And then think about how to put that into graphql in a fairly universal way.

OT, seems also like we need to chat about what is an agent, even sticking just with a person for now. Although group agent plays in here also. And also perhaps what our requirements might be for the ocaps permissions.

pospi commented 5 years ago

My thinking at present is that maybe the metadata should be returned as its own object, a sub-record of the main VF records. That way, those details only need to be resolved and returned if the UI is specifically interested in them; and we avoid cluttering the main records with a bunch of repeated fields.

It might not just be author & date stuff, I can see us adding previous revision IDs and updating user IDs down the track to enable querying of a full provenance of each record. That would be a good springboard to locate archival versions for retrieval, which is something we need to do eventually. In any event this is all data that Holochain records for us automatically, so implementation is confined to reads.

The other part of this which I don't think I quite elaborated on properly in that last post is getting uniqueness of new records to avoid accidental conflicts. For example: I create an event which is just {action: receive}, it subsequently gets deleted, then someone else creates {action: receive} and we have a conflict. I suspect the minimum required to avoid this is to store the author ID and creation time in the record entry itself. That would prevent users from creating conflicts on other users' entered data (they can't fake a user ID) and on their own data (they can fake creation time, but would only be hurting themselves; under normal operation this creates uniqueness between multiple distinct records with the same content). Does that make sense?
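To make the collision concrete, here is a minimal self-contained sketch (not actual Holochain code: `content_hash` stands in for real content addressing, and the struct and field names are invented for illustration). Identical bare content hashes identically; injecting the author key and creation time makes each entry distinct.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified stand-in for a content-addressed entry hash.
fn content_hash<T: Hash>(entry: &T) -> u64 {
    let mut h = DefaultHasher::new();
    entry.hash(&mut h);
    h.finish()
}

#[derive(Hash)]
struct BareEvent {
    action: &'static str,
}

#[derive(Hash)]
struct UniqueEvent {
    action: &'static str,
    author: &'static str, // injected agent key: callers can't pass someone else's
    created_at: u64,      // injected creation time: fakeable, but only hurts the faker
}

fn main() {
    // Two agents create the "same" bare event: identical content, identical hash,
    // so the second write collides with the (possibly deleted) first one.
    let a = BareEvent { action: "receive" };
    let b = BareEvent { action: "receive" };
    assert_eq!(content_hash(&a), content_hash(&b));

    // With author + creation time baked into the entry, the hashes diverge.
    let c = UniqueEvent { action: "receive", author: "agent-alice", created_at: 1000 };
    let d = UniqueEvent { action: "receive", author: "agent-bob", created_at: 1000 };
    assert_ne!(content_hash(&c), content_hash(&d));
}
```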

fosterlynn commented 5 years ago

My thinking at present is that maybe the metadata should be returned as its own object

That seems like a good idea. I like the idea of being able to expand on the data too.

The other part of this which I don't think I quite elaborated on properly in that last post is getting uniqueness of new records to avoid accidental conflicts.

Not sure I understand this. Won't every record have a unique identifier (a hash)? Or is this about getting the same hash on records that have say identical action, time, provider, etc.? If so, then yes, having the creation time would certainly help that.

Also a side note: we do have more required fields than we have documented, because we have several "either or" things. Should I document those somewhere?

they can fake creation time

We could fix that by not putting it on mutations, and just saving the server time when we save any record.

pospi commented 5 years ago

You've got it regarding hashes: records will share the same hash if their content is the same, so we should probably ensure it is always different.

Should I document those somewhere?

Yes please! Ideally log a new issue for "add validation for 'either or' fields" or something.

just saving the server time when we save any record.

Unfortunately we can't- there is no "server time"; only node time. And the user's machine can set that however it wants whether the data comes from the browser or from the DNA. But this is a minor issue if providing the wrong time only hurts the caller (which it will, if agent hash is also injected).

fosterlynn commented 5 years ago

Unfortunately we can't- there is no "server time"; only node time. And the user's machine can set that however it wants whether the data comes from the browser or from the DNA. But this is a minor issue if providing the wrong time only hurts the caller (which it will, if agent hash is also injected).

Then sounds like node time makes sense. OK with you?

And we use the user token?

I'm not sure if we want it in the graphql reference or not. Do you think it is something that should be standardized across implementation technologies?

pospi commented 5 years ago

Maybe we could just standardise parts of it: I think created_by, modified_by, date_created & date_modified are lowest common denominator.

We can have the core VF schema extended with custom fields by the Holochain implementation; and the structure of the codebase will make it clear which parts are from the VF spec and what is additional metadata for this implementation specifically. It would be good to prove out that setup, too: lots of others will want examples to follow.

pospi commented 4 years ago

Prior to #75, this might be a good flow for using "trusted time", i.e. the time of the write operation: https://forum.holochain.org/t/how-to-provide-time-of-entry-authoring-automatically/1410/7

pospi commented 4 years ago

Figured out the simple flow now, it looks like this:

It would necessitate an extra link read to provide creation_time when reading the record later, but I think one additional DHT operation is pretty minimal- you can skip reading the entry itself if the timestamp is written to the link tag.
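As a rough illustration of writing the timestamp into the link tag, assuming a tag layout invented here (a big-endian `u64`, not the actual hREA/Holochain format), the creation time could be recovered from the link alone without fetching the entry:

```rust
// Hypothetical sketch: pack the creation timestamp into a link tag so a later
// read can recover it from the link itself, skipping the entry fetch.

fn tag_with_timestamp(timestamp: u64) -> Vec<u8> {
    timestamp.to_be_bytes().to_vec()
}

fn timestamp_from_tag(tag: &[u8]) -> Option<u64> {
    // Take the first 8 bytes of the tag and decode them as a big-endian u64.
    let bytes: [u8; 8] = tag.get(..8)?.try_into().ok()?;
    Some(u64::from_be_bytes(bytes))
}

fn main() {
    let tag = tag_with_timestamp(1_584_000_000);
    assert_eq!(timestamp_from_tag(&tag), Some(1_584_000_000));
    // A malformed (too-short) tag decodes to None rather than panicking.
    assert_eq!(timestamp_from_tag(&[0u8; 3]), None);
}
```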

pospi commented 2 years ago

Can probably ignore the above, the HDK has changed significantly since this was written and there will be new (probably native) ways to retrieve this metadata in HDK3.x.

Something related which now has to be done, now that the protocol spec has stabilised enough to proceed-

pospi commented 2 years ago

Note also that this should be implemented in a transparent way within hdk_records and will probably affect most of the record_helpers.rs function signatures, since we should be returning this data in associated tuples with other record details such as ID and Entry data; and probably want to read it from native Holochain Element fields so that it can't be tampered with.

Connoropolous commented 2 years ago

@pospi since you just mentioned elsewhere that you were working on this, I was just reviewing this issue and have 'assigned' it to you, based on our chat about ways of working, if that's ok.

pospi commented 2 years ago

Ah, thank you for being on top of it! :)

pospi commented 2 years ago

Work on this started in feature/40-record-revisions-metadata, but pushing to lower priority and will loop back once some of the higher-priority MMR work is completed.

Note that the complicated parts of the implementation have been done, most of the work remaining will be the gruntwork of adding appropriate API endpoints and record metadata to all the existing datatypes.

Connoropolous commented 2 years ago

@pospi is the work for creation times and author metadata considered one-and-the-same as the 'revision history' work? Is it possible to get creation times and author metadata without going full on revision history?

pospi commented 2 years ago

Yeah. We can split this if you like, or I can just do those quickly; actually they might just be this conversion. So we could merge that branch (feature/40-record-revisions-metadata) and import RevisionMeta without having to integrate/test RecordMeta as well (wasm-opt will definitely prune any dead code, if this doesn't happen earlier in the build process...)

Connoropolous commented 2 years ago

I think I like that path; I recommend proceeding that way. There's a lot of utility in just having 'who created and when' at a basic level, before having 'and who updated and when'.

Connoropolous commented 2 years ago

@pospi do we consider this work done?

pospi commented 2 years ago

I don't think we have anything logged separately for full revision metadata, so I'll update the issue title to reflect what's needed.

Connoropolous commented 2 years ago

Nice, love the work breakdown.

pospi commented 2 years ago

With the remaining two issues (#347 & #348), because they are quite heavy in terms of DHT ops, it will be best to implement them as optional return data. read_revision_metadata_abbreviated should include #346 and #40 only.

To support these changes, read_revision_metadata_full should be further split apart to support independently loading original & latest revisions where the data is already known, and all fields of RecordMeta except for retrieved_revision should be made optional. The struct can then be returned efficiently on a per-record basis, as follows:
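A hypothetical sketch of that shape in plain Rust (apart from `RecordMeta`, `retrieved_revision` and `previous_revision`, which the thread names, the field and type names here are guesses, not the actual `hdk_records` definitions):

```rust
// Metadata for a single revision of a record. Field names are illustrative.
#[derive(Debug, Clone, PartialEq, Default)]
struct RevisionMeta {
    id: String,     // revision (header/action) hash
    time: u64,      // authoring time
    author: String, // authoring agent
}

// Per the discussion: everything except `retrieved_revision` is optional,
// so the expensive fields are only populated when the caller asks for them.
#[derive(Debug, Default)]
struct RecordMeta {
    retrieved_revision: RevisionMeta,        // always present for the revision read
    original_revision: Option<RevisionMeta>, // loaded only on request
    latest_revision: Option<RevisionMeta>,   // loaded only on request
    previous_revision: Option<RevisionMeta>, // could be prefilled by update_record
}

fn main() {
    // An abbreviated read fills only the retrieved revision.
    let meta = RecordMeta {
        retrieved_revision: RevisionMeta {
            id: "rev-1".into(),
            time: 1000,
            author: "agent-alice".into(),
        },
        ..Default::default()
    };
    assert!(meta.latest_revision.is_none());
    assert_eq!(meta.retrieved_revision.id, "rev-1");
}
```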

We can also further optimise updates by returning the previous revision's metadata from hdk_records::records::update_record and prefilling that as previous_revision, but we may not need to since it's read internally in the update logic anyway and so probably would be read from the local cache in subsequent requests?

For read API requests (whether loading individual records or lists of them), we should add an extra parameter for determining whether to load full metadata for the record. This parameter should be set in the revision(revisionId: ID!) resolver for that type if the input record is queried with latestRevision, originalRevision, previousRevisionsCount or futureRevisionsCount set.
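The gating described above might be sketched like this in plain Rust (the real logic would live in the GraphQL resolver layer; representing the query's selected fields as a simple string slice is an assumption made for illustration):

```rust
// Decide whether a read should request full revision metadata, based on
// whether the query selects any of the revision-related fields named above.
fn needs_full_metadata(selected_fields: &[&str]) -> bool {
    const REVISION_FIELDS: [&str; 4] = [
        "latestRevision",
        "originalRevision",
        "previousRevisionsCount",
        "futureRevisionsCount",
    ];
    selected_fields
        .iter()
        .any(|f| REVISION_FIELDS.contains(f))
}

fn main() {
    // A query touching `latestRevision` triggers the extra metadata load...
    assert!(needs_full_metadata(&["id", "note", "latestRevision"]));
    // ...while a plain record read stays on the cheap path.
    assert!(!needs_full_metadata(&["id", "note"]));
}
```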