h-REA / hREA

A ValueFlows / REA economic network coordination system implemented on Holochain and with supplied Javascript GraphQL libraries
https://docs.hrea.io

Retrieve full revision history in all record responses #40

Open pospi opened 5 years ago

pospi commented 5 years ago

Should be aware of #9.

fosterlynn commented 5 years ago

@fosterlynn this will need some discussion in vf-graphql as to what to call & how to organise these fields.

OK. I'm good with adding a creating user and date-time to all records. I think those will be different from the small number of created fields that are defined where we know we need them functionally. For one thing, created will refer to an actual agent; what you are talking about would be a user, more for debugging or tracking back what happened when there is a problem. Or do I misunderstand?

If you are also thinking it will be a user, then yes we need to figure out how to make that more universal in terms of the graphql, let's discuss.

pospi commented 5 years ago

I'm thinking it's either a user or a group agent- basically some (REA) agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it. They might have edit permissions by default, for example (at least for a short duration).

I concede that this is somewhat of an antiquated notion in distributed systems design. Does inScopeOf cover this / could it be used for the same purpose? Or should there be something synonymous with stewardingAgents as an additional field with the additional implication?

And all of this is getting into permissions & notifications territory...

fosterlynn commented 5 years ago

To try to sort out the agents involved:

I'm thinking it's either a user or a group agent- basically some (REA) agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it

I'd like to keep "user" and "agent" separated conceptually. Group agents don't have user credentials, and some person agents tend to have many user credentials, even within one technical infrastructure or application. But I think you probably do mean agents in this case?

agent to which the record might be considered a record of interest, in that they somewhat have stewardship over it, given that they created it

I was thinking of this more as just informational: if you need to know who created something to track something down, this was standard database practice back in the day. Not sure how that relates to distributed stuff, but it seems like a generally good idea, irrespective of the functional requirements of HoloREA and agents.

Or should there be something synonymous with stewardingAgents as an additional field with the additional implication?

Is there something not covered in the list at the top of this comment? If so, let's take a look at it.

pospi commented 5 years ago

Agents have relationships with each other in roles, which should give some basis for at least some of the authorization needs, as well as default visibility

Check. What's unknown there is which data to anchor such roles & access control to. Perhaps it's the network ID, and anyone with an edit_records capability on that network is free to do what they want. But I think it's more fine-grained, and that the context of who originally authored the record is probably an important anchor. edit_own_records (or _commitments, or whatever) sounds like a desirable capability to me.

Maybe we just don't know yet until we start getting user-level requirements coming in.

We certainly don't have to explicitly store the authoring agent address, since Holochain does that by itself. But it might be worth having it injected anyway, as this prevents other agents from being able to mess with things that have been entered by others (they can't pass someone else's agent key).

fosterlynn commented 5 years ago

I think it's more fine-grained

Agree.

the context of who originally authored the record is probably an important anchor

Agree also.

BTW, I'm not arguing we shouldn't have created date and created by agent or user (let's figure that out), I think that is a good idea, although I'd also like to understand what HC gives us for free. And then think about how to put that into graphql in a fairly universal way.

OT, seems also like we need to chat about what is an agent, even sticking just with a person for now. Although group agent plays in here also. And also perhaps what our requirements might be for the ocaps permissions.

pospi commented 5 years ago

My thinking at present is that maybe the metadata should be returned as its own object, a sub-record of the main VF records. That way, those details only need to be resolved and returned if the UI is specifically interested in them; and we avoid cluttering the main records with a bunch of repeated fields.

It might not just be author & date stuff, I can see us adding previous revision IDs and updating user IDs down the track to enable querying of a full provenance of each record. That would be a good springboard to locate archival versions for retrieval, which is something we need to do eventually. In any event this is all data that Holochain records for us automatically, so implementation is confined to reads.

The other part of this which I don't think I quite elaborated on properly in that last post is getting uniqueness of new records to avoid accidental conflicts. For example: I create an event which is just {action: receive}, it subsequently gets deleted, then someone else creates {action: receive} and we have a conflict. I suspect the minimum required to avoid this is to store the author ID and creation time in the record entry itself. That would prevent users from creating conflicts on other users' entered data (they can't fake a user ID) and on their own data (they can fake creation time, but would only be hurting themselves; under normal operation this creates uniqueness between multiple distinct records with the same content). Does that make sense?
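To make the collision concrete, here is a minimal self-contained sketch (not actual Holochain code: `content_hash` stands in for real content addressing, and the struct and field names are invented for illustration). Identical bare content hashes identically; injecting the author key and creation time makes each entry distinct.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified stand-in for a content-addressed entry hash.
fn content_hash<T: Hash>(entry: &T) -> u64 {
    let mut h = DefaultHasher::new();
    entry.hash(&mut h);
    h.finish()
}

#[derive(Hash)]
struct BareEvent {
    action: &'static str,
}

#[derive(Hash)]
struct UniqueEvent {
    action: &'static str,
    author: &'static str, // injected agent key: callers can't pass someone else's
    created_at: u64,      // injected creation time: fakeable, but only hurts the faker
}

fn main() {
    // Two agents create the "same" bare event: identical content, identical hash,
    // so the second write collides with the (possibly deleted) first one.
    let a = BareEvent { action: "receive" };
    let b = BareEvent { action: "receive" };
    assert_eq!(content_hash(&a), content_hash(&b));

    // With author + creation time baked into the entry, the hashes diverge.
    let c = UniqueEvent { action: "receive", author: "agent-alice", created_at: 1000 };
    let d = UniqueEvent { action: "receive", author: "agent-bob", created_at: 1000 };
    assert_ne!(content_hash(&c), content_hash(&d));
}
```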

fosterlynn commented 5 years ago

My thinking at present is that maybe the metadata should be returned as its own object

That seems like a good idea. I like the idea of being able to expand on the data too.

The other part of this which I don't think I quite elaborated on properly in that last post is getting uniqueness of new records to avoid accidental conflicts.

Not sure I understand this. Won't every record have a unique identifier (a hash)? Or is this about getting the same hash on records that have say identical action, time, provider, etc.? If so, then yes, having the creation time would certainly help that.

Also a side note: we do have more required fields than we have documented, because we have several "either or" things. Should I document those somewhere?

they can fake creation time

We could fix that by not putting it on mutations, and just saving the server time when we save any record.

pospi commented 5 years ago

You've got it regarding hashes: records will share the same hash if their content is the same, so we should probably ensure it is always different.

Should I document those somewhere?

Yes please! Ideally log a new issue for "add validation for 'either or' fields" or something.

just saving the server time when we save any record.

Unfortunately we can't- there is no "server time"; only node time. And the user's machine can set that however it wants whether the data comes from the browser or from the DNA. But this is a minor issue if providing the wrong time only hurts the caller (which it will, if agent hash is also injected).

fosterlynn commented 5 years ago

Unfortunately we can't- there is no "server time"; only node time. And the user's machine can set that however it wants whether the data comes from the browser or from the DNA. But this is a minor issue if providing the wrong time only hurts the caller (which it will, if agent hash is also injected).

Then sounds like node time makes sense. OK with you?

And we use the user token?

I'm not sure if we want it in the graphql reference or not. Do you think it is something that should be standardized across implementation technologies?

pospi commented 5 years ago

Maybe we could just standardise parts of it: I think created_by, modified_by, date_created & date_modified are lowest common denominator.

We can have the core VF schema extended with custom fields by the Holochain implementation; and the structure of the codebase will make it clear which parts are from the VF spec and what is additional metadata for this implementation specifically. It would be good to prove out that setup, too: lots of others will want examples to follow.

pospi commented 4 years ago

Prior to #75, this might be a good flow for using "trusted time", i.e. the time of the write operation: https://forum.holochain.org/t/how-to-provide-time-of-entry-authoring-automatically/1410/7

pospi commented 4 years ago

Figured out the simple flow now, it looks like this:

It would necessitate an extra link read to provide creation_time when reading the record later, but I think one additional DHT operation is pretty minimal- you can skip reading the entry itself if the timestamp is written to the link tag.
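As a rough illustration of writing the timestamp into the link tag, assuming a tag layout invented here (a big-endian `u64`, not the actual hREA/Holochain format), the creation time could be recovered from the link alone without fetching the entry:

```rust
// Hypothetical sketch: pack the creation timestamp into a link tag so a later
// read can recover it from the link itself, skipping the entry fetch.

fn tag_with_timestamp(timestamp: u64) -> Vec<u8> {
    timestamp.to_be_bytes().to_vec()
}

fn timestamp_from_tag(tag: &[u8]) -> Option<u64> {
    // Take the first 8 bytes of the tag and decode them as a big-endian u64.
    let bytes: [u8; 8] = tag.get(..8)?.try_into().ok()?;
    Some(u64::from_be_bytes(bytes))
}

fn main() {
    let tag = tag_with_timestamp(1_584_000_000);
    assert_eq!(timestamp_from_tag(&tag), Some(1_584_000_000));
    // A malformed (too-short) tag decodes to None rather than panicking.
    assert_eq!(timestamp_from_tag(&[0u8; 3]), None);
}
```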

pospi commented 2 years ago

Can probably ignore the above, the HDK has changed significantly since this was written and there will be new (probably native) ways to retrieve this metadata in HDK3.x.

Something related which now has to be done, now that the protocol spec has stabilised enough to proceed-

pospi commented 2 years ago

Note also that this should be implemented in a transparent way within hdk_records and will probably affect most of the record_helpers.rs function signatures, since we should be returning this data in associated tuples with other record details such as ID and Entry data; and probably want to read it from native Holochain Element fields so that it can't be tampered with.

Connoropolous commented 2 years ago

@pospi since you just mentioned elsewhere that you were working on this, I was just reviewing this issue and have 'assigned' it to you, based on our chat about ways of working, if that's ok.

pospi commented 2 years ago

Ah, thank you for being on top of it! :)

pospi commented 2 years ago

Work on this started in feature/40-record-revisions-metadata, but pushing to lower priority and will loop back once some of the higher-priority MMR work is completed.

Note that the complicated parts of the implementation have been done, most of the work remaining will be the gruntwork of adding appropriate API endpoints and record metadata to all the existing datatypes.

Connoropolous commented 2 years ago

@pospi is the work for creation times and author metadata considered one-and-the-same as the 'revision history' work? Is it possible to get creation times and author metadata without going full on revision history?

pospi commented 2 years ago

Yeah. We can split this if you like, or I can just do those quickly; actually they might just be this conversion. So we could merge that branch (feature/40-record-revisions-metadata) and import RevisionMeta without having to integrate/test RecordMeta as well (wasm-opt will definitely prune any dead code, if this doesn't happen earlier in the build process...)

Connoropolous commented 2 years ago

I think I like that path; I recommend proceeding that way. There's a lot of utility in just having 'who created and when' at a basic level, before having 'and who updated and when'.

Connoropolous commented 2 years ago

@pospi do we consider this work done?

pospi commented 2 years ago

I don't think we have anything logged separately for full revision metadata, so I'll update the issue title to reflect what's needed.

Connoropolous commented 2 years ago

Nice, love the work breakdown.

pospi commented 2 years ago

With the remaining two issues (#347 & #348), because they are quite heavy in terms of DHT ops, it will be best to implement them as optional return data. read_revision_metadata_abbreviated should include #346 and #40 only.

To support these changes, read_revision_metadata_full should be further split apart to support independently loading original & latest revisions where the data is already known, and all fields of RecordMeta except for retrieved_revision should be made optional. The struct can then be returned efficiently on a per-record basis, as follows:
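A hypothetical sketch of that shape in plain Rust (apart from `RecordMeta`, `retrieved_revision` and `previous_revision`, which the thread names, the field and type names here are guesses, not the actual `hdk_records` definitions):

```rust
// Metadata for a single revision of a record. Field names are illustrative.
#[derive(Debug, Clone, PartialEq, Default)]
struct RevisionMeta {
    id: String,     // revision (header/action) hash
    time: u64,      // authoring time
    author: String, // authoring agent
}

// Per the discussion: everything except `retrieved_revision` is optional,
// so the expensive fields are only populated when the caller asks for them.
#[derive(Debug, Default)]
struct RecordMeta {
    retrieved_revision: RevisionMeta,        // always present for the revision read
    original_revision: Option<RevisionMeta>, // loaded only on request
    latest_revision: Option<RevisionMeta>,   // loaded only on request
    previous_revision: Option<RevisionMeta>, // could be prefilled by update_record
}

fn main() {
    // An abbreviated read fills only the retrieved revision.
    let meta = RecordMeta {
        retrieved_revision: RevisionMeta {
            id: "rev-1".into(),
            time: 1000,
            author: "agent-alice".into(),
        },
        ..Default::default()
    };
    assert!(meta.latest_revision.is_none());
    assert_eq!(meta.retrieved_revision.id, "rev-1");
}
```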

We can also further optimise updates by returning the previous revision's metadata from hdk_records::records::update_record and prefilling that as previous_revision, but we may not need to since it's read internally in the update logic anyway and so probably would be read from the local cache in subsequent requests?

For read API requests (whether loading individual records or lists of them), we should add an extra parameter for determining whether to load full metadata for the record. This parameter should be set in the revision(revisionId: ID!) resolver for that type if the input record is queried with latestRevision, originalRevision, previousRevisionsCount or futureRevisionsCount set.
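The gating described above might be sketched like this in plain Rust (the real logic would live in the GraphQL resolver layer; representing the query's selected fields as a simple string slice is an assumption made for illustration):

```rust
// Decide whether a read should request full revision metadata, based on
// whether the query selects any of the revision-related fields named above.
fn needs_full_metadata(selected_fields: &[&str]) -> bool {
    const REVISION_FIELDS: [&str; 4] = [
        "latestRevision",
        "originalRevision",
        "previousRevisionsCount",
        "futureRevisionsCount",
    ];
    selected_fields
        .iter()
        .any(|f| REVISION_FIELDS.contains(f))
}

fn main() {
    // A query touching `latestRevision` triggers the extra metadata load...
    assert!(needs_full_metadata(&["id", "note", "latestRevision"]));
    // ...while a plain record read stays on the cheap path.
    assert!(!needs_full_metadata(&["id", "note"]));
}
```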