hyperhyperspace / hyperhyperspace-core

A library to create p2p applications, using the browser as a full peer.
https://www.hyperhyperspace.org
MIT License

Assumptions/questions after a few days of reading #26

Open timsim00 opened 1 year ago

timsim00 commented 1 year ago

I'm really excited to be learning HHS, as I can see a lot of great, very intelligent work has gone into it already. From reading the documentation it sounds like it will work for the app I'm building. Now I'm trying to understand how I would use HHS to implement the data model for my app. I've used many different databases but nothing like p2p. I'm very new to this space (no depth of knowledge of Merkle-DAGs, spanning trees, WebRTC, etc., but I get graphs). I'm trying to bridge the gap between what I want to do and how HHS works. Here's what I've been able to piece together after a few days with a fresh set of eyes. My apologies in advance if I missed anything obvious in the docs. If anyone is able to confirm or correct my understanding and point me in the right direction I would really appreciate it! Open source examples are always welcome.


High Level

I can put my static SPA app on a CDN and peers will sync app data among themselves. A signaling server is needed to aid in this process, but load on the server is not high. HHS is like a toolbox with tools to enable this.

A Store is an abstraction for actual data persistence on a peer, for instance IndexedDB in the browser. An app's data model is made up of many interlinked spaces. Sync logic takes data in a 'Space' and updates the store on a peer using the verify function of each class. The full graph of spaces is represented across the network of peers. Individual peers only have the spaces they edited or requested. Access is granted or revoked using the CausalSet. The whitepaper says CapabilitySet but I think the API has changed since it was written.

Spaces

Spaces can be nested within other spaces by referencing those spaces. Spaces can be discovered independently (of any particular app) via a 3-word code. A space is an abstraction for exactly one grouping of data, "a chat room, an article, a blog, an ecommerce store, etc." A class (written by a dev) defines that grouping of data, a potential combination of literals and references to other spaces/class instances, based on business logic. A space is implemented when a class inherits from either an immutable HashedObject or a mutable abstraction, i.e. MutableSet, which is a MutableObject.

When to use one or the other, I'm not sure. Seems like a good use-case for HashedObject would be an error log. Or would an error log be represented as a HashedObject nested inside a MutableSet? What are some good use-cases for an immutable space? Will most app data be mutable?

"They can be universally looked up using 3-word codes, like suburb-suburb-awake." If a space really is a class instance, does that mean every single class instance needs a 3-word code? Is this what the Shuffle class is for? If I code a BlogSet class and an Article class, and there are 3 articles in the blog, does that mean there must be 4 spaces total for the blog? Is there ever a situation where the blog would be just one space?

Data Model

(my main question...how to model my app data?)

It seems like an application data model would represent a graph, with the application (for a particular user) pointing to some logical root. Various spaces/classes would then be organized under the root in a way that makes sense according to the business logic. Each user would have various access to all or parts of this node. ie.

App Data (for Acme Employee: Bob) =>

Acme Corp {
  Meta: {orgSpace: "acme-corp-best"}
  EmployeeSet -> Employee: {name, startDate, title, boss...}
  CustomerSet -> Customer: {name, address...}
  OrderSet -> Order: {date, total, Lines -> LineItem: {product, qty, price, amount, total}, customer...}
  ProductSet -> Product: {name, priceSet...}
  OutsideMeetingSet -> Meeting: {mtgDateTime, discussion, Invites -> Invite: {orgSpace: "abc-corp-great"}}
}

Bob may be granted access to all or parts of this graph depending on his role at Acme. This is not the app I want to build, just an example.

Peer Discovery

"Peers in the network form application-defined groups over which mutable objects are synchronized. The method for obtaining peers is also application defined..." I'm not sure what this means. Let's say both Acme Corp ("acme-corp-best") and ABC Corp ("abc-corp-great") are pointing to their respective Org graphs. Does HHS have logic, or does the app need custom logic, to make sure Acme employees are primarily trying to sync with other Acme employees and not ABC employees, other than for shared data like an OutsideMeeting discussion Forum which both companies can access? How does an app define peer groups? Implicitly? Or is there example code for this?

Other questions:

Again, great work on HHS and thank you to anyone who can help confirm/deny assumptions and answer questions. It looks very promising!

sbazerque commented 1 year ago

Hello @timsim00 ! That's an amazing summary and list of questions, thank you again for putting it together.

Even though this library is already a few years old, it's only now becoming a viable platform for apps. The truth is that we're still figuring out a set of generic primitives to base p2p, Byzantine fault tolerant apps on. I think we've got all the pieces by now, but as you hinted I feel we need a set of established patterns / recipes to make it easier.

Data model

Yeah, HashedObject is for data that won't change, but it may contain references to mutable stuff. For example, in the chat group example the root object would look something like

class ChatGroup extends HashedObject {

   moderators: MutableSet<Identity>;
   messages: MutableSet<Message>;

   ...
}

So the ChatGroup object is immutable, but contains references to the moderators and messages sets that are mutable.

Like you said, internally a MutableObject is represented as a HashedObject (this is the mutable object's initial state; in the case of the set, that would be an empty set). Changes are implemented by creating operations, which are instances of MutationOp (another derivative of HashedObject). So, say, when you add an element to a set and then save it to the store, MutableSet will create an AddOp containing the new element and referencing the object you're modifying. Hence a graph of derivatives of HashedObject referencing each other is formed, and it grows as you add or modify data. Crucially, these objects never change: if you remove that element from the set later, a DeleteOp referencing the set and the AddOp will be appended to the graph.

So the data model can be seen as transforming mutable state into an append-only graph that can be streamed to other peers.
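To make the append-only idea concrete, here is a minimal sketch of a set whose state is derived from a growing log of add and delete operations. It is a toy model, not the hyperhyperspace API: `ToyOpLog`, `ToyAddOp`, `ToyDeleteOp` and `toyHash` are hypothetical names, and the hash is a non-cryptographic stand-in chosen only to keep the example dependency-free.

```typescript
// Toy model only: none of these names come from the hyperhyperspace API.
type Hash = string;

// FNV-1a: a small non-cryptographic hash, used here only to keep the sketch
// dependency-free. A real content-addressed store needs a cryptographic hash.
function toyHash(text: string): Hash {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

interface ToyAddOp { kind: "add"; target: Hash; seq: number; element: string; }
interface ToyDeleteOp { kind: "delete"; target: Hash; addOp: Hash; }
type ToyOp = ToyAddOp | ToyDeleteOp;

class ToyOpLog {
  readonly setHash: Hash;
  // Append-only: entries are inserted, never modified or removed.
  private readonly ops = new Map<Hash, ToyOp>();
  private nextSeq = 0;

  constructor(setHash: Hash) {
    this.setHash = setHash;
  }

  private append(op: ToyOp): Hash {
    const hash = toyHash(JSON.stringify(op));
    if (!this.ops.has(hash)) this.ops.set(hash, op);
    return hash;
  }

  // Adding creates an op that carries the element and references the set.
  add(element: string): Hash {
    return this.append({ kind: "add", target: this.setHash, seq: this.nextSeq++, element });
  }

  // Removing appends a new op that references the earlier add op; nothing is erased.
  remove(addOp: Hash): Hash {
    return this.append({ kind: "delete", target: this.setHash, addOp });
  }

  opCount(): number {
    return this.ops.size;
  }

  // Current state: elements of add ops that no delete op references.
  currentElements(): string[] {
    const deleted = new Set<Hash>();
    for (const op of this.ops.values()) {
      if (op.kind === "delete") deleted.add(op.addOp);
    }
    const result: string[] = [];
    for (const [hash, op] of this.ops) {
      if (op.kind === "add" && !deleted.has(hash)) result.push(op.element);
    }
    return result.sort();
  }
}
```

Removing an element makes the log longer, not shorter, which is what lets the whole structure be streamed to other peers as a sequence of immutable objects.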

Spaces

In general, a Space defines the sharing boundary an app will use. In the chat example, it makes sense to use the ChatGroup class as our sharing boundary: folks can join a chat group (and hence get access to all its contents). In document-based apps, a document is a natural candidate for a Space. Usually, the methods startSync and stopSync will configure the mesh node they're on by:

As you mention, it is possible to use spaces as an inter-app integration method. For example, if you define types for something like Slack or Discord, you could have a space for your entire workspace, with all the threads, but you could also define a space per channel, to make these channels available in other apps / contexts besides your workspace (for example, you may want to make a channel public and embed it into your website, or bridge a channel between two different workspaces to help two organizations collaborate). But this is rather advanced and I've never used spaces like that yet!

Access control

This data model makes access control especially challenging, because the graph we're using to represent data may be replicated unevenly to peers. So folks may use their access rights concurrently with those rights being granted or revoked, and some peers may see their actions as valid or invalid initially, but may have to change their mind as more data is replicated. That's what CausalSets are for: they attach additional information to object mutations, in order to allow all the peers to agree on a state once enough data has been replicated.

Here's an example, again from chat groups, about the rule for deleting messages:

    protected createDeleteAuthorizer(msg: Message, author: Identity): Authorizer {

        // Any member may delete their own messages, but only moderators can delete others'.
        const membershipReq = author.equals(msg.getAuthor())
            ? this.getMembers()
            : this.getAdmins();

        // Require proof that the author belonged to the chosen set at the time of the change.
        const membershipAuth = membershipReq.createMembershipAuthorizer(author);

        // Both the inherited rule and the membership rule must hold.
        return Authorization.chain(super.createDeleteAuthorizer(msg, author), membershipAuth);
    }

That's saying that we will require a membership attestation (an object that serves as proof that something belonged to a given set at the time of this change). If the author of the message is the same as the author doing the change, we will require an attestation of the author belonging to the members set of the chat group. But if the author of the change (the deletion) is not the same as the author of the message, then they need to attest that they belong to the moderators set.
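The rule can be illustrated with a small self-contained sketch. This is a toy model with hypothetical names (`ToyChatGroup`, `ToyAuthorizer`, `chainAuthorizers`, `membershipAuthorizer`), not the library's Authorizer API. In particular it checks membership against the current local state, whereas the attestations described above prove membership at the time of the change, which is the part that matters when replicas disagree.

```typescript
// Toy model with hypothetical names; it mirrors the rule in the text,
// not the library's Authorizer API.
type ChatIdentity = string;

interface ToyMessage { id: string; author: ChatIdentity; }

// An authorizer answers one question: may this change be applied?
type ToyAuthorizer = () => boolean;

// Every authorizer in the chain must pass.
function chainAuthorizers(...authorizers: ToyAuthorizer[]): ToyAuthorizer {
  return () => authorizers.every((authorize) => authorize());
}

// Simplification: checks current membership, not membership at the time of the change.
function membershipAuthorizer(set: ReadonlySet<ChatIdentity>, who: ChatIdentity): ToyAuthorizer {
  return () => set.has(who);
}

class ToyChatGroup {
  readonly members: Set<ChatIdentity>;
  readonly moderators: Set<ChatIdentity>;
  private readonly messages = new Map<string, ToyMessage>();

  constructor(members: ChatIdentity[], moderators: ChatIdentity[]) {
    this.members = new Set(members);
    this.moderators = new Set(moderators);
  }

  post(id: string, author: ChatIdentity): ToyMessage {
    const msg: ToyMessage = { id, author };
    this.messages.set(id, msg);
    return msg;
  }

  createDeleteAuthorizer(msg: ToyMessage, author: ChatIdentity): ToyAuthorizer {
    // Base rule: the message must still exist.
    const messageExists: ToyAuthorizer = () => this.messages.has(msg.id);
    // Own message: membership is enough. Someone else's: moderator required.
    const required = author === msg.author ? this.members : this.moderators;
    return chainAuthorizers(messageExists, membershipAuthorizer(required, author));
  }

  deleteMessage(msg: ToyMessage, author: ChatIdentity): boolean {
    if (!this.createDeleteAuthorizer(msg, author)()) return false;
    this.messages.delete(msg.id);
    return true;
  }
}
```

Chaining keeps each rule small: a subclass can add its own requirement on top of an inherited one without rewriting it.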

This API is not intuitive enough yet; we'll try to find better primitives for expressing this. But in these terms, upon adding an Order to the OrderSet in your example, OrderSet should derive from CausalSet, and its createAddAuthorizer method should return something like this.employeeSet.createMembershipAuthorizer(employee), where employee is the author of the order. These kinds of relationships could in theory be nested arbitrarily, to represent the authorization patterns in your app. About CapabilitySet: you're right, we're leaning on just using CausalSets as the right primitive (and probably more types of the Causal... flavor).

Peer Discovery

The default behavior when you use the sync(object) method of the MeshNode is to create a peer group of all the nodes that are trying to sync object, using dynamic peer discovery. So following your example, if the members of AcmeCorp and ABCCorp are sync'ing different objects, they will end up in different peer groups by default. However, in this case it'd probably be better to use a peer group that cannot be freely joined by just knowing which objects to sync. The full signature of sync is sync(object, mode, peerGroup), where a peerGroup is:

type PeerGroupInfo = {
    id: string,            // a string identifying this peer group
    localPeer: PeerInfo,   // our local identity
    peerSource: PeerSource // a method that will give us random peers to connect
};

peerSource may pull peers from an external source, like a database, or use the local replica of an HHS object (like the members set in the chat group, for example). There are several PeerSource implementations here.
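As an illustration of a peer source backed by a local membership list, here is a minimal sketch. The shapes are hypothetical (`ToyPeerInfo`, `ToyPeerSource`, `MemberListPeerSource`, and the `getPeers` method are assumptions made for this example), and the sketch is synchronous for simplicity; the library's actual PeerSource interface may differ.

```typescript
// Hypothetical shapes, loosely modeled on the PeerGroupInfo type quoted above.
interface ToyPeerInfo { endpoint: string; identityHash: string; }

interface ToyPeerSource {
  // Return up to `count` distinct peers chosen at random.
  getPeers(count: number): ToyPeerInfo[];
}

interface ToyPeerGroupInfo {
  id: string;
  localPeer: ToyPeerInfo;
  peerSource: ToyPeerSource;
}

// Draws peers from whatever the local replica currently lists as members,
// so only peers that appear in that list can be selected.
class MemberListPeerSource implements ToyPeerSource {
  private readonly currentMembers: () => ToyPeerInfo[];
  private readonly random: () => number;

  constructor(currentMembers: () => ToyPeerInfo[], random: () => number = Math.random) {
    this.currentMembers = currentMembers;
    this.random = random;
  }

  getPeers(count: number): ToyPeerInfo[] {
    const pool = this.currentMembers().slice();
    const n = Math.min(count, pool.length);
    // Partial Fisher-Yates shuffle: only the first n slots need to be randomized.
    for (let i = 0; i < n; i++) {
      const j = i + Math.floor(this.random() * (pool.length - i));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    return pool.slice(0, n);
  }
}

const acmeMembers: ToyPeerInfo[] = [
  { endpoint: "acme-1", identityHash: "h1" },
  { endpoint: "acme-2", identityHash: "h2" },
  { endpoint: "acme-3", identityHash: "h3" },
];

const acmeGroup: ToyPeerGroupInfo = {
  id: "acme-corp-best",
  localPeer: acmeMembers[0],
  peerSource: new MemberListPeerSource(() => acmeMembers),
};
```

Because the source reads the member list through a function, peers added to or removed from the list later are picked up on the next call.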

As things stabilize a bit more, we'll write a reference manual trying to iron out how to go about all these things. I'll go over your list of questions as soon as I have a bit more time; please just add more questions if the above is not clear.

If perchance you'd be interested in working with us to get the library in shape for app creation, or want to suggest features you feel are missing, or how to go about documenting all this stuff, all would be very welcome.

Thank you again for your analysis & questions, I feel this will help us already!

timsim00 commented 1 year ago

Hello @sbazerque,

Talking through the playground logic:

Any help confirming this would be appreciated.

a. Creator/first peer (peer1) (setup id/store/resources)

a1. create the type of store I want (default is memory), but we want Idb.
a2. create an Identity, from keypair
a3. save id and keypair in the store (why?)
a4. create a Resources, passing in peer1 id and store; use defaults for others (ie, LinkupServers, mesh)

b. Pulling that foundation into an app:

b1. create an instance of the app's data model, DocSpace, which is an immutable HashedObject (id, author, resources)
b2. create a space from DocSpace and Resources
b3. initialize the space (via entryPoint promise), which tries to connect to the store
b4. point the DocSpace resources to the Resources created in a4.
b5. tell DocSpace to start syncing; fn is local to DocSpace; validation, creates new MeshNode using resources. Calls broadcast, sync, then loadAndWatchForChanges.

c. Second Peer (peer2)

(DocSpace already exists somewhere, we have the ID for it)
c1. repeat a1 - a4 (setup id/store/resources)
c2. get the reference to the space from 3 word code and resources.
c3. get the reference to the space's DocSpace
c4. attach peer2 resources (peer2 id and store) to DocSpace
c5. save DocSpace to peer2 store
c6. tell DocSpace to start syncing

Majors: Resources, HashedObject, Space, Store, app's data model DocSpace

Identity: how to add register/login/logout functionality?

Data Model / ACL

Starting to understand, but still not exactly sure how to model the orgSpace example above, or do ACL. Looks like the default mode for a space is everyone can read/write a mutable within a space. But ACL is implemented via CausalSet and membership, as in your createDeleteAuthorizer example above. I need to play around with that.

Saving

sbazerque commented 1 year ago

Hello @timsim00, thanks again for your insightful questions!

Talking through the playground logic:

a. Creator/first peer (peer1) (setup id/store/resources)
a1. create the type of store I want (default is memory), but we want Idb.
a2. create an Identity, from keypair
a3. save id and keypair in the store (why?)
a4. create a Resources, passing in peer1 id and store; use defaults for others (ie, LinkupServers, mesh)

Right now, the store is serving as a sort of identity store as well. If you want to save objects signed by a given identity, both the Identity object and its associated keypair must be present in the store. This is something that needs refinement: I've created a small issue with more details and that you can use to track when this is implemented if you want.
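The behavior described here can be pictured with a small toy store that refuses to save an authored object unless both the identity and its key pair were saved first. All names (`ToySigningStore`, `SignerIdentity`, `ToyKeyPair`, `makeToyKeyPair`) are hypothetical, and the "signature" is a placeholder string, not real cryptography.

```typescript
// Toy model: hypothetical names, and a placeholder instead of a real signature.
interface ToyKeyPair { publicKey: string; sign(payload: string): string; }
interface SignerIdentity { name: string; publicKey: string; }
interface ToyObject { id: string; payload: string; author?: SignerIdentity; }
interface StoredEntry { object: ToyObject; signature?: string; }

class ToySigningStore {
  private readonly keyPairs = new Map<string, ToyKeyPair>();
  private readonly identities = new Map<string, SignerIdentity>();
  private readonly entries = new Map<string, StoredEntry>();

  saveKeyPair(keyPair: ToyKeyPair): void {
    this.keyPairs.set(keyPair.publicKey, keyPair);
  }

  saveIdentity(identity: SignerIdentity): void {
    this.identities.set(identity.publicKey, identity);
  }

  // Unauthored objects are stored as-is. Authored objects are signed on save,
  // which is only possible if the identity and its key pair are already stored.
  save(object: ToyObject): void {
    if (object.author === undefined) {
      this.entries.set(object.id, { object });
      return;
    }
    const key = object.author.publicKey;
    const keyPair = this.keyPairs.get(key);
    if (!this.identities.has(key) || keyPair === undefined) {
      throw new Error(`identity or key pair for ${object.author.name} is not in the store`);
    }
    this.entries.set(object.id, { object, signature: keyPair.sign(object.payload) });
  }

  signatureOf(id: string): string | undefined {
    const entry = this.entries.get(id);
    return entry === undefined ? undefined : entry.signature;
  }
}

// Placeholder "key pair": the output only marks where a real signature would go.
function makeToyKeyPair(publicKey: string): ToyKeyPair {
  return { publicKey, sign: (payload) => `toy-sig(${publicKey},${payload.length})` };
}
```

Seen this way, saving the key pair into a store amounts to granting that identity write access to it.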

b. Pulling that foundation into an app:
b1. create an instance of the app's data model, DocSpace, which is an immutable HashedObject (id, author, resources)
b2. create a space from DocSpace and Resources
b3. initialize the space (via entryPoint promise), which tries to connect to the store
b4. point the DocSpace resources to the Resources created in a4.
b5. tell DocSpace to start syncing; fn is local to DocSpace; validation, creates new MeshNode using resources. Calls broadcast, sync, then loadAndWatchForChanges.

This looks good to me. It makes sense to use both sync (with mode=SyncMode.full) and loadAndWatchForChanges on an immutable object because some of the object's fields may be mutable (yeah, it is counter-intuitive at first; if you have any suggestions about this API please send 'em along!). So in practice, if you have say chat.messages and chat.moderators, you can call node.sync(chat) and chat.loadAndWatchForChanges(), even though chat is itself immutable, and that'd be affecting the chat.messages and chat.moderators fields, which are mutable.

c. Second Peer (peer2)
(DocSpace already exists somewhere, we have the ID for it)
c1. repeat a1 - a4 (setup id/store/resources)
c2. get the reference to the space from 3 word code and resources.
c3. get the reference to the space's DocSpace
c4. attach peer2 resources (peer2 id and store) to DocSpace
c5. save DocSpace to peer2 store
c6. tell DocSpace to start syncing

Also looks good to me. The actual API for broadcasting / lookup supports either full or partial hashes (the 3 words encode 36 bits of the hash). But other forms of encoding either the full or a part of the hash could be used (e.g. QR codes).
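The arithmetic behind a 36-bit code can be sketched as follows: if the three words carry equal amounts of information, each word encodes 12 bits, which implies a list of 2^12 = 4096 words. Everything in this sketch is an assumption made for illustration: the even 12-bit split, the hex representation of the hash, the placeholder word list (`w0000` to `w4095`), and the function names.

```typescript
// Assumptions for illustration: hex-encoded hash, even 12-bit split per word,
// and a placeholder word list. None of this is taken from the library.
const BITS_PER_WORD = 12;
const WORDS_PER_CODE = 3;
const WORDLIST_SIZE = 1 << BITS_PER_WORD; // 4096 entries
const HEX_CHARS_PER_WORD = BITS_PER_WORD / 4; // 3 hex characters = 12 bits

// Stand-in for a real word list: index 0 -> "w0000", index 4095 -> "w4095".
function placeholderWord(index: number): string {
  if (!Number.isInteger(index) || index < 0 || index >= WORDLIST_SIZE) {
    throw new Error(`index out of range: ${index}`);
  }
  return "w" + index.toString().padStart(4, "0");
}

// Encode the first 36 bits of a hex hash as three words.
function wordCodeFromHexHash(hexHash: string): string[] {
  const needed = HEX_CHARS_PER_WORD * WORDS_PER_CODE; // 9 hex characters
  if (hexHash.length < needed || !/^[0-9a-fA-F]+$/.test(hexHash)) {
    throw new Error("expected a hex string with at least 36 bits");
  }
  const words: string[] = [];
  for (let i = 0; i < WORDS_PER_CODE; i++) {
    const chunk = hexHash.slice(i * HEX_CHARS_PER_WORD, (i + 1) * HEX_CHARS_PER_WORD);
    words.push(placeholderWord(parseInt(chunk, 16)));
  }
  return words;
}

// A partial-hash lookup: a candidate matches if its first 36 bits give the same code.
function matchesWordCode(candidateHexHash: string, code: string[]): boolean {
  return wordCodeFromHexHash(candidateHexHash).join("-") === code.join("-");
}
```

Since only 36 bits are encoded, a code narrows the search rather than identifying an object uniquely; a lookup still has to compare candidates against the full hash once it is known.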

Majors: Resources, HashedObject, Space, Store, app's data model DocSpace

Resources represent the user/peer combo...meaning the user id + client/browser the user is using for this session.
Resources wrap the store, mesh, config/LinkupServers, aliasing/hash names and is responsible for syncing among peers.
Space is the sharing boundary, among peers, for an app's data model. It wraps the app's data model and resources.
HashedObject is an immutable abstract the app's data model, DocSpace, is derived from.
Store is responsible for persisting data to a client, retrieving data.
DocSpace is immutable but contains a mutable 'contents' attribute

I agree.

Identity: how to add register/login/logout functionality?

I'll try to frame this, I may be getting what you want wrong, but I'd say:

To log in on a new device, you need to trust the device, since it will then be acting upon shared data on your behalf.

  • The key is to feed the correct Identity into Resources.create() to establish a session?

You need both the Identity object and the associated key pair object (of type RSAKeyPair at the moment). It's not like in a client-server app, where the client needs to prove to the server that it's being operated by you in order to log in; here you actually need to move the private key onto the device so it can modify these shared data structures.

  • For register, an Identity needs to be created, as in this playground code.

Yes, if a new account is going to be used, you can just create a new Identity with whatever info is provided, and a freshly generated keypair!

  • For login, the username/pw/id can be encrypted, stored locally, as a list of users, retrieved as needed.
  • For logout, the username/pw/id needs to be removed from the app as the current Identity/User.

The way I view it, the private key should be securely moved to the device, and effectively erased to log out.

  • Not worried about logging in from another device atm.
  • Expected: when logging users in/out, whichever Identity is passed to Resources.create(), HHS will treat users as distinct peers (user/client combos) and data will be stored/synced accordingly. Need to test this theory.

OK, I think maybe what you are aiming for is supporting several users on the same device, even if they don't trust each other? At this point I think you'd have to rely on the operating system's user system to provide real protection of each user's data/keys from each other. I don't think we can use the browser for that ATM, since HHS doesn't use encryption at rest. You could always use a list of username / pwd to provide user switching, but the underlying data would still be accessible by all users if they poke in IndexedDB. More like a user-switching solution, without real protection from one another in the same browser.

Maybe what could be done to support something like user login on-device would be to have a new store that wraps IndexedDB but adds encryption at rest using a passphrase. It wouldn't be complete protection, since something may be inferred from the structure of the encrypted entries inside IndexedDB, but it would still be pretty good.

What do you think?

  • UNKNOWN:
    • what's happening with store.save(key); store.save(id)?
    • Why are we persisting the keypair and id at the top level of IndexedDb?

It's really inside each store, and by doing it you enable the store to save objects signed by that identity, in practice enabling that identity to write to the store.

  • Does HHS use it?

Yes, the keypair is used automatically whenever an object authored by that identity is saved.

Good catch! The signing is done when you save the object in the store, and the signature verification is done right after receiving new objects during sync.

Data Model / ACL Starting to understand, but still not exactly sure how to model the orgSpace example above, or do ACL. Looks like the default mode for a space is everyone can read/write a mutable within a space. But ACL is implemented via CausalSet and membership, as in your createDeleteAuthorizer example above. I need to play around with that.

Yes, static (immutable) permissions can be modeled by passing a writers: HashedSet<Identity> upon creating sets, arrays, etc., but when the permissions can change, as in an ACL, CausalSets are necessary. Still looking for a friendly API for this.
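The static case can be pictured with a tiny sketch in which the list of writers is fixed when the set is created and never changes afterwards. `ToyWriterRestrictedSet` is a hypothetical name for this illustration and does not reproduce how the library's collections accept or check their writers.

```typescript
// Toy model of static permissions: the writers are fixed at creation time.
class ToyWriterRestrictedSet<T> {
  private readonly writers: ReadonlySet<string>;
  private readonly elements = new Set<T>();

  constructor(writers: Iterable<string>) {
    // Copy the writers so later changes to the caller's collection have no effect.
    this.writers = new Set(writers);
  }

  // Returns false (and changes nothing) when the author is not a writer.
  add(element: T, author: string): boolean {
    if (!this.writers.has(author)) return false;
    this.elements.add(element);
    return true;
  }

  has(element: T): boolean {
    return this.elements.has(element);
  }
}
```

Because the writer list never changes, every peer evaluates the rule identically no matter what it has replicated so far. That stops being true once permissions can be granted and revoked, which is the situation the causal types address.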

Saving DocSpace uses addDerivedField to create a 'contents' attribute derived from MutableReference.

The purpose of addDerivedField is to prevent an adversary from creating unwanted aliasing within the app's data.

setValue is used to update contents.

Because it's within a space, new values are synced across peers.

How would a nested object attribute be updated?

It would be the same, but the sync method needs to be called in the SpaceEntryPoint for these objects as well. I'm working on a new sync mode where the entry point will monitor its mutable contents and start and stop sync'ing them accordingly; it will be ready soon!

await resources.store.save(ds);

I'm somewhat confused by the pattern. From looking at the code, it looks like Resources has a reference to the store, and Space, Store and HashedObject have a reference to Resources. I'm wondering if all the references are needed and if the relationships between Resources, Store, Space and HashedObject can be simplified or made more explicit.

You may be right. What's in there is just all the environment that the object needs to work. Open to suggestions!

The Person example gives a different take on this. The store and app data model, Person, are created, as in the playground, but the Space is not. The new Person is saved directly to the new Store. So, the Person won't be synced across peers. The example is just showing that store objects can be retrieved using a hash of their contents

Yes, sorry for the confusion. The example is meant to show how to save stuff to the store and retrieve it using its hash, but you're right, it wouldn't be synchronized.

timsim00 commented 1 year ago

Thanks for the feedback, much appreciated! I will continue to digest and experiment. The big thing for me right now is getting my understanding up to a level so I can model the data, do auth and basic CRUD.

You could always use a list of username / pwd to provide user switching, but the underlying data would still be accessible by all users if they poke in IndexedDB. More like a user-switching solution, without real protection from one another in the same browser.

Yes, good catch. I can't rely on users having their own account on a device, i.e. two different users may log in from the Guest account on a friend's device. Just need to cover my bases. I can always encrypt data if it needs to be protected; most likely just an edge case atm. At the very least it sounds like I can do user switching and HHS can handle that.