Near-Term Blockstack Architecture Planning

kantai commented 6 years ago

Based on conversations online and offline with various team members at Blockstack and developers looking to build on Blockstack (and users of the Blockstack software), I spent some time trying to sketch out a near-term architecture of the various Blockstack components and their interactions, with an eye towards trying to (1) improve the stability of the software and APIs Blockstack provides and (2) reduce the complexity of dependencies between components. In my mind, each of these components would be a different "project" of Blockstack -- each component has a different version number, github repo, test cases, and build process (to the extent that unifying technologies can be used in multiple components, we should do that, but we should still be able to conceptualize the components as independent actors).

I added a markdown file to this branch:

https://github.com/blockstack/blockstack/blob/future-architecture/future-architecture.md

Anyways, I'm going to tag the various stakeholders on the team for feedback and discussion @jcnelson @jackzampolin @larrysalibra @muneeb-ali @shea256 @yknl -- I think we should discuss here and then try to merge changes and comments into the markdown file.

jackzampolin commented 6 years ago

Very excited about this!! A couple of notes and thoughts:

1. Blockstack Client

This seems pretty self explanatory, basically just the blockstack.js repo. That could be used for the browser, cli, and any custom implementations devs want. The only question I have is which other piece does this connect to? Just to a blockstackd node right?

Very excited that we will focus on this as the key part of the stack and dogfooding the heck out of it. This is going to be a huge win for devs who want to do their own on-boarding and other advanced ops.

2. `blockstackd`

Also pretty simple. I'm assuming we would be keeping the JSON over RPC interface and just documenting it and creating clients for it.

3&4. `blockstack-api` and `blockstack-resolver`

A couple of thoughts here:

These should live in the same repo methinks.
Does this resolver need to run against the entire network every time it runs, or would it resolve the entire network once on startup and then check for new nameops every ~10 minutes? Would we want to set TTL on profiles so they are getting checked every ~15-30 min? Configurable
We likely want to limit calls to blockstackd as that will be the bottleneck in performance.
Perhaps the resolver is just the process that keeps a DB up to date (mongo?) and we make those features (search, profile resolution, anything requiring additional indexing) live behind a flag
There is a version of this API that is stateless as well, where every time a call comes in it makes the appropriate calls to blockstackd and then resolves the profile. This would be much lighter weight and not require any storage, but responses would be much slower. I could see use cases for each of these (stateless install for an individual doing dev or supporting his own view of the network, stateful install for applications like blockstack-explorer).
Would we be using the API here: https://core.blockstack.org/ or changing that up? Might want to think about writing an OpenAPI spec for this.
I have the resolver portion (need to change the name back, that work is in the indexer folder) implemented in a highly parallel, performant manner in go. I understand the desire to keep everything in one language, but considering we already have a 70% complete implementation in another it might be worth at least another set of eyes on the work I've put in: https://github.com/blockstack/go-blockstack/

5. `gaia/hub`

Excited for more work on this component!! Love these ideas here.

jcnelson commented 6 years ago

Doing this as comments is annoying. Why not just have a Github wiki for this sort of thing? It gets us the edit history while giving us lower friction than sending PRs. Replying here makes us have to go find the file we want to comment on, switch back and forth between browser tabs, and copy/paste the section we want to comment on in order to provide context.

That said:

gaia/hub Con of this design: providing local backups for users

I think there's a straightforward solution: make Gaia hubs composable. A Gaia hub can route reads and writes to other Gaia hubs. Then, the user can run a Gaia hub locally and have it replicate to both local disk and to an upstream Gaia hub. The upstream Gaia hub can, in turn, have its own replication policy (example: "send all pictures to Google Drive but keep documents in Dropbox").
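A minimal sketch of that composition idea, assuming a hub and a storage driver share the same two-method get/put interface. The names `Store`, `MemoryStore`, and `Hub` are hypothetical and do not come from any Gaia codebase; the in-memory store stands in for a real disk or cloud driver.

```python
from typing import Dict, List, Optional, Protocol


class Store(Protocol):
    """Anything that holds opaque blobs: a disk driver, a cloud driver, or another hub."""

    def put(self, path: str, data: bytes) -> None: ...
    def get(self, path: str) -> Optional[bytes]: ...


class MemoryStore:
    """Stand-in for a real driver (local disk, Dropbox, ...)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def get(self, path: str) -> Optional[bytes]:
        return self.blobs.get(path)


class Hub:
    """Replicates writes to every backend; reads from the first backend that has the blob."""

    def __init__(self, backends: List[Store]) -> None:
        self.backends = backends

    def put(self, path: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(path, data)

    def get(self, path: str) -> Optional[bytes]:
        for backend in self.backends:
            data = backend.get(path)
            if data is not None:
                return data
        return None


local_disk = MemoryStore()
cloud = MemoryStore()
upstream_hub = Hub([cloud])                  # the upstream hub applies its own policy
local_hub = Hub([local_disk, upstream_hub])  # a hub is just another backend

local_hub.put("photos/cat.jpg", b"...")
```

Because `Hub` satisfies the same interface as a driver, a local hub can list an upstream hub among its backends without any special casing. A per-path replication policy would replace the simple "write everywhere" loop.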

kantai commented 6 years ago

Replying to @jackzampolin --

The only question I have is which other piece does this connect to? Just to a blockstackd node right?

blockstack.js will communicate with gaia for storage operations, and probably the consumer api endpoints (resolver and the api), rather than blockstackd directly -- but I could be convinced otherwise on that last point. The rationale for a separate consumer-API is that updating that API can be done relatively frequently, and support a large breadth of versioning, whereas blockstackd should really only change when the underlying protocol changes (or hotfixes).

(blockstack api and resolver) These should live in the same repo methinks.

I'm not sure -- I think people want custom resolvers, but not necessarily custom other stuff. Of course, this is not that strong of a concern, so they could definitely live in the same repo if necessary.

Does this resolver need to run against the entire network every time it runs, or would it resolve the entire network once on startup and then check for new nameops every ~10 minutes? Would we want to set TTL on profiles so they are getting checked every ~15-30 min? Configurable

I think for setting TTLs we should use the information provided by the standards in place -- our zonefile format sets a TTL for the zonefile entry. And the profiles are themselves fetched over HTTP, which has a cache header. We should use those when determining when to store data locally on the resolver.
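As a rough sketch of how those two hints could be combined, assuming the zone file carries a `$TTL` directive and the profile response carries a `Cache-Control: max-age` value. Taking the smaller of the two is an assumption made for illustration, not a decision from this thread, and other directives such as `no-store` are ignored here. The function names and the sample zone file are hypothetical.

```python
import re
from typing import Optional


def zonefile_ttl(zonefile_text: str) -> Optional[int]:
    """Return the value of the $TTL directive in a zone file, if present."""
    match = re.search(r"^\$TTL\s+(\d+)\s*$", zonefile_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def http_max_age(cache_control: str) -> Optional[int]:
    """Return max-age from a Cache-Control header value, if present."""
    match = re.search(r"\bmax-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def cache_lifetime(zonefile_text: str, cache_control: str, default: int = 0) -> int:
    """Seconds a resolved profile may be cached: the smaller of the available hints.

    With no hints at all, fall back to `default` (0 means "do not cache").
    """
    hints = [
        hint
        for hint in (zonefile_ttl(zonefile_text), http_max_age(cache_control))
        if hint is not None
    ]
    return min(hints) if hints else default


# Illustrative zone file only; the URL is a placeholder.
zonefile = '$ORIGIN alice.id\n$TTL 3600\n_http._tcp URI 10 1 "https://example.com/alice.json"\n'
lifetime = cache_lifetime(zonefile, "public, max-age=600")
```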

There is a version of this API that is stateless as well, where every time a call comes in it makes the appropriate calls to blockstackd and then resolves the profile.

I strongly prefer starting from a stateless system and seeing how far the performance can be improved by using existing caching tooling. The problem with building up a local index is that it is equivalent to caching, but we'd then be in the business of trying to make sure that we're updating the local state correctly and in a timely manner. If we use existing caching tools (like Varnish, nginx, CDNs), then we don't have to worry about ensuring the correctness of state-transitions (like name transfers, zonefile updates, name expirations, new registrations), which always end up being more complex than we imagine.

Would we be using the API here: https://core.blockstack.org/ or changing that up? Might want to think about writing an OpenAPI spec for this.

It's probably a good starting point. Trim it down and then call it /v1/ forever. Also, we have a Blueprint API Spec (https://github.com/apiaryio/api-blueprint/) for it already -- we can convert that to OpenAPI if you want (though that would require updating the code that generates https://blockstack.github.io/blockstack-core/ and core.blockstack.org).

I have the resolver portion (need to change the name back, that work is in the indexer folder) implemented in a highly parallel, performant manner in go. I understand the desire to keep everything in one language, but considering we already have a 70% complete implementation in another it might be worth at least another set of eyes on the work I've put in: https://github.com/blockstack/go-blockstack/

Yeah -- I'm fine with implementing the API or the resolver in Go, and what you started is a good starting point. I think the important decisions here are (1) whether or not to separate the resolver from the API and (2) whether or not it should have local state that it's managing. I would argue that managing local state should be avoided unless absolutely necessary, because it will ultimately involve re-implementing state-transition logic embedded within blockstackd, and that is a bad path to go down.

larrysalibra commented 6 years ago

@kantai This is really great. Thanks for writing this up!

blockstack.js is missing from the list of components.

I'm not really sure what roles 3. blockstack-api or 4. blockstack-resolver play that can't be done client side.

We resolve zone file hashes to profile files in the browser already. Namespace specific behavior seems like a per app problem to solve and out of scope for us.

Versioning of gaiahub and blockstackd interfaces should be their own responsibility -> having said that, both services are super simple so we shouldn't have to change them often.

I think it's important that we remember that identity and authentication are particular applications of blockstackd & gaia storage. An identity search service should live on a higher layer and generate its index based on information it reads from blockstackd and gaia hubs. There will be other search services for other things.

I took a stab at (poorly) drawing how I envision these components interacting:

Let me know if I'm terribly off-base compared to everyone else's understanding.

jcnelson commented 6 years ago

I do not understand the difference between the resolver API and the Blockstack API. My understanding of the target architecture is:

blockstack.js: Reference client
- Implements stub API methods for the rest of the system
- Implements stub API for contacting UTXO providers for sending/broadcasting transactions
- Implements stub API for blockstackd
- Implements API for transaction construction and signing
- Implements the wallet
- Implements the Gaia client
  - Signs, verifies, encrypts, and decrypts files and file indexes
  - Resolves conflicts between writes across a user's devices
blockstackd: The Blockstack blockchain reference implementation (moral equivalent to bitcoind)
- implements name database (or a shard of it)
- implements Atlas peer for its name database
- implements read-only API for querying the name database and Atlas database
Blockstack API: The resolver and registrar
- Implements a read-only API to read and search for profiles
  - It must be easy to extend with app-specific name resolution rules
- implements the same stub APIs in blockstack.js for super-light clients that for whatever reason cannot use blockstack.js (it is fine if this API simply uses blockstack.js internally to service these requests)
Gaia Hub: Read/Write proxy to storage services
- Implements get/put API for shuttling data between upstream storage and downstream clients
  - Storage API calls may require challenge/response authentication
  - Storage API treats data as opaque blobs by default. It does not care about conflicts
- Implements storage provider drivers
  - Including a storage driver for forwarding requests to upstream Gaia hubs
- Implements configuration API for getting/setting drivers, authentication policies, and replication policies.

larrysalibra commented 6 years ago

After some out-of-band discussion with @kantai: the reason we need a resolver component is that resolving subdomains requires parsing the parent domain's zone file and making a number of requests to generate the state of the subdomains. This would be impractical to do client side because of the resource intensity.

The way I look at blockstackd from an app/user/developer perspective is that blockstackd's main role is to give me a zone file when I give it a name. I propose we move the subdomain indexing functionality into blockstackd so that it can maintain the index of subdomains and return zone files for subdomains without having to use a separate component.

Other than that, my understanding is generally the same as @jcnelson's, except that I think we can remove blockstack-api & resolver components.

jackzampolin commented 6 years ago

The way I look at blockstackd from an app/user/developer perspective is that blockstackd's main role is to give me a zone file when I give it a name. I propose we move the subdomain indexing functionality into blockstackd so that it can maintain the index of subdomains and return zone files for subdomains without having to use a separate component.

+1000

yknl commented 6 years ago

Is it possible to combine Blockstackd, Blockstack API and the resolver? Would there be any reason for anyone to only install blockstackd and not want the resolver and API? And if any of the functionality can be moved to blockstack.js we should.

kantai commented 6 years ago

The way I look at blockstackd from an app/user/developer perspective is that blockstackd's main role is to give me a zone file when I give it a name.

What about the historic operations for a name? Names owned by an address?

I propose we move the subdomain indexing functionality into blockstackd so that it can maintain the index of subdomains and return zone files for subdomains without having to use a separate component.

Yes, that's fine. If someone wants custom subdomain resolution, they can always implement that separately.

Is it possible to combine Blockstackd, Blockstack API and the resolver? Would there be any reason for anyone to only install blockstackd and not want the resolver and API?

I think this depends on how much of the functionality of the resolver and api moves to the client. If we can move almost all of it to the client, then blockstackd is really simple. The idea behind separating them is that if the api has support for complex queries (number of names in a namespace, number of names on a given blockchain, name history), the demand to change those queries will be high, and blockstackd should ideally be small and updated very infrequently.
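To make the split concrete, here is a sketch of one such complex query ("number of names in a namespace") living entirely in the API layer and built from a single small paging call on the core node. The paging call's shape `(namespace, offset, count)` and every name below are assumptions for illustration, not blockstackd's documented interface.

```python
from typing import Callable, List

# Hypothetical core primitive: return up to `count` names in `namespace`,
# starting at `offset`.
PageFn = Callable[[str, int, int], List[str]]


def count_names_in_namespace(get_page: PageFn, namespace: str, page_size: int = 100) -> int:
    """API-layer query composed from one small paging call on the core node."""
    total = 0
    offset = 0
    while True:
        page = get_page(namespace, offset, page_size)
        total += len(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to read.
            return total
        offset += page_size


# Fake core node, for illustration only.
_FAKE_NAMES = {"id": [f"user{i}.id" for i in range(250)]}


def fake_get_page(namespace: str, offset: int, count: int) -> List[str]:
    return _FAKE_NAMES.get(namespace, [])[offset:offset + count]


total_names = count_names_in_namespace(fake_get_page, "id")
```

New queries of this kind can be added or changed in the API layer while the core node keeps exposing only the paging call.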

jackzampolin commented 6 years ago

I guess the way I'm thinking about blockstackd vs blockstack-api is that the API builds indexes of data that exists in blockstackd.

larrysalibra commented 6 years ago

What about the historic operations for a name? Names owned by an address?

Yes. Names, addresses, zone files, operations. These are the domain of blockstackd. The point I was trying to make is that I think blockstackd's scope should end at zone files. Anything higher level, profiles, etc can be somewhere else.

The idea behind separating them is that if the api has support for complex queries (number of names in a namespace, number of names on a given blockchain, name history), the demand to change those queries will be high, and blockstackd should ideally be small and updated very infrequently.

Ahh okay. Now I see the thinking behind separating this out.

@kantai is this where you were thinking blockstack api would sit? between blockstackd and the outside world?

If that's the case, I like how a simple api would make blockstackd and our consensus code a lot more approachable by outside developers.

kantai commented 6 years ago

@kantai is this where you were thinking blockstack api would sit? between blockstackd and the outside world?

This is exactly how I imagine this.

larrysalibra commented 6 years ago

Awesome. Sounds like we're nearing consensus.

yknl commented 6 years ago

Maybe we can have Blockstack API be an installable service of blockstackd like insightAPI is to bitcore-node

jackzampolin commented 6 years ago

Please no. That can be the way it works in practice, but building them as decoupled services will help us debug issues and make for a more robust and deployable product. We can make easy deployment scripts to spin everything up together, or users can scale each component separately (connect each API to a number of backend blockstackd instances for better performance). We don't need to have an opinion on their deployment strategy.
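For illustration, a minimal sketch of one API instance spreading requests over several backend core nodes, with failover when a node is unreachable. The `BackendPool` class is hypothetical, and each backend is modeled as a plain callable rather than a real RPC client.

```python
import itertools
from typing import Callable, List, Sequence


class BackendPool:
    """Rotate requests across several core nodes, skipping ones that fail."""

    def __init__(self, backends: Sequence[Callable[[str], str]]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._next = itertools.cycle(range(len(self._backends)))

    def call(self, request: str) -> str:
        errors: List[Exception] = []
        # Try each backend at most once per request, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[next(self._next)]
            try:
                return backend(request)
            except ConnectionError as exc:
                errors.append(exc)
        raise ConnectionError(f"all {len(errors)} backends failed")


def healthy(name: str) -> Callable[[str], str]:
    return lambda request: f"{name} answered {request}"


def down(request: str) -> str:
    raise ConnectionError("node unreachable")


pool = BackendPool([healthy("node-a"), down, healthy("node-b")])
replies = [pool.call("get_name") for _ in range(3)]
```

Since the pool only needs callables, the same API process works against one co-located node or many remote ones, which leaves the deployment choice to whoever runs it.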

larrysalibra commented 6 years ago

Maybe we can have Blockstack API be an installable service of blockstackd like insightAPI is to bitcore-node

I like logically separating blockstackd and blockstack api and having a very simple api on blockstackd that blockstack api talks to. I'm wondering if we shouldn't always distribute them together. Is there any instance where someone would want to run blockstackd without blockstack api?

kantai commented 6 years ago

That can be the way it works in practice, but building them as decoupled services will help us debug issues and make for a more robust and deployable product. We can make easy deployment scripts to spin everything up together, or users can scale each component separately (connect each API to a number of backend blockstackd instances for better performance). We don't need to have an opinion on their deployment strategy.

Agreed -- they should be developed as decoupled services. Deployment scripts can take care of running them coupled.

I like logically separating blockstackd and blockstack api and having a very simple api on blockstackd that blockstack api talks to. I'm wondering if we shouldn't always distribute them together. Is there any instance where someone would want to run blockstackd without blockstack api?

I don't think there are instances where someone would want to run blockstackd without blockstack-api, but I'm with @jackzampolin that they should be able to be run separately, which avoids us making people's deployment/scaling decisions for them. We should probably target our default deployment scripts/documentation at running them together though (this decision can also be revisited down the line).

larrysalibra commented 6 years ago

I'm with @jackzampolin that they should be able to be run separately, which avoids us making people's deployment/scaling decisions for them.

Makes sense to me.

jcnelson commented 6 years ago

It sounds to me like the Blockstack API is just a service for querying profiles. If you don't care about profiles, then you don't need the Blockstack API service.

That said, blockstackd is useful by itself. I use it all the time to check to see if transactions go through, and to query name history. It's actually pretty lightweight--you could run it on your home router.

I think of blockstackd as something akin to a DNS/CA server, and the Blockstack API service as something akin to a Web server. Naming and PKI are wholly separate concerns from app data hosting and profile indexing.

jackzampolin commented 6 years ago

This is sounding a lot like a consensus!!!

yknl commented 6 years ago

We can make easy deployment scripts to spin everything up together, or users can scale each component separately (connect each API to a number of backend blockstackd instances for better performance). We don't need to have an opinion on their deployment strategy.

Do we foresee the need to be able to scale a blockstackd and API to a number of instances? What if by default you can easily install the API as a service from blockstackd and give the option for standalone API instances to connect to blockstackd?

kantai commented 6 years ago

That said, blockstackd is useful by itself. I use it all the time to check to see if transactions go through, and to query name history. It's actually pretty lightweight--you could run it on your home router.

@jcnelson -- the idea behind interposing blockstack-api between clients and blockstackd is that blockstackd should be a minimal codebase that receives minimal amounts of updates. When clients interact directly with an API, that creates pressure to make updates and changes to that API, and we want to minimize the amount of updates to blockstackd -- ideally that's just consensus-breaking changes and hotfixes for bugs (not additional API features).

jcnelson commented 6 years ago

I like it. blockstack-api is the libc to the kernel that is blockstackd.

larrysalibra commented 6 years ago

It sounds to me like the Blockstack API is just a service for querying profiles. If you don't care about profiles, then you don't need the Blockstack API service.

I don't think blockstack api should have anything to do with profiles. profiles and identity are an application on top of the naming system. (see the pink line in my ugly drawing below)

Profile look up can be done in blockstack.js.

I don't think we should encourage clients in general to interact directly with blockstackd. The only client of blockstackd should be 1 or more blockstack-api instances.

jcnelson commented 6 years ago

Then I do not understand what the blockstack-api provides besides application compatibility with different blockstackd versions?

jackzampolin commented 6 years ago

Blockstack API should resolve profiles. It's creating search indexes.

jcnelson commented 6 years ago

Let me ask this another way. If we can keep the blockstackd API stable and agreed-upon, do we need a separate blockstack-api service? If so, what does that service provide that is outside the scope of both blockstack.js and blockstackd?

jcnelson commented 6 years ago

@jackzampolin @larrysalibra I'm getting conflicting signals. Profile resolution happens in blockstack.js, right?

jcnelson commented 6 years ago

If all blockstack-api does is provide a search index over profiles, can we explicitly narrow this component's scope? Maybe by calling it something more specific, like blockstack-search or blockstack-explorer?

jackzampolin commented 6 years ago

The way I look at it is that you should be able to do all of it in the browser if you want, but a large number of applications will want a server side component to speed up some of the operations that would take a bunch of network hops to complete. To do a profile resolution by just talking to a core node you need to make the following calls:

- RPC get_name_blockchain_record
- RPC get_zonefiles
- fetch profile from zonefile

On a bad network that can add significant latency that would make apps almost unusable under anything but ideal network conditions. We are going to want some sort of server side component to make that data easier to get to, fewer hops and a faster interface.
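The three hops above can be sketched as follows. The transport is injected as plain callables so the flow is visible without a network; the `value_hash` field name, the URI record pattern, and the sample zone file are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List


def profile_url(zonefile_text: str) -> str:
    """Pull the profile URL out of the first URI record in a zone file."""
    match = re.search(r'\bURI\s+\d+\s+\d+\s+"([^"]+)"', zonefile_text)
    if match is None:
        raise ValueError("zone file has no URI record")
    return match.group(1)


def resolve_profile(
    name: str,
    get_name_record: Callable[[str], Dict[str, str]],
    get_zonefile: Callable[[str], str],
    fetch_url: Callable[[str], str],
) -> str:
    record = get_name_record(name)                 # hop 1: RPC get_name_blockchain_record
    zonefile = get_zonefile(record["value_hash"])  # hop 2: RPC get_zonefiles
    return fetch_url(profile_url(zonefile))        # hop 3: fetch profile from zonefile


# Fakes that record each hop, for illustration only.
hops: List[str] = []
ZONEFILE = '$ORIGIN alice.id\n$TTL 3600\n_http._tcp URI 10 1 "https://example.com/alice.json"\n'


def fake_record(name: str) -> Dict[str, str]:
    hops.append("get_name_blockchain_record")
    return {"value_hash": "abc123"}


def fake_zonefile(zonefile_hash: str) -> str:
    hops.append("get_zonefiles")
    return ZONEFILE


def fake_fetch(url: str) -> str:
    hops.append("fetch " + url)
    return '{"name": "Alice"}'


profile = resolve_profile("alice.id", fake_record, fake_zonefile, fake_fetch)
```

Each hop depends on the result of the previous one, so the round trips cannot be issued in parallel. That serial dependency is where the latency concern comes from.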

kantai commented 6 years ago

If all blockstack-api does is provide a search index over profiles, can we explicitly narrow this component's scope? Maybe by calling it something more specific, like blockstack-search or blockstack-explorer?

Okay -- to recap, I think there are still two issues here on blockstack-api --

1. Should we impose an API on top of blockstackd to interpose on all client requests?
2. Should profile resolution or search be done by blockstack-api or on a client?

For (1), the reason to interpose on all client requests is exactly what @jcnelson mentioned before -- support for new shiny API features without updating the important kernel (the *nix analogy is libc:kernel)

For (2), we still need to make those decisions.

jcnelson commented 6 years ago

Re (2), I believe search is a separate problem from indexing. I don't need a search index to do lookups, just like how I don't need Google to run DNS queries.

@jackzampolin the first two calls (get_name_blockchain_record and get_zonefiles) can be routed to a local blockstackd. I do this on my laptop, for example. This is both faster and more secure than trusting a 3rd party instance to do this on my behalf, since the profile data is authenticated with the blockchain and zone file data.
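A small sketch of the zone-file half of that authentication: a client recomputes the hash of the zone file it received and compares it with the hash committed in the name record. SHA-256 is used here purely as a stand-in, since the thread does not say which hash the name record commits to, and the function names are hypothetical. Verifying the profile itself against the zone file is a further step not shown here.

```python
import hashlib


def zonefile_digest(zonefile_text: str) -> str:
    # Assumption: SHA-256 stands in for whatever hash the name record commits to.
    return hashlib.sha256(zonefile_text.encode("utf-8")).hexdigest()


def zonefile_matches(zonefile_text: str, committed_hash: str) -> bool:
    """True if the zone file we were handed is the one the name record commits to."""
    return zonefile_digest(zonefile_text) == committed_hash.lower()


# Illustrative data only.
good = '$ORIGIN alice.id\n$TTL 3600\n_http._tcp URI 10 1 "https://example.com/alice.json"\n'
committed = zonefile_digest(good)
tampered = good.replace("example.com", "evil.example")

ok = zonefile_matches(good, committed)
rejected = not zonefile_matches(tampered, committed)
```

With a check like this on the client, a third-party node can serve the zone file without being trusted, as long as the committed hash comes from a name database the client does trust.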

kantai commented 6 years ago

Recapping a consensus forming meeting off GitHub. We formed a consensus around a lightweight shim API as diagrammed in Larry's last component drawing. And I believe we are going with implementing the resolver logic in the client (blockstack.js).

On Dec 7, 2017 6:19 PM, "Jude Nelson" notifications@github.com wrote:

Re (2), I believe search is a separate problem from indexing. I don't need a search index to do lookups, just like how I don't need Google to run DNS queries.

@jackzampolin the first two calls (get_name_blockchain_record and get_zonefiles) can be routed to a local blockstackd. I do this on my laptop, for example. This is both faster and more secure than trusting a 3rd party instance to do this on my behalf, since the profile data is authenticated with the blockchain and zone file data.


jackzampolin commented 6 years ago

This discussion has concluded and we are making the changes described here. Closing.
