open-services / open-registry

Community Owned JavaScript Registry
https://open-registry.dev
MIT License

Federation #19

Open victorb opened 5 years ago

victorb commented 5 years ago

I opened a preliminary PR (https://github.com/open-services/open-registry/pull/10) for Federation, but it is probably best to go via an issue first to better enable discussion around it. Here is what I've been thinking so far.

Old proposal: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914

New proposal:

Open-Registry Federation

Summary

Open-Registry, as a crowdfunded registry, won't be able to reach the same scale as the npm, Inc. registry without raising a significant amount of funds. What we can do, however, is set up a federation of registries, which would significantly lower our operating costs and also give users the benefit of faster performance and local resource sharing.

The model of federation proposed here will decentralize the storage and transfer of tarballs first, since that is the easier way for Open-Registry to get started with federation.

Once implemented and used, we can start focusing on research about federated publishing as well.

Motivation

Constraints

Use Cases

Security

Practical steps

Ok, so the working plan is the following:

This is the small, MVP version to ensure the idea is viable in the wild.

The first step towards federation is to keep the metadata index centralized with Open-Registry while tarballs can be served from anywhere and by anyone.

The plan is to use ipfs-lite by @hsanjuan to start an embedded libp2p node that will expose the traditional registry interface as HTTP endpoints.

The software will connect to the central registry to find out the latest root hash and also listen for changes, automatically updating its local pointer when Open-Registry's pointer changes.

The root hash can be found in multiple different ways, depending on the environment of the software.

The software will basically be a resolver for (packageName, packageVersion) => IPFS hash via its local proxy.
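To make the resolver idea concrete, here is a minimal sketch, assuming the index behind a root hash can be modelled as an in-memory mapping. The class name `Resolver`, the method names, and the placeholder hashes are all hypothetical and only for illustration; the real software would read this data through the embedded node.

```python
from typing import Dict, Optional, Tuple

# (packageName, packageVersion), as in the proposal
PackageKey = Tuple[str, str]


class Resolver:
    """Hypothetical resolver: (name, version) -> IPFS hash under one root hash."""

    def __init__(self, root_hash: str, index: Dict[PackageKey, str]) -> None:
        self.root_hash = root_hash
        self._index = dict(index)

    def resolve(self, name: str, version: str) -> Optional[str]:
        # Returns None when the package is not part of the known root.
        return self._index.get((name, version))

    def update_root(self, root_hash: str, index: Dict[PackageKey, str]) -> None:
        # Called when the central registry announces a new root hash.
        self.root_hash = root_hash
        self._index = dict(index)


# Placeholder values, not real hashes.
resolver = Resolver("root-v1", {("left-pad", "1.3.0"): "hash-left-pad-1.3.0"})
found = resolver.resolve("left-pad", "1.3.0")
missing = resolver.resolve("left-pad", "9.9.9")
```

A miss returns `None` rather than raising, so the caller can decide whether to fall back to the central registry.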

CLI interface

$ open-registry --federate
                --share
                --update-type=<http|dns|ipns|pubsub>
                --offline

--federate <multiaddr>   - Connect to already running instance and use its
                           root hash.
                           Default: /dns4/npm.open-registry.dev/tcp/6736

--share                  - Enable other peers to connect to you and download
                           public packages.
                           Default: true

--update-type            - How to get the latest root hash from Open-Registry.
                           Default: http

--offline                - Don't do any connections, use last known root hash.
                           Default: false
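
As a rough illustration of the flags and defaults listed above, this is how they could be parsed with Python's `argparse`. It is only a sketch: the actual tool may be written in another language, and the `--no-share` form for turning sharing off is an assumption, since the proposal does not say how to disable it.

```python
import argparse

DEFAULT_FEDERATE = "/dns4/npm.open-registry.dev/tcp/6736"
UPDATE_TYPES = ("http", "dns", "ipns", "pubsub")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="open-registry")
    parser.add_argument(
        "--federate",
        metavar="<multiaddr>",
        default=DEFAULT_FEDERATE,
        help="Connect to an already running instance and use its root hash.",
    )
    # Assumption: BooleanOptionalAction (Python 3.9+) adds a --no-share form.
    parser.add_argument(
        "--share",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Enable other peers to connect and download public packages.",
    )
    parser.add_argument(
        "--update-type",
        choices=UPDATE_TYPES,
        default="http",
        help="How to get the latest root hash from Open-Registry.",
    )
    parser.add_argument(
        "--offline",
        action="store_true",
        help="Make no connections, use the last known root hash.",
    )
    return parser


defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["--offline", "--update-type", "pubsub", "--no-share"])
```

Running with no arguments yields the documented defaults; `custom` shows an offline node that does not share.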

Example usage:

$ open-registry
Connecting to npm.open-registry.dev
Getting latest hash via HTTP over TLS
Started sharing downloaded public packages with others
Started HTTP server on http://localhost:6736 # mnemonic: "open" in T9
...
Currently connected to 3 peers
Upload/Download [current/total]: 32kbps/0kbps [3mb/7.3mb]

Pointing your package manager to http://localhost:6736 should now allow you to download and install packages on demand, while caching them and serving them to other users who are trying to download them too.

Federation Protocol

When the federation software is started on the user's device, it connects to the main registry.

Once the connection has been established, it asks for the latest version of the registry (just a pointer) and saves it for future use.

Concurrently, it starts an HTTP server locally.

Now the user can point their client to the local HTTP server.

Requests will be proxied via the latest root hash the federation software knows about, and fetched data will be cached.

When the root hash of the main registry changes, it publishes it via the following ways:

If the local client makes a request for a package that doesn't exist in the local root hash, the client needs to make a request to the central registry to download the package. After this is done, the package will be included in the new root hash, and can therefore be downloaded by the local client without any further requests to the central registry.
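
The miss-then-fallback behaviour described above can be sketched as follows. Everything here is a simplified assumption: the central registry is an in-memory stand-in, the root hash is modelled as an integer counter, and the names `CentralRegistry` and `FederationNode` are hypothetical.

```python
from typing import Dict, Tuple

PackageKey = Tuple[str, str]


class CentralRegistry:
    """In-memory stand-in for the central Open-Registry index."""

    def __init__(self, packages: Dict[PackageKey, str]) -> None:
        self._packages = dict(packages)   # everything the registry can serve
        self._published: Dict[PackageKey, str] = {}  # included in the current root
        self.root_version = 0             # stands in for the root hash
        self.requests = 0                 # how often the central registry was hit

    def fetch(self, key: PackageKey) -> str:
        self.requests += 1
        tarball_hash = self._packages[key]  # KeyError if the package is unknown
        if key not in self._published:
            # Including the package produces a new root.
            self._published[key] = tarball_hash
            self.root_version += 1
        return tarball_hash

    def snapshot(self) -> Tuple[int, Dict[PackageKey, str]]:
        return self.root_version, dict(self._published)


class FederationNode:
    """Local proxy that resolves from its known root and falls back on a miss."""

    def __init__(self, central: CentralRegistry) -> None:
        self.central = central
        self.root_version, self.index = central.snapshot()

    def get(self, name: str, version: str) -> str:
        key = (name, version)
        if key in self.index:
            return self.index[key]          # served without the central registry
        tarball_hash = self.central.fetch(key)
        # Pick up the new root so later requests stay local.
        self.root_version, self.index = self.central.snapshot()
        return tarball_hash


central = CentralRegistry({("left-pad", "1.3.0"): "hash-left-pad"})
node = FederationNode(central)
first = node.get("left-pad", "1.3.0")   # miss: goes to the central registry
second = node.get("left-pad", "1.3.0")  # hit: resolved from the updated root
```

Only the first request reaches the central registry; the second is answered from the node's updated root.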

Simulator

The first step of the federation setup is creating a suitable testing environment where we can run tests on how well the federation is working.

The simulator should start by running the following scenarios:

More elaborate schemes can be created in the future.

Bootstrap nodes

Open-Registry will run a couple of bootstrap nodes. These are responsible for being accessible to the federation nodes and for providing the metadata and tarballs if the federation nodes don't have them locally.

Metrics

Both the bootstrap nodes and the main registry index should publish metrics in the Prometheus format to be collected by the metrics gatherer. These metrics will eventually be made accessible via a public dashboard.
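
For reference, the Prometheus text exposition format is simple enough to render by hand. The sketch below shows one way to do it; the metric names and labels are made up for illustration and are not metrics the project has defined, and label values are not escaped.

```python
from typing import Dict, Optional


def render_metric(
    name: str,
    help_text: str,
    metric_type: str,
    value: float,
    labels: Optional[Dict[str, str]] = None,
) -> str:
    """Render one sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Sorted for stable output; no escaping of quotes or backslashes.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )


# Hypothetical metric names, for illustration only.
peers = render_metric(
    "open_registry_connected_peers", "Currently connected peers.", "gauge", 3
)
served = render_metric(
    "open_registry_tarballs_served_total",
    "Tarballs served to peers.",
    "counter",
    42,
    {"node": "bootstrap-1"},
)
```

In practice an existing Prometheus client library would handle escaping and registration; this only shows what the scraped output looks like.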

For the federation nodes, we can offer opt-in metrics in the future, so we can see the health of the federation.

Existing infrastructure migration

The current Open-Registry is just one instance which is the main Open-Registry index. With federation, the architecture would change to add another component which would be the federated instances. We have more flexibility on where to place these but are in no rush to add them currently.

Potential Issues

Drawbacks

Alternatives

Unresolved Problems

Future

max-mapper commented 5 years ago

I don't have time to work on this right now, but here's an old thread from a similar initiative I worked on https://github.com/depjs/dep/issues/8

victorb commented 5 years ago

Thanks a lot @maxogden, will check that out.

retrohacker commented 5 years ago

Running a global network of the scale of the npm registry will be impossible to do with just being funded by the community as the costs will be too high.

Wonder how far we can get with cloudflare + cloud storage.

Will be experimenting with this in the coming weeks and will report back :-)

victorb commented 5 years ago

@retrohacker thanks, appreciate it, ping here once you have some results to share :)

That said, I do think that even if we find the fastest CDN, we can make it faster for people by having a federated model. But CDN in front of the metadata registry would still be a good idea.

victorb commented 5 years ago

I've updated the initial issue here with an updated version of the proposed federation, will also bump it on the roadmap.

Old proposal can be found here: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914

marcusnewton commented 5 years ago

Build it using Holochain, it's exactly what you need for distributed (fully sharded) storage and cryptographic security

retrohacker commented 5 years ago

@victorb as promised, circling back to report on cost.

Self hosting on cloud providers turned out to be reasonable. Our GCP mirror ended up costing ~$300 to do the initial mirroring (pulling 5TB of data through cloud functions and into cloud storage).

Once the files are sitting in storage (multi-zone within the US), the cost is ~$6 a day. That includes the instance that is sitting there watching the CouchDB stream from the npm registry to keep the mirror fresh. The breakdown is $3.46 per day for storage and $2.28 for the compute instance.

Cloudflare functions (where we are doing our load balancing) cost $0.50 per million invocations.
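
The figures above can be combined into a rough running-cost estimate. This is only a back-of-the-envelope sketch using the numbers quoted in this comment; the 30-day month and the invocation count in the example are assumptions.

```python
STORAGE_PER_DAY = 3.46          # USD, from the breakdown above
COMPUTE_PER_DAY = 2.28          # USD, from the breakdown above
PER_MILLION_INVOCATIONS = 0.50  # USD, Cloudflare functions


def daily_cost() -> float:
    """Storage plus compute per day, in USD."""
    return STORAGE_PER_DAY + COMPUTE_PER_DAY


def monthly_cost(days: int = 30, invocations: int = 0) -> float:
    """Running cost for a month, excluding the one-time initial mirroring."""
    invocation_cost = invocations / 1_000_000 * PER_MILLION_INVOCATIONS
    return daily_cost() * days + invocation_cost


per_day = round(daily_cost(), 2)
# Assumed example load: 10 million invocations in a 30-day month.
per_month = round(monthly_cost(days=30, invocations=10_000_000), 2)
```

Storage and compute add up to $5.74 a day, which matches the ~$6 figure; the one-time ~$300 mirroring cost is separate.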

BTW the service is up and running if you want to give it a try: https://freajs.com