Open victorb opened 5 years ago
I don't have time to work on this right now, but here's an old thread from a similar initiative I worked on https://github.com/depjs/dep/issues/8
Thanks a lot @maxogden, will check that out.
Running a global network at the scale of the npm registry will be impossible on community funding alone, as the costs will be too high.
Wonder how far we can get with Cloudflare + cloud storage.
Will be experimenting with this in the coming weeks and will report back :-)
@retrohacker thanks, appreciate it, ping here once you have some results to share :)
That said, I do think that even if we find the fastest CDN, we can make it faster for people by having a federated model. But a CDN in front of the metadata registry would still be a good idea.
I've updated the initial issue here with an updated version of the proposed federation, will also bump it on the roadmap.
Old proposal can be found here: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914
Build it using Holochain, it's exactly what you need for distributed (fully sharded) storage and cryptographic security
@victorb as promised, circling back to report on cost.
Self hosting on cloud providers turned out to be reasonable. Our GCP mirror ended up costing ~$300 to do the initial mirroring (pulling 5TB of data through cloud functions and into cloud storage).
Once the files are sitting in storage (multi-zone within the US), the cost is ~$6 a day. That includes the instance that is sitting there watching the CouchDB stream from the npm registry to keep the mirror fresh. The breakdown is $3.46 per day for storage and $2.28 for the compute instance.
Cloudflare functions (where we are doing our load balancing) cost $0.50 per million invocations.
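As a rough check on those numbers, here is a minimal Python sketch of the arithmetic. The 30-day month and the 100 million invocations per month are illustrative assumptions, not figures from the mirror.

```python
from decimal import Decimal

# Figures quoted above, in USD.
INITIAL_MIRROR_COST = Decimal("300")   # approximate one-time mirroring cost
STORAGE_PER_DAY = Decimal("3.46")
COMPUTE_PER_DAY = Decimal("2.28")
PRICE_PER_MILLION_INVOCATIONS = Decimal("0.50")


def daily_cost() -> Decimal:
    """Storage plus compute, per day."""
    return STORAGE_PER_DAY + COMPUTE_PER_DAY


def monthly_cost(invocations_per_month: int, days: int = 30) -> Decimal:
    """Running cost for one month, excluding the one-time mirroring cost."""
    millions = Decimal(invocations_per_month) / Decimal(1_000_000)
    return daily_cost() * days + millions * PRICE_PER_MILLION_INVOCATIONS


print(daily_cost())  # 5.74, which rounds to the ~$6 a day quoted above
# Hypothetical traffic: 100 million load-balancer invocations in a month.
print(monthly_cost(100_000_000))
```

Using `Decimal` keeps the cents exact; floats would also do for an estimate this coarse.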
BTW the service is up and running if you want to give it a try: https://freajs.com
I opened a preliminary PR (https://github.com/open-services/open-registry/pull/10) for Federation, but it's probably best to go via an issue first, to better enable discussions around it. Here is what I've been thinking so far.
Old proposal: https://gist.github.com/victorb/82ace9e6fe7adf578833527b8b94f914
New proposal:
Open-Registry Federation
Summary
Open-Registry, as a crowdfunded registry, won't be able to reach the same scale as npm Inc's registry without raising a significant amount of funds. What we can do, however, is set up a federation of registries, which would significantly lower our operating costs and also give users the benefit of faster performance and local resource sharing.
The model of federation proposed here will decentralize the storage and transfer of tarballs first, as that is an easier way of getting started with federation for Open-Registry.
Once implemented and used, we can start focusing on research about federated publishing as well.
Motivation
Constraints
Use Cases
Security
Practical steps
Ok, so the working plan is the following:
This is the small, MVP version to ensure the idea is viable in the wild.
The first step towards federation is having the metadata index centralized with Open-Registry, while tarballs can be served from anywhere and by anyone.
The plan is to use ipfs-lite by @hsanjuan to start an embedded libp2p node that will expose the traditional registry interface as HTTP endpoints.
The software will connect to the central registry to find out the latest root hash, listen for any changes, and automatically update its local pointer when Open-Registry's pointer changes.
The root hash can be found in multiple different ways, depending on the environment of the software.
The software will basically be a resolver for (packageName, packageVersion) => IPFS hash via its local proxy.
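To make the resolver idea concrete, here is a minimal Python sketch, not the actual implementation. The `Resolver` class, its method names, and the placeholder hash strings are all hypothetical; the real software would look entries up under the IPFS root hash rather than in an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PackageKey = Tuple[str, str]  # (packageName, packageVersion)


@dataclass
class Resolver:
    """Maps (packageName, packageVersion) to a content hash under one root hash."""

    root_hash: str
    index: Dict[PackageKey, str] = field(default_factory=dict)

    def resolve(self, name: str, version: str) -> Optional[str]:
        """Return the tarball hash, or None if this root does not contain it."""
        return self.index.get((name, version))

    def update_root(self, new_root: str, new_index: Dict[PackageKey, str]) -> bool:
        """Swap in a newer root pointer; return False if nothing changed."""
        if new_root == self.root_hash:
            return False
        self.root_hash = new_root
        self.index = dict(new_index)
        return True


# Placeholder values only; these are not real IPFS hashes.
resolver = Resolver("root-1", {("class-is", "1.0.0"): "hash-a"})
print(resolver.resolve("class-is", "1.0.0"))  # hash-a
print(resolver.resolve("class-is", "2.0.0"))  # None
```

Keeping the root hash and the index together means a pointer update replaces both at once, so a lookup never mixes entries from two different roots.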
CLI interface
Example usage:
Pointing your package manager to http://localhost:6736 should now allow you to download and install packages on demand, while caching them and serving them to other users who are trying to download them too.
Federation Protocol
When the federation software gets started on the user's device, it connects to the main registry.
Once the connection has been established, it asks for the latest version of the registry (just a pointer), and saves it for future use.
Concurrently, it starts an HTTP server locally.
Now the user can point their client at the local HTTP server.
Requests will be proxied via the latest root hash the federation software knows about, and the fetched data will be cached.
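The proxy-and-cache step above could look roughly like this in Python. This is a sketch under assumptions: `CachingProxy` and the `fetch` callback are hypothetical names, and the callback stands in for whatever actually retrieves data under a root hash (libp2p in the proposal).

```python
from typing import Callable, Dict, Tuple

Fetcher = Callable[[str, str], bytes]  # (root_hash, path) -> response body


class CachingProxy:
    """Serves request paths via the latest known root hash, caching results."""

    def __init__(self, root_hash: str, fetch: Fetcher) -> None:
        self.root_hash = root_hash
        self._fetch = fetch
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def get(self, path: str) -> bytes:
        # Cache entries are keyed by root hash as well as path, so a root
        # change never serves data that belonged to an older root.
        key = (self.root_hash, path)
        if key not in self._cache:
            self._cache[key] = self._fetch(self.root_hash, path)
        return self._cache[key]


calls = []


def fake_fetch(root_hash: str, path: str) -> bytes:
    """Stand-in for the network; records each call it receives."""
    calls.append((root_hash, path))
    return f"{root_hash}:{path}".encode()


proxy = CachingProxy("root-1", fake_fetch)
proxy.get("/class-is")
proxy.get("/class-is")  # answered from the cache, no second fetch
print(len(calls))  # 1
```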
When the root hash of the main registry changes, the new hash is published in the following ways:
hash
in response to a GET request to npm.open-registry.devnpm.open-registry.dev
on the used libp2p network
If the local client makes a request for a package that doesn't exist in the local root hash, the client needs to make a request to the central registry to download the package. After this is done, the package will be included in the new root hash, and can therefore be downloaded by the local client without any requests to the central registry.
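The miss path described above can be sketched in Python as follows. `StubCentralRegistry`, `request_package`, and the `root-N` / `hash-of-...` strings are hypothetical stand-ins for illustration, not the registry's real API or real hashes.

```python
from typing import Dict, Tuple

PackageKey = Tuple[str, str]   # (packageName, packageVersion)
Index = Dict[PackageKey, str]  # package key -> tarball hash


class StubCentralRegistry:
    """Hypothetical stand-in for the central registry; not a real API."""

    def __init__(self) -> None:
        self.index: Index = {}
        self.version = 0
        self.requests = 0

    def request_package(self, name: str, version: str) -> Tuple[str, Index]:
        """Include the package in the index and return the new root pointer."""
        self.requests += 1
        self.index[(name, version)] = f"hash-of-{name}@{version}"
        self.version += 1
        return f"root-{self.version}", dict(self.index)


class FederationNode:
    def __init__(self, central: StubCentralRegistry) -> None:
        self.central = central
        self.root_hash = "root-0"
        self.index: Index = {}

    def fetch(self, name: str, version: str) -> str:
        key = (name, version)
        if key not in self.index:
            # Miss: ask the central registry, then adopt the new root hash.
            self.root_hash, self.index = self.central.request_package(name, version)
        return self.index[key]


central = StubCentralRegistry()
node = FederationNode(central)
node.fetch("class-is", "1.0.0")  # first request goes to the central registry
node.fetch("class-is", "1.0.0")  # second request is answered from the local root
print(node.root_hash, central.requests)  # root-1 1
```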
Simulator
The first step of the federation setup is creating a suitable testing environment where we can test how well the federation is working.
The simulator should start by running the following scenarios:
More elaborate schemes can be created in the future.
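As an illustration of what a very small simulator could measure, here is a Python sketch. The scenario is an assumption made up for this example (uniformly random requests, every node caching what it downloads, any cached copy reachable by peers) and is not one of the scenarios planned for the real simulator.

```python
import random
from typing import Dict, List, Set


def simulate(num_nodes: int, num_packages: int, num_requests: int,
             seed: int = 0) -> Dict[str, int]:
    """Count where each request is served from: own cache, a peer, or central."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    caches: List[Set[int]] = [set() for _ in range(num_nodes)]
    served = {"local": 0, "peer": 0, "central": 0}
    for _ in range(num_requests):
        node = rng.randrange(num_nodes)
        package = rng.randrange(num_packages)
        if package in caches[node]:
            served["local"] += 1
        elif any(package in cache for cache in caches):
            served["peer"] += 1
        else:
            # Nobody in the federation has it yet; fall back to central.
            served["central"] += 1
        caches[node].add(package)
    return served


result = simulate(num_nodes=10, num_packages=50, num_requests=1000)
print(result)
```

In this model the central registry serves each package at most once, so the `central` count can never exceed `num_packages`; the interesting number is how the remaining requests split between local and peer hits.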
Bootstrap nodes
Open-Registry will run a couple of bootstrap nodes. These are responsible for being accessible to the federation nodes and providing the data for metadata and tarballs if the federation nodes don't have it locally.
Metrics
Both the bootstrap nodes and the main registry index should publish metrics in the Prometheus format to be collected by the metrics gatherer. These metrics will eventually be made accessible via a public dashboard.
For the federation nodes, we can offer opt-in metrics in the future, so we can see the health of the federation.
Existing infrastructure migration
The current Open-Registry is just one instance which is the main Open-Registry index. With federation, the architecture would change to add another component which would be the federated instances. We have more flexibility on where to place these but are in no rush to add them currently.
Potential Issues
Drawbacks
Alternatives
Unresolved Problems
Future
class-is directly in the package.json and lockfiles, /registry.npmjs.org/class-is instead. More verbose, but more accurate and flexible.