rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.71k stars 2.41k forks

Support non-crates.io registries #3917

Closed alexcrichton closed 7 years ago

alexcrichton commented 7 years ago

I thought we had an issue for this already, but apparently not! This is intended to help track Cargo's support for custom registries, that is, registries other than crates.io. Note that this is distinct from mirrors, which are intended to be implemented with source replacement.

Some features I think we'll need to support are:

I don't have many thoughts about concrete syntax and such right now. We'll also need a server to actually support this (crates.io as-is isn't quite suitable). There's not a crates.io issue for this currently, but as the requirements here evolve over time I figure we can create one that's more targeted.

Blacktiger commented 7 years ago

Wouldn't the simplest thing be to just allow the user to override CRATES_IO in .cargo/config somehow?

alexcrichton commented 7 years ago

The use case of .cargo/config and Cargo.toml is quite distinct and I don't think this support would want to go into .cargo/config. That style of configuration typically isn't checked into a project whereas information like a custom registry here is critical to building a project and would therefore go into Cargo.toml.

This is also a feature where you don't want to override crates.io as crates may still depend on crates from crates.io. This is specifically targeted at situations such as private registries inside companies and such.

Blacktiger commented 7 years ago

I was thinking of ~/.cargo/config as many organizations would want to use an organization-specific repository for all projects (a proxy could easily prevent directly downloading dependencies even for code written by someone else). That way you can set up your repository settings globally. If you force people to put it into Cargo.toml then they will have to do it for every project, even if it's only being used for the company proxy. Also, if for some reason they take a laptop with them and are outside the company proxy, they may need to disable the repository settings temporarily. Ideally, you want people to be able to set it globally or per-project.

alexcrichton commented 7 years ago

Sounds reasonable to me!

stusmall commented 7 years ago

How would we set which crates instance to publish to? It would be a nightmare situation for me to accidentally publish company IP to the wrong instance.

alexcrichton commented 7 years ago

A good question! That's probably part of the design to consider :)

(also a strong case for encoding it in Cargo.toml)

Nemo157 commented 7 years ago

I can see this as covering three highly related features that would be useful individually, or in combination, for different use cases.

  1. Specifying an alternate registry to pull a dependency from.
  2. Specifying an alternate registry to push a library to.
  3. Overriding a registry with a different registry.

1 could be imagined as being similar to the current dependencies.*.git or dependencies.*.path keys: it just tells Cargo where it can attempt to find the named dependency. This makes most sense to specify in Cargo.toml; maybe a new key like dependencies.*.registry with either a named registry specified somewhere else, or a URL specifying the index location, or something similar. Each dependency would have to have a single value for either git, path or registry, defaulting to the crates.io registry if none is specified, as today.
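The "single value for either git, path or registry" rule can be sketched in Rust. This is only an illustration of the proposal in this comment, not Cargo's implementation: `Source`, `DepKeys`, `resolve_source`, and the `CRATES_IO` placeholder name are all assumptions made up for the example.

```rust
/// Where a single dependency is fetched from (hypothetical model).
#[derive(Debug, PartialEq)]
enum Source {
    Git(String),
    Path(String),
    Registry(String),
}

/// Placeholder name for the default registry; an assumption for this sketch.
const CRATES_IO: &str = "crates-io";

/// The optional keys one dependency entry could carry under the proposal.
#[derive(Default)]
struct DepKeys<'a> {
    git: Option<&'a str>,
    path: Option<&'a str>,
    registry: Option<&'a str>,
}

/// Pick exactly one source for a dependency.
/// No key set falls back to the default registry; more than one key is an error.
fn resolve_source(keys: &DepKeys<'_>) -> Result<Source, String> {
    match (keys.git, keys.path, keys.registry) {
        (None, None, None) => Ok(Source::Registry(CRATES_IO.to_string())),
        (Some(url), None, None) => Ok(Source::Git(url.to_string())),
        (None, Some(dir), None) => Ok(Source::Path(dir.to_string())),
        (None, None, Some(reg)) => Ok(Source::Registry(reg.to_string())),
        _ => Err("specify at most one of `git`, `path`, or `registry`".to_string()),
    }
}
```

Calling `resolve_source(&DepKeys::default())` yields the default registry, while setting both `git` and `registry` yields an error, which mirrors the "defaulting to crates.io if none is specified" behaviour described above.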

2 would be just for what @stusmall mentioned: setting the registry that cargo publish would push to by default. This again would make the most sense in Cargo.toml, probably under a key like package.registry with the same sort of value as used in 1.
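A minimal Rust sketch of how such a pinned publish target could guard against pushing to the wrong instance. The function name `publish_target`, the `DEFAULT_REGISTRY` placeholder, and the idea of a registry requested on the command line are assumptions for illustration; the thread does not settle on any concrete syntax or behaviour.

```rust
/// Placeholder name for the public registry; an assumption for this sketch.
const DEFAULT_REGISTRY: &str = "crates-io";

/// Decide where a publish may go.
/// `pinned` models the hypothetical `package.registry` key;
/// `requested` models a registry explicitly asked for by the user.
fn publish_target(pinned: Option<&str>, requested: Option<&str>) -> Result<String, String> {
    match (pinned, requested) {
        // A pinned package refuses any other destination.
        (Some(p), Some(r)) if p != r => Err(format!(
            "package is pinned to `{}`, refusing to publish to `{}`",
            p, r
        )),
        // Pinned, and either nothing requested or the same registry requested.
        (Some(p), _) => Ok(p.to_string()),
        // Not pinned: honour the explicit request.
        (None, Some(r)) => Ok(r.to_string()),
        // Not pinned and nothing requested: fall back to the default.
        (None, None) => Ok(DEFAULT_REGISTRY.to_string()),
    }
}
```

With this shape, `publish_target(Some("internal"), Some("crates-io"))` returns an error instead of silently publishing company code to the public registry.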

1 and 2 together I imagine would be mostly useful for smaller organisations, or small teams that are introducing Rust into a larger organisation, as it would allow them to stand up a small internal registry for their private libraries while still pulling the majority from crates.io.

3 would allow a more restrictive organisation to force all third-party packages through a local mirror so they can do things like validate licenses; or, for a more optimistic reading, allow a network-limited organisation to have a simple local caching mirror. This would probably make sense to specify in .cargo/config, as you would want it to apply to all your projects and to all dependencies transitively.
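The override in 3 amounts to a lookup table applied to every dependency, direct or transitive. A small Rust sketch under that assumption follows; the names `effective_registry` and `rewrite_all` and the example URLs are hypothetical and do not correspond to any existing Cargo API.

```rust
use std::collections::HashMap;

/// Return the registry actually contacted for `requested`,
/// consulting a user-level override table first.
fn effective_registry<'a>(
    requested: &'a str,
    overrides: &'a HashMap<String, String>,
) -> &'a str {
    overrides
        .get(requested)
        .map(String::as_str)
        .unwrap_or(requested)
}

/// Apply the override table to a resolved list of (crate name, registry) pairs,
/// which is how the override would reach transitive dependencies too.
fn rewrite_all<'a>(
    deps: &[(&'a str, &'a str)],
    overrides: &'a HashMap<String, String>,
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .map(|&(name, registry)| (name, effective_registry(registry, overrides)))
        .collect()
}
```

Because the table lives outside any one project's manifest, the same projects build unchanged whether or not the override is active.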

carols10cents commented 7 years ago

There's a bunch of different use cases that I've been thinking about. Some of this overlaps with what others have already said:

1) A read-only mirror of what crates.io has, or the subset of what crates.io has that is actually used, for redundancy in case crates.io is down or in order to enable builds that don't use the internet. You might even want to be able to specify multiple sources for cargo to try in order to automatically fall back to another if one is down.
2) An internal crates.io instance running inside a network for publishing private crates to and sharing them internally
3) A combination of 1) and 2), so a server that is a superset of what crates.io has
4) Installing some crates from crates.io and other crates from other servers
5) A proxy that sits in front of 4), so it's 1) but has the ability to cache crates from multiple servers to support one source

The relevant prior art that I'm familiar with is Ruby. I'm not advocating that we do exactly what Ruby does, because Ruby has their own historical reasons for doing what they do, but this is one way that this whole process works :)

Applications that use gems as dependencies

Bundler is Ruby's package manager that's mostly in charge of installing dependencies.

Applications that use bundler have a Gemfile for specifying their dependencies written in Ruby. Bundler supports these use cases and differs from cargo in these ways:

source "https://rubygems.org"

Which could be an internal server that all gems should always be installed from instead. You can specify multiple global sources, and they're searched for gems in the source priority order.

Or you can specify that certain gems come from certain sources by passing a :source option or putting gems within a source block:

gem 'my_gem', '1.0', :source => 'https://gems.example.com'

source 'https://gems.example.com' do
  gem 'another_gem', '1.2.1'
  gem 'yet_another_gem', '1.0'
end

Bundler will search for child dependencies of gems coming from sources other than the global source by first looking in the source selected for the parent, but if the dependencies are not found there, it will fall back on global sources using the source priority order.
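The lookup order described here (parent's source first, then global sources in priority order) can be sketched in Rust. This models only the ordering as described in this comment; it is not Bundler's code, and `Index` and `locate` are hypothetical names, with the index being a stand-in for a real server query.

```rust
use std::collections::{HashMap, HashSet};

/// Maps a source URL to the set of gem names it serves
/// (a stand-in for querying a real server).
type Index = HashMap<String, HashSet<String>>;

/// Find the source a child dependency should come from:
/// try the parent's source first, then each global source in priority order.
fn locate<'a>(
    gem: &str,
    parent_source: Option<&'a str>,
    global_sources: &[&'a str],
    index: &Index,
) -> Option<&'a str> {
    // True when `src` is known and lists `gem`.
    let serves = |src: &str| index.get(src).map_or(false, |gems| gems.contains(gem));

    parent_source
        .into_iter()
        .chain(global_sources.iter().copied())
        .find(|&src| serves(src))
}
```

A gem that exists only on a global source is still found when its parent came from a private source, and `None` is returned when no configured source serves it.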

If the source URLs need authentication, you can either put the credentials in plaintext in the source URL in the Gemfile (source "https://user:password@gems.example.com"), or you can run bundle config https://gems.example.com/ user:password to store the username/password in a local .bundle/config.

Libraries that are published (and may also use dependencies)

Publishing gems is managed by rubygems. Bundler can hook into rubygems and manage some of this too, but rubygems predates bundler.

Gems that are libraries have a gemname.gemspec file where the gem's dependencies and metadata are specified.

Rubygems has a metadata value allowed_push_host to restrict gem pushes to a single host and prevent accidental pushes to rubygems.org:

Gem::Specification.new 'my_gem', '1.0' do |s|
  # ...
  s.metadata['allowed_push_host'] = 'https://gems.my-company.example'
end

Specifying a dependency in a gemspec file looks like:

Gem::Specification.new 'my_gem', '1.0' do |s|
  s.add_runtime_dependency 'example', '~> 1.1'
end

Typically, libraries also have a Gemfile that just points to the gemspec to be able to use bundler but not duplicate information:

source 'https://rubygems.org'

gemspec

There's a way to have a library temporarily depend on a library from a custom source, by overriding a gem specified in gemspec with something specified in the Gemfile, where the source option is supported. This is not intended to be published.

Blacktiger commented 7 years ago

In Maven you have the ability to specify multiple repositories and the username/password for connecting to those repositories. If you give people the ability to set up secured access to a repository you should also consider giving them a way to store the password hashed in a separate file that they can restrict access to for security.

One other thing, which isn't directly related to this issue but is worth thinking about, is that currently cargo is just a "flat list" of libraries. This can make it difficult for users to find what they want as they can only look at search results. Many other formats provide additional ways to scope things. npm, for example, now includes a scope option, NuGet recommends using dot-separated names as a scope, and Maven uses group ID, name, and version. This becomes especially useful in large organizations where different teams might have different scopes. I can already see some crates that would benefit from some organization such as the piston libraries.

carols10cents commented 7 years ago

cargo is just a "flat list" of libraries

You mean crates.io, right?

One other thing, which isn't directly related to this issue but is worth thinking about ... I can already see some crates that would benefit from some organization such as the piston libraries.

I think https://github.com/rust-lang/crates.io/issues/409 is the issue you're looking for. It's implemented for users right now but not organizations yet, so right now you can go to, for example, https://crates.io/users/carols10cents to see all the crates I've published. Once it's implemented for organizations, you'll be able to see all the crates that the piston team has published.

Blacktiger commented 7 years ago

Maybe I'm wrong, but it looks to me like that assumes the crate is coming from github which will not be the case for large organizations using cargo with a repository manager.

carols10cents commented 7 years ago

that assumes the crate is coming from github

What assumes the crate is coming from github?

Ah, are you talking about how right now crates.io only allows authentication with a github account? Which is somewhat related to this, but more on the crates.io side, not cargo.

Blacktiger commented 7 years ago

Never mind, I guess I misunderstood the code changes.

carols10cents commented 7 years ago

Doing some more research, npm Enterprise is the same codebase as npmjs.com and supports a variety of configurations so that one instance can be both a proxy cache and a private registry host.

The npm CLI can then be configured to either support installing ALL packages from a private registry or to only install private packages from the private registry and continue to use npmjs.com for public packages.

Installing all packages from the private registry is done by setting the private registry as the default registry that the npm CLI looks in.

For only installing private packages from the private registry, you log in with a registry and a scope, which tells the npm CLI to look in that registry for packages in that scope, and also that all packages in that scope should be published to that registry:

npm login --registry=http://myreg.mycompany.com:8080 --scope=@myco

This creates a token that gets stored in an .npmrc file like:

@myco:registry=http://myreg.mycompany.com:8080
//myreg.mycompany.com:8080/:_authToken=[token]
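The scope-to-registry mapping in those two lines can be modelled with a short Rust sketch. It only illustrates the lookup rule described in this comment (scoped packages go to the registry configured for their scope, everything else to the default); it is not npm's implementation, and `parse_scope_line`, `scope_of`, and `registry_for` are names invented for the example.

```rust
use std::collections::HashMap;

/// Parse a line like "@myco:registry=http://host:8080" into (scope, registry URL).
/// Lines that are not scope-registry entries (such as the auth-token line) yield None.
fn parse_scope_line(line: &str) -> Option<(String, String)> {
    let (scope, url) = line.trim().split_once(":registry=")?;
    if scope.starts_with('@') {
        Some((scope.to_string(), url.to_string()))
    } else {
        None
    }
}

/// Extract the scope ("@myco") from a package name like "@myco/utils".
fn scope_of(package: &str) -> Option<&str> {
    if !package.starts_with('@') {
        return None;
    }
    package.split_once('/').map(|(scope, _)| scope)
}

/// Choose the registry for a package: the scope's registry if one is configured,
/// otherwise the default registry.
fn registry_for<'a>(
    package: &str,
    scoped: &'a HashMap<String, String>,
    default: &'a str,
) -> &'a str {
    scope_of(package)
        .and_then(|scope| scoped.get(scope))
        .map(String::as_str)
        .unwrap_or(default)
}
```

Feeding the first .npmrc line above into `parse_scope_line` and then asking `registry_for("@myco/utils", ...)` returns the private registry, while an unscoped package name falls through to whatever default is passed in.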

Rather than having HTTP Basic Auth in URLs like Rubygems, everything with npm looks token-based. So you can generate a token for Travis and store that in an env var, for example.

Notably, each time you run npm login on a machine, a new token is generated and saved in the registry, so that you can later revoke a subset of the tokens.

The npm CLI treats scoped packages as private by default, so you cannot accidentally publish a scoped package to the public registry unless you have access to that scope in the public registry.

The npm CLI has a bunch of relevant options, such as always-auth, which forces npm to always require authentication when accessing the registry, even for GET requests; auth-type, which specifies what authentication strategy to use; and settings involving where to look for SSL certs.

carols10cents commented 7 years ago

https://github.com/rust-lang/cargo/issues/3365 is related to this, potentially.

carols10cents commented 7 years ago

From Twitter: npm is now agnostic about which registry you used to generate the package-lock.json.

Kixunil commented 7 years ago

I'd like to describe how the code is managed in the company I work for, so we could find a solution to cover our use case.

We have a global Git server which serves repositories. There is a special repository for shared code that is visible to everyone. If someone pushes a breaking change, all builds break. (No semver :( )

I guess having a special server like crates.io wouldn't be possible. I'd love to have the benefits of semver and cargo using our repository directly.

What I envision is that when I make a breaking change and bump the version, no compilation would break and then I could update projects to new APIs when appropriate.

Also, I should mention that I use crates from crates.io too and I need a local registry too.

Does anyone have different problems or see an obvious way to approach this?

carols10cents commented 7 years ago

Closing this because https://github.com/rust-lang/rfcs/pull/2141 has been accepted and there's now a tracking issue at https://github.com/rust-lang/rust/issues/44931, which replaces this issue.