rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.65k stars 2.4k forks source link

Additional Ways to Pass Registry Auth Token to Cargo at Runtime #11136

Open jonathanstrong opened 2 years ago

jonathanstrong commented 2 years ago

Problem

While testing out the RFC 3139 implementation for authenticated crate downloads (PR), I came across a small sticking point: it is currently (relatively) difficult to pass a registry auth token to cargo at runtime for commands other than publish.

As you may know, cargo publish provides a --token option for passing an auth token at runtime. Previously, this was the only subcommand where that made sense. Now that we are introducing authenticated crate downloads, the existing methods of providing an auth token at runtime for other commands could use improvement.

cargo login and credentials stored on local file system works great for terminal usage by a person, including for non-publish subcommands. But for containers, CI, sandboxed build environments, etc. there are often situations where you don't want to leave an auth token stored in a local file, which presents the need to pass the token to cargo at runtime.

My interest in this is, for example, for the purpose of auto-generating cargo doc output for crate versions published at a private registry that requires authentication, and wanting to avoid storing the registry auth token in a file (i.e. ~/.cargo/credentials) in the build environment.

The only option I have found for passing the token for a private registry for cargo fetch, etc. is the CARGO_REGISTRIES_<name>_TOKEN env var, which I was able to get to work.

However, it took some effort to discover how to format the <name> portion of the env var. In my case, I was testing using a private registry configured under the name "shipyard-rs". The hyphen/dash in the name made things difficult. My first attempt was CARGO_REGISTRIES_shipyard-rs_TOKEN=xxx and that did not work, as variable names with dashes don't work in shell. I tried using the env program (env "CARGO_REGISTRIES_shipyard-rs_TOKEN=xxx" cargo ... before realizing that formatting the registry in screaming snake case would be converted and matched (i.e. CARGO_REGISTRIES_SHIPYARD_RS_TOKEN=xxx works).

I may have missed where it is, but when I was trying this out I didn't find any documentation about the formatting of the <name> portion of the env var.

For cargo fetch et al., there is no --token command line option, so that can't be used.

I was also unable to use --config to work around this, i.e.:

$ cargo fetch -Z registry-auth --config 'registries.shipyard-rs.token=xxx' 

failed with error: registries.shipyard-rs.token cannot be set through --config for security reasons (is there some other way to pass this which would not trigger this error?)

To be clear, this situation was never a problem when publish was the only subcommand that needed auth tokens. But now that we are moving forward with fully authenticated cargo, I think the status quo on passing a token at runtime is not at the high level I would expect from Cargo.

Proposed Solution

Notes

This issue was initially raised in the issue for the RFC implementation PR, but @Eh2406 suggested creating a separate issue on the matter.

arlosi commented 1 year ago

Providing a --token command line argument is problematic since it may be necessary to specify tokens for multiple registries. For example, cargo build could need to pull down crates from multiple registries.

Setting the token via --config has similar security implications as the existing --token option does, so I'm not sure why one is allowed but the other is not. Unless we're planning on deprecating --token.

Do you have a suggestion for how the docs for the environment variable based approach could be improved?

ehuss commented 1 year ago

Unless we're planning on deprecating --token.

We may consider deprecating --token in the far future. We are trying to move away from passing secrets on the command-line, due to the relatively high risk of leaking a secret.

jonathanstrong commented 1 year ago

regarding --token and potential future deprecation: without diving headlong into a larger discussion, there are a number of aspects to the cargo authentication story that make it pretty difficult to manage. I think the experience for the user is better than for the registry server operator, but not by a lot. one of the biggest issues is, there's two authentication "venues" -- auth for cloning/pulling the git crate index, and then auth for cargo http requests. git/ssh authentication is pretty fragile, and there are magic tricks like git-fetch-with-cli you have to discover. if you make it that far you have to (currently) contend with no auth token sent along with download requests, which happily is getting fixed. if I could play God and re-design this from scratch, I'd cut out the git repo (instead ask the server provide updated crate versions, like the "sparse" rfc), condense the "api" and "dl" endpoints to one, ditch the dichotomy of "registry" name vs "index" url so there's only one true way to refer to a registry, use private key + signed requests for auth, and always sign every request, no matter what.

@arlosi when I just did a search to review what you will find on the subject and came across this example, which seems like it is pretty well-placed:

CARGO_REGISTRIES_MY_REGISTRY_INDEX=https://my-intranet:8080/git/index

Not sure how I missed that before.

I do think it's a tad complicated that you have to transform the registry name into screaming snake case, and I wonder how robust that is for all the edge cases.

It also highlights a point of uncertainty between the "registry" name and "index" url. This generally works based on people using the same name as the "registry" in Cargo.toml, but what happens if people use different names, which they are using to point to the same index?

arlosi commented 1 year ago

I've updated the error message in #10592 to include a line suggesting which environment variable to set. Hopefully that helps with discoverability.

For example:

[UPDATING] `alternative` index
[ERROR] failed to download `bar v0.0.1 (registry `alternative`)`

Caused by:
  no token found for `alternative`, please run `cargo login --registry alternative`
  or use environment variable CARGO_REGISTRIES_ALTERNATIVE_TOKEN
jonathanstrong commented 1 year ago

I just had a chance to look at this again, now with the benefit of the -Z registry-auth feature. specifically, it was in the context of changing our rustdoc builder to use -Z registry-auth instead of our previous, hacky non-standard solution of passing the auth token via the user-agent header.

my primary purpose of writing this comment is to thank @arlosi for having the foresight to include this code, which warns the user if an alternative registry's name has been changed via env vars:

    // If the name doesn't match, leave a note to help the user understand
    // the potentially confusing situation.

that message saved me hours, I expect!

the context was building rustdocs for crate versions published by users for a private registry. like docs.rs this is done in a sandboxed environment (via rustwide crate) where network is not permitted during cargo doc execution (i.e. by the build.rs scripts).

it may sound somewhat absurd, but in this situation you don't necessarily know what the "name" of the alternative registry is, because the user could have used any name when configuring their registry in ~/.cargo/config.toml. Also, relying on name alone is somewhat problematic (in my mind, and this is more based on instinct) in creating surface area for name ambiguity errors/security bugs/etc. when compared to the url of a specific git repo/crate index - which 1) requires a ssh key to read, and 2) is fixed and under our control as server administrator.

due to the constraints, it's a bit challenging, and I did not have a good sense of how cargo maps registry name to the crate index url internally - whether it is many-to-one or one-to-one. I had read a while back about "canonical url" in a RFC so I knew there had been some thought about the issue.

one source of confusion/difficulty was that the mapping between a registry name and the screaming snake case version of it is lossy, so my initial attempts did not line up as I expected. for example, I think I expected a env var like CARGO_REGISTRIES_SHIPYARD_RS_INDEX to refer to a registry named "shipyard-rs" (with a hyphen), but it was converted to "shipyard_rs" (with an underscore instead).

the other thing that tripped me up (I was saved by another excellently-placed error message) was accidentally including lower case characters in the env var name.

I was using the account uuid for a temporary registry "name", and the lower-hex encoding resulted in an env var name that was not all uppercase, and thus rejected by cargo. i.e. an env var like CARGO_REGISTRIES_SHIPYARD_8ad420b85dbd4631be4da6b04da87250_INDEX is rejected by cargo due to lowercase characters.

once I figured out both the underscore/hyphen and casing issues I was able to override the registry name <=> index mapping (via CARGO_REGISTRIES_<name>_INDEX env var) and then provide an auth token for the (renamed) registry using the CARGO_REGISTRIES_<name>_TOKEN env var) and it worked as expected. by using those env vars, I was able to specify a temporary name to refer to a specific registry with a given index url (e.g. ssh://git@shipyard.rs/<org-name>/crate-index.git) and then provide an auth token for that registry specifically.

I will say, however, that the current method of using the name of the env var to specify the registry is less than ideal and prone to confusion and unexpected outcomes. also it's a bit messy that you use two separate env vars, but they are somewhat tied together -- e.g. you can't set the _TOKEN env var before the _INDEX env var or cargo won't know about a registry with that name yet.

the thing that I was really pining for as I was in the midst of this was something like

CARGO_SET_REGISTRY='{ name = "my-registry", index = "ssh://...", token = "abc..", ssh-key = "3Bln.." }'

where you could specify these things using a less error-prone "protocol", if you will. I'm not sure if there are any places that cargo does something along those lines, but if so, perhaps they could be used as a pattern of passing several fields of data in one env var without needing to wonder about how the env var name will be parsed.

I'm sure that the example above wouldn't work as is (might need to permit a list, for example), but hopefully it might help start a discussion about a better way to pass this data to cargo via env vars.

arlosi commented 1 year ago

Thanks for the feedback @jonathanstrong. It's exciting to hear you're using the feature!

Dealing with the situation of not knowing the registry name was one of the most difficult aspects of the -Z registry-auth

You did discover the solution I intended (using the _INDEX and _TOKEN pair of variables). However, I recognize that it's not ideal.

Another direction I considered was using a different mechanism for injecting the registry information into Cargo's config. The environment variable system is problematic for a couple of reasons:

Other existing options include:

We could also consider a system for injecting config via TOML in an environment variable (which is closer to what you suggested). This would probably need an RFC.

CARGO_CONFIG="[registries]
my-registry = { index = '...', token = '...' }"
jonathanstrong commented 1 year ago

The --config command line option combined with -Zconfig-include to specify a file --config config.toml

that would be really powerful! the only thing I wonder about is whether it's possible to use that approach to specify the token as well (vs. needing to use the env var for that). not sure how that works since tokens are normally in a separate credentials.toml file, also I previously ran into case where using config passed via command line flag does not allow credentials.

I feel like this discussion is even more pertinent with asymmetric tokens on the horizon -- which I expect will require specifying additional fields for the registry config.