rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.28k stars 2.32k forks source link

Cargo becomes unusable if gitconfig is malformed #9854

Open portiatrivisonno opened 2 years ago

portiatrivisonno commented 2 years ago

Problem

Yesterday, I had to work the first time with cargo and it made a pretty bad first impression as it was striking due to a seemingly "malformed" gitconfig (see #9853). Thus, I had to constantly rename my ~/.config/git/config every time I issued a cargo or git command. That was insane and very inconvenient to say the least! I still don't know why cargo build claims to require the git config but I'm pretty sure it can do a fine job without it. ;-)

Possible Solutions

I suggest instead of failing when cargo cannot parse the gitconfig, to instead issue a warning, assume an empty gitconfig and continue. If the contents of the gitconfig is really necessary for whatever cargo does, we may try to recover as much usable information from it and use that but aborting and doing nothing is IMHO the worst possible alternative as it prevents using cargo in cases where the contents of gitconfig is not essential.

Notes

Output of cargo version: cargo 1.53.0

ehuss commented 2 years ago

Yea, I can see how that is frustrating. Git is used to inform cargo which files are used in the package, which is used for some things like fingerprinting (change detection). I don't think it would be possible to tell libgit2 to ignore configuration errors. It may be possible to fall back to non-git behavior if cargo fails to load the git repository.

nipunn1313 commented 2 years ago

I poked at this for a moment, and found that it's not actually failing because of the project itself, but rather because it needs to initialize the registry index (which is also a git repo).

➜  cargo RUST_BACKTRACE=1 cargo build
error: failed to get `anyhow` as a dependency of package `cargo v0.57.0 (/Users/nipunn/src/cargo)`

Caused by:
  failed to initialize index git repository

Caused by:
  failed to parse config file: invalid configuration key (in /Users/nipunn/.config/git/config:1); class=Config (7)

I printed out the path to the repo which it was failing to initialize and it's this one

~/.cargo/registry/index/github.com-1ecc6299db9ec823

Unfortunately, I think gitconfig is essential in this case. I be we could augment the error message to be more obvious, perhaps something like

  failed to initialize index git repository (in ~/.cargo/registry/index/github.com-1ecc6299db9ec823)
nipunn1313 commented 2 years ago

@rustbot claim

Claiming to change the error message

portiatrivisonno commented 2 years ago

Thanks @nipunn1313 for trying to solve this issue and tracking it down to the index git repo. However, I – similar to @alexcrichton – don't agree that this is fixed by pull request #9869: cargo still cannot be used if the gitconfig is considered invalid (for whatever reason, whether rightly or wrongly so). So, please reopen this issue.

I'm still not convinced that the contents of the gitconfig is necessary for cargo to function properly and be unable to do its work (in most cases). I haven't looked at the source but at the moment this looks to me like an implementation detail where cargo uses the "git mechanism" to store some data.

Does this really need preferences from the (global) gitconfig? If so, what can I (the user) put in my gitconfig to a) change cargo's behaviour for the index, or b) – more drastically – torpedo cargo's efforts rendering it unusable. If the answer to the above is "no", then we don't need a gitconfig. (If b) is true, we should think about whether this is what we want.)

I understand that you do not want to ignore a failed parse silently that would be bad behaviour. How about about printing a warning/error message along with appealing to the user to report this as a bug but also offering to continue as if the gitconfig was empty?

nipunn1313 commented 2 years ago

Iiuc, you’re hoping for a way for the implementation detail (registry index) of cargo to work without using git.

cargo essentially invokes git init on the registry using libgit2.

I suppose I haven’t checked if there’s a way to invoke libgit2 while asking it not to parse the config. That could be a fallback solution if it exists. There might be other unanticipated consequences to such a strategy.

portiatrivisonno commented 2 years ago

Thanks for comming back to this, @nipunn1313.

Iiuc, you’re hoping for a way for the implementation detail (registry index) of cargo to work without using git.

Not quite, using git's mechanism is fine no need to replace that with something selfmade. If I understand it correctly so far, cargo use git to manage a data store ("the index") and sync it with a server. If that's true, what does cargo need the user's config file for?

I quickly skimmed through the options in man git-config:

Long list of options - advice.* -> I don't think cargo cares - core.* -> maybe? - add.* -> nope - alias.* -> nope - am.* -> I don't think cargo uses that - apply.* -> dito - blame.* -> dito - branch.* -> dito - browser.* -> dito - checkout.* -> I don't think cargo cares - clean.* -> dito - clone.* -> dito - color.* -> nope - column.* -> nope - commit.* -> I don't think cargo cares - commitGraph.* -> dito - credential.* -> I don't think cargo uses that - completion.* -> nope - diff.* -> nope - extensions.* -> I don't think cargo cares - fastimport.* -> dito - feature.* -> dito - fetch.* -> maybe? - format.* -> dito - fsck.* -> dito - gc.* -> I don't think cargo cares - gitcvs.* -> I don't think cargo uses that - gitweb.* -> dito - grep.* -> dito - gpg.* -> dito - gui.* -> dito - guitool.* -> dito - help.* -> dito - http.* -> maybe some of it but I guess cargo has its own proxy settings - i18n.* -> I don't think cargo uses that - imap.* -> dito - index.* -> I don't think cargo cares - init.* -> I don't think cargo cares - instaweb.* -> I don't think cargo uses that - interactive.* -> dito - log.* -> dito - lsrefs.* -> dito - mailinfo.* -> dito - mailmap.* -> dito - maintenance.* -> I don't think cargo cares - man.* -> I don't think cargo uses that - merge.* -> I don't think cargo uses that - mergetool.* -> dito - notes.* -> dito - pack.* -> I don't think cargo cares - pager.* -> I don't think cargo uses that - pretty.* -> dito - protocol.* -> dito - pull.* -> dito - push.* -> dito - rebase.* -> dito - receive.* -> I don't think cargo cares - remote.* -> dito - repack.* -> dito - rerere.* -> I don't think cargo uses that - reset.* -> I don't think cargo cares - sendmail.* -> I don't think cargo uses that - sequence.* -> dito - showBranch.* -> dito - splitIndex.* -> dito - ssh.* -> dito - status.* -> dito - stash.* -> dito - submodule.* -> dito - tag.* -> dito - tar.* -> dito - trace2.* -> dito - transfer.* -> I don't think cargo cares - uploadArchive.* -> dito - uploadPack.* -> dito - url.* -> I don't think cargo uses that - user.*, author*., committer.* -> dito - versionsort.* -> dito - web -> dito - worktree.* -> dito

So again, I'm wondering what config variable does cargo need? (In fact, I even believe that some can impede cargo and should thus be ignored, e.g. url. On the other hand, I don't know which config variables libgit2 supports/understands, so maybe most of these are ignored anyway.)

I suppose I haven’t checked if there’s a way to invoke libgit2 while asking it not to parse the config. That could be a fallback solution if it exists. There might be other unanticipated consequences to such a strategy.

Like more deterministic behaviour as the user cannot tamper with an internal implementation. ;-)

nipunn1313 commented 2 years ago

There is a potential misunderstanding here.

Cargo invokes git (via libgit2). Git automatically parses the config. Cargo doesn’t directly need anything from the config, but it is not clear if it possible to bypass the config parsing. If you can find a way in libgit2, let us know.

Git automatically respects the user’s config, and that’s part of the contract of git.

—Nipunn

On Sat, Sep 4, 2021 at 11:09 AM portiatrivisonno @.***> wrote:

Thanks for comming back to this, @nipunn1313 https://github.com/nipunn1313.

Iiuc, you’re hoping for a way for the implementation detail (registry index) of cargo to work without using git.

Not quite, using git's mechanism is fine no need to replace that with something selfmade. If I understand it correctly so far, cargo use git to manage a data store ("the index") and sync it with a server. If that's true, what does cargo need the user's config file for?

I quickly skimmed through the options in man git-config: Long list of options advice. -> I don't think cargo cares core. -> maybe? add. -> nope alias. -> nope am. -> I don't think cargo uses that apply. -> dito blame. -> dito branch. -> dito browser. -> dito checkout. -> I don't think cargo cares clean. -> dito clone. -> dito color. -> nope column. -> nope commit. -> I don't think cargo cares commitGraph. -> dito credential. -> I don't think cargo uses that completion. -> nope diff. -> nope extensions. -> I don't think cargo cares fastimport. -> dito feature. -> dito fetch. -> maybe? format. -> dito fsck. -> dito gc. -> I don't think cargo cares gitcvs. -> I don't think cargo uses that gitweb. -> dito grep. -> dito gpg. -> dito gui. -> dito guitool. -> dito help. -> dito http. -> maybe some of it but I guess cargo has its own proxy settings i18n. -> I don't think cargo uses that imap. -> dito index. -> I don't think cargo cares init. -> I don't think cargo cares instaweb. -> I don't think cargo uses that interactive. -> dito log. -> dito lsrefs. -> dito mailinfo. -> dito mailmap. -> dito maintenance. -> I don't think cargo cares man. -> I don't think cargo uses that merge. -> I don't think cargo uses that mergetool. -> dito notes. -> dito pack. -> I don't think cargo cares pager. -> I don't think cargo uses that pretty. -> dito protocol. -> dito pull. -> dito push. -> dito rebase. -> dito receive. -> I don't think cargo cares remote. -> dito repack. -> dito rerere. -> I don't think cargo uses that reset. -> I don't think cargo cares sendmail. -> I don't think cargo uses that sequence. -> dito showBranch. -> dito splitIndex. -> dito ssh. -> dito status. -> dito stash. -> dito submodule. -> dito tag. -> dito tar. -> dito trace2. -> dito transfer. -> I don't think cargo cares uploadArchive. -> dito uploadPack. -> dito url. -> I don't think cargo uses that user., author., committer. -> dito versionsort. -> dito web -> dito worktree.* -> dito

So again, I'm wondering what config variable does cargo need? (In fact, I even believe that some can impede cargo and should thus be ignored, e.g. url. On the other hand, I don't know which config variables libgit2 supports/understands, so maybe most of these are ignored anyway.)

I suppose I haven’t checked if there’s a way to invoke libgit2 while asking it not to parse the config. That could be a fallback solution if it exists. There might be other unanticipated consequences to such a strategy.

Like more deterministic behaviour as the user cannot tamper with an internal implementation. ;-)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/cargo/issues/9854#issuecomment-913015938, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ5PI3LEG4AGZD6HEHL4I3UAJOFRANCNFSM5DABUXTA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

portiatrivisonno commented 2 years ago

There is a potential misunderstanding here.

Cargo invokes git (via libgit2). Git automatically parses the config. Cargo doesn’t directly need anything from the config, but it is not clear if it possible to bypass the config parsing.

It's true that cargo doesn't parse the config file but I believe Cargo relies on Git/libgit2 to work as it expects it to, i.e. not being influenced by the git config.

If you can find a way in libgit2, let us know.

Sure. Git has the environment variables GIT_CONFIG_GLOBAL and GIT_CONFIG_SYSTEM, if set to /dev/null, the respective config files are skipped. libgit2 doesn't seem to honor these. However, you can configure the lookup locations of libgit2 directly:

git_libgit2_init();
git_libgit2_opts(GIT_OPT_SET_SEARCH_PATH, GIT_CONFIG_LEVEL_SYSTEM, "");
git_libgit2_opts(GIT_OPT_SET_SEARCH_PATH, GIT_CONFIG_LEVEL_GLOBAL, "");
git_libgit2_opts(GIT_OPT_SET_SEARCH_PATH, GIT_CONFIG_LEVEL_XDG, "");
git_libgit2_opts(GIT_OPT_SET_SEARCH_PATH, GIT_CONFIG_LEVEL_PROGRAMDATA, "");
//git_repository_init(...);

(Using NULL instead of "" resets to the default locations, which is not what we want.)

Git automatically respects the user’s config, and that’s part of the contract of git.

And I argued it shouldn't. :) Maybe an analogy helps? Libgit2 becomes the shell and Cargo uses it to execute a shell-script. Here cargo expects the shell interpreter to act consistently regardless of the user's config, e.g. ? globbing should work and not error our, aliases should not be expanded, etc. The analogy doesn't quite work out in real live because by default shells do not load the user's config for non-interactive shells (which is the sane default I'm arguing for) because that would break a lot of scripts [an exception would be zsh widgets but the first thing they do is issue an emulate zsh to reset to the defaults, i.e. be independent of the user's config].

The same is true for Cargo and libgit2: If Cargo fetches from github.com, it expects to be contacting github.com and not some-other-domain.com which could happen if the user rewrites that with the url gitconfig variable (which seems to be honored by libgit2).

nipunn1313 commented 2 years ago

Sounds like you're arguing that git's defaults of respecting the user's git configuration don't make sense for cargo's invocation of git. That makes sense to me. Cargo currently uses git's default behavior w.r.t. configuration.

If we can convince libgit2 via flags to our invocation, it seems like a plausible option. It does seem possible to get libgit2 to respect env vars https://docs.rs/git2/0.8.0/git2/struct.Repository.html#method.open_from_env

I am not able to repro the env vars you describe

➜  git git:(master) git status
fatal: bad config line 1 in file /Users/nipunn/.config/git/config
➜  git GIT_CONFIG_GLOBAL=/dev/null git status
fatal: bad config line 1 in file /Users/nipunn/.config/git/config
➜  git GIT_CONFIG=/dev/null git status
fatal: bad config line 1 in file /Users/nipunn/.config/git/config

However, after digging a bit, I think the env vars you found were introduced extremely recently in April 2021 https://github.com/git/git/commit/4179b4897f2de28858acaebd6382c06c91532e98 (see announcement) https://lore.kernel.org/lkml/xmqqa6o3xj2e.fsf@gitster.g/T/ in git 2.32.0

Was able to repro and confirm it would work by building git from source

➜  git bin-wrappers/git --version
fatal: bad config line 1 in file /Users/nipunn/.config/git/config
➜  git GIT_CONFIG_GLOBAL="" bin-wrappers/git --version
git version 2.33.0.142.ge0a2f5cbc5

Seems like we could use it here to avoid this issue - but it seems unlikely our customers would see benefit for a number of years until they upgrade to super new git. But agreed - seems to make sense to use it.

Can we reopen this issue @alexcrichton ? Action item: Open with GIT_CONFIG_GLOBAL flag set to empty string here - https://github.com/nipunn1313/cargo/blob/168fac4fa3920abaa2d48f8577c727ecad586ebe/src/cargo/sources/registry/remote.rs#L68, w/ comment that it requires version of git 2.32.0 or newer.

[EDIT] - realized I was researching git, and not libgit2. Following up w/ research into libgit2.

nipunn1313 commented 2 years ago

libgit2 doesn't yet seem to support GIT_CONFIG_GLOBAL

It does have

git_libgit2_opts(GIT_OPT_SET_SEARCH_PATH, GIT_CONFIG_LEVEL_GLOBAL, "");

which can be invoked via https://docs.rs/git2/0.13.21/git2/opts/fn.set_search_path.html

However, it does warn that it cannot guarantee thread safety and must be externally synchronized. I believe we'd have to set the flags once at the beginning of cargo before threads are spawned. Given that we use libgit2 for both the registry AND the user's project, this seems tough to pull off (as I'd assume they'd want different settings). I see that the testsuite does this https://github.com/rust-lang/cargo/blob/e51522ab3db23b0d8f1de54eb1f0113924896331/crates/cargo-test-support/src/git.rs#L141 - would it make sense to do something like this within cargo itself?

I think AI here would be