r-lib / gert

Simple git client for R
https://docs.ropensci.org/gert/
Other

ask gert about proxy? #122

Open ijlyttle opened 3 years ago

ijlyttle commented 3 years ago

I have a weird situation using a Windows machine in a corporate setting. I have installed gert 1.1.0; gert::libgit2_config() tells me that I am using libgit2 version 1.1.0.

I have two remotes: one on an internal GitHub Enterprise, the other at github.com.

On the internal remote, both git pull and gert::git_pull() work.

On the github.com remote, git pull works but gert::git_pull() does not; I get an error:

Error in libgit2::git_remote_fetch : 
  failed to send request: The operation timed out

When something like this happens, I immediately think of the proxy. I have the usual environment variables (http_proxy, etc.) set.

In other contexts on this machine, the proxy seems to work - I can git pull for both repos. As well, I can httr::GET() both internally and externally. Hence my question:

Is there a way to ask gert (or libgit2) what it thinks its proxy settings are, i.e. is there a verbose mode?
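Short of a verbose mode, one rough way to see what libgit2 *could* pick up is to list the proxy settings from R. The sketch below uses hypothetical helper names (`proxy_env_report()`, `proxy_git_config_report()`); it does not query libgit2's effective proxy, it only reports what is set in the environment and in the global git config. It assumes `gert::git_config_global()` returns a data frame with `name` and `value` columns.

```r
# Hypothetical helpers: report where a proxy *might* be configured.
# These do not ask libgit2 which proxy it actually uses.
proxy_env_vars <- c("http_proxy", "HTTP_PROXY", "https_proxy", "HTTPS_PROXY",
                    "no_proxy", "NO_PROXY")

# Environment variables; NA means "not set".
proxy_env_report <- function(vars = proxy_env_vars) {
  values <- Sys.getenv(vars, unset = NA_character_)
  data.frame(name = vars, value = unname(values), stringsAsFactors = FALSE)
}

# Global git config entries whose key mentions "proxy"
# (assumes gert::git_config_global() has `name` and `value` columns).
proxy_git_config_report <- function() {
  cfg <- gert::git_config_global()
  cfg[grepl("proxy", cfg$name, fixed = TRUE), c("name", "value")]
}
```

Running both and comparing them with the libgit2 version from `gert::libgit2_config()` gives a quick snapshot to paste into an issue.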

jeroen commented 3 years ago

I think libgit2 no longer uses libcurl, so those variables are not going to work.

We probably need to specify the proxy manually. Does this tell us the proxy?

git config --global http.proxy

Does the proxy itself require authentication?

ijlyttle commented 3 years ago
git config --global http.proxy

returns blank for me - when I set it (and https.proxy) manually to the same values as the environment variables, I get the same behavior as before: git pull works; gert::git_pull() does not for the external repo.

Thankfully, the proxy does not require authentication.

jeroen commented 3 years ago

I don't understand where your command-line git gets the proxy settings. According to the git manual there are no environment variables that git uses, so you must have them configured somewhere?

jeroen commented 3 years ago

Can you try installing from the proxy branch and test if this solves your problem? Run this in a clean, empty R session:

remotes::install_github("r-lib/gert@proxy")

See also: https://github.com/r-lib/gert/pull/123

ijlyttle commented 3 years ago

Good news and bad news:

I'm sure you suspect the same thing: my guess is that the no_proxy variable is not respected.

jeroen commented 3 years ago

What is the no_proxy variable? Is this documented anywhere?

jeroen commented 3 years ago

What you probably will have to do is configure the proxy per-remote in your git remote settings. This will take precedence over any global settings or environment variables.

The libgit2 logic is here: https://github.com/libgit2/libgit2/blob/81c98af777329817827609a90462d8b2fd4a845b/src/remote.c#L822

So if your remote is named internal I think you need to set git config remote.internal.proxy and that way the proxy will only be used for your remote named "internal".
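A minimal sketch of the per-remote setting from R, assuming `gert::git_config_set()` takes `(name, value, repo)`. The helper names and the proxy URL are hypothetical; the `setter` argument exists only so the sketch can be exercised without a real repository.

```r
# Per-remote key that, per the linked libgit2 logic, is consulted first,
# e.g. "remote.internal.proxy".
remote_proxy_key <- function(remote) {
  paste0("remote.", remote, ".proxy")
}

# Write the proxy into the repository's own config for one remote only.
# `setter` defaults to gert::git_config_set (assumed signature: name, value, repo).
set_remote_proxy <- function(remote, proxy, repo = ".",
                             setter = gert::git_config_set) {
  key <- remote_proxy_key(remote)
  setter(key, proxy, repo = repo)
  invisible(key)
}
```

For example, `set_remote_proxy("github", "http://proxy.example.com:8080")` would route only the remote named `github` through the (hypothetical) proxy.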

ijlyttle commented 3 years ago

Here's what curl has on no_proxy: https://curl.se/libcurl/c/CURLOPT_NOPROXY.html

I'll have a look at the libgit2 logic.

ijlyttle commented 3 years ago

Right now I'm wrestling with a way to set this using the global git config for all URLs that match the internal GHE; I'll keep on that path.
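One way to express "proxy everywhere except the internal GHE" in the global config is command-line git's `http.<url>.proxy` key. This is a sketch only: the URLs and helper names are hypothetical, it assumes `gert::git_config_global_set()` takes `(name, value)`, and it assumes (by analogy with git's documented empty-string behavior for `remote.<name>.proxy`) that an empty value means "no proxy". Whether libgit2 1.1.0 honors `http.<url>.proxy` at all is not established here.

```r
# Global proxy for everything, plus a per-URL override for the internal host.
# An empty override value is *assumed* to disable the proxy for that URL.
url_proxy_settings <- function(proxy, internal_url) {
  settings <- c(proxy, "")
  names(settings) <- c("http.proxy", paste0("http.", internal_url, ".proxy"))
  settings
}

# Apply each setting; `setter` defaults to gert::git_config_global_set
# (assumed signature: name, value).
apply_global_config <- function(settings, setter = gert::git_config_global_set) {
  for (key in names(settings)) {
    setter(key, settings[[key]])
  }
  invisible(names(settings))
}
```

Unlike a per-remote setting, this would also apply while cloning, before any remote exists.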

I'm seeing that this phenomenon (not picking up proxy environment variables) seems to be a Windows thing, which agrees with my not having had this issue on my Mac.

That said, I tried remotes::install_github("r-lib/gert@proxy") on my Mac (clean session), then gert::git_pull() and I get:

Error in libgit2::git_remote_fetch : invalid url, missing path
Backtrace:
    █
 1. ├─[ gert::git_pull() ] with 1 more call
 3. └─gert:::raise_libgit2_error(...)

for both internal and external repos.

To be clear, I did not change any git config on my Mac to add the proxy variables explicitly.

jeroen commented 3 years ago

Hmmm so on mac do you also use the environment variable? Or the git config http.proxy method?

Did it work before on Mac, on the same network, without doing anything? I'm surprised by that because I think proxies weren't enabled at all, so if it works, it must be something else in the system that automatically routes the connection.

ijlyttle commented 3 years ago

On Mac, I use the environment variable method - no git config for proxy - it works. I just reinstalled CRAN gert on Mac to confirm.

jeroen commented 3 years ago

OK that's interesting. Does it also work without setting that variable? Perhaps you have the proxy configured in macos preferences such that all traffic gets proxy'd already by the OS.

And you were saying that it stops working with the proxy branch, where we explicitly try to set the proxy in libgit2?

ijlyttle commented 3 years ago

On my Mac, the proxy uses the "Automatic Proxy Configuration" setting - there is a corporate-IT management. When I unset all my proxy environment variables, everything still works (as an aside, this is a new behavior - I used to have to set the variables - but that's a different topic :) ).

To confirm, on my Mac it does stop working using the proxy branch; but it is working with the CRAN version.

FWIW, I'll try some git config settings on the Windows machine using the CRAN version. If that works, that's a good result for me and I would be happy to contribute some documentation.

Of course, I'll be happy to try anything you suggest from your end.

jeroen commented 3 years ago

Maybe I should only auto-enable the proxy on Windows then, if macOS apparently takes care of things at the system level.
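A rough sketch of what "auto-enable only on Windows" could look like on the R side. `auto_proxy()` is a hypothetical name and this is not how gert is implemented; it returns the first non-empty proxy variable on Windows and `NULL` elsewhere, leaving other platforms to the OS network settings.

```r
# Hypothetical: pick a proxy from the environment on Windows only.
auto_proxy <- function(os_type = .Platform$OS.type,
                       vars = c("https_proxy", "HTTPS_PROXY",
                                "http_proxy", "HTTP_PROXY")) {
  if (!identical(os_type, "windows")) {
    return(NULL)  # elsewhere, assume the OS routes traffic itself
  }
  for (var in vars) {
    value <- Sys.getenv(var, unset = "")
    if (nzchar(value)) {
      return(value)  # first non-empty variable wins
    }
  }
  NULL
}
```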

ijlyttle commented 3 years ago

The Windows path (use CRAN version, manage git config settings) was not successful. Then I came across this, which I found illuminating: https://github.com/libgit2/libgit2/issues/4164

ijlyttle commented 3 years ago

Auto-enabling for windows might be a way forward - I say this as a sample of one Mac :)

I found a workaround on Windows using the proxy branch. This works for my internal host:

withr::with_envvar(
  list(http_proxy = "", https_proxy = ""),
  gert::git_pull()
)

I realize this could be a very bad idea, but could gert be made to recognize the NO_PROXY environment variable, then unset the proxy variables if the remote host matches a NO_PROXY host?
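A sketch of the matching part of that idea, with hypothetical helper names. It is a simplified take on curl's CURLOPT_NOPROXY rules: comma-separated entries, `*` matches every host, and an entry matches the host itself or any subdomain (a leading dot is ignored). Ports, IP addresses and CIDR ranges are not handled.

```r
# Split a NO_PROXY value into host entries, dropping blanks.
no_proxy_hosts <- function(value = Sys.getenv("NO_PROXY", unset = Sys.getenv("no_proxy"))) {
  hosts <- trimws(strsplit(value, ",", fixed = TRUE)[[1]])
  hosts[nzchar(hosts)]
}

# Extract the host from an https URL such as "https://user@host/org/repo.git".
remote_host <- function(url) {
  tolower(sub("^[A-Za-z]+://([^/@]*@)?([^/:]+).*$", "\\2", url))
}

# TRUE if `host` equals an entry or is a subdomain of it; "*" matches everything.
bypass_proxy <- function(host, no_proxy = no_proxy_hosts()) {
  host <- tolower(host)
  for (entry in tolower(no_proxy)) {
    if (entry == "*") {
      return(TRUE)
    }
    entry <- sub("^\\.", "", entry)  # treat ".corp.example.com" like "corp.example.com"
    if (host == entry || endsWith(host, paste0(".", entry))) {
      return(TRUE)
    }
  }
  FALSE
}
```

A wrapper could then use `bypass_proxy(remote_host(url))` to decide whether to run `gert::git_pull()` inside the `withr::with_envvar()` workaround above.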

ijlyttle commented 3 years ago

I have found out a little more, thanks to this issue-comment: https://github.com/libgit2/libgit2/issues/5255#issuecomment-540741132

On my Mac, I set my http_proxy, etc. envvars appending a trailing slash, and installed the proxy branch. It worked 🎉

This gives me an idea for a function to manage these envvars before they get sent to libgit2, managing the Windows case as well. I can make a PR to the proxy branch.
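A minimal sketch of such an envvar-managing function, covering only the trailing-slash workaround described above. The names are hypothetical; the result is a named list intended for `withr::with_envvar(new = ...)`.

```r
# Append a trailing slash to non-empty proxy URLs that lack one.
with_slash <- function(url) {
  needs_slash <- nzchar(url) & !endsWith(url, "/")
  url[needs_slash] <- paste0(url[needs_slash], "/")
  url
}

# Named list of the proxy variables that are set, each normalized.
normalized_proxy_env <- function(vars = c("http_proxy", "HTTP_PROXY",
                                          "https_proxy", "HTTPS_PROXY")) {
  values <- Sys.getenv(vars, unset = "")
  values <- values[nzchar(values)]
  as.list(with_slash(values))
}
```

Usage would be something like `withr::with_envvar(normalized_proxy_env(), gert::git_pull())`.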

Fair warning, you may not want to look at my forthcoming PR while eating :)

jeroen commented 3 years ago

> I realize this could be a very bad idea, but could gert be made to recognize the NO_PROXY environment variable, then unset the proxy variables if the remote host matches a NO_PROXY host?

This is libcurl behavior: it happens to work when libgit2 uses libcurl for HTTPS, but we can't rely on that. Most libgit2 installations do not use libcurl. The recommended approach is really to set the http.proxy option in the git config of your remote.

From your observations it sounds like the proxy should be configured either in your OS network settings, or in libgit2, but not both.

Perhaps you can even make everything work on windows with the CRAN version if you configure the proxy in your Windows settings like described here: https://stackoverflow.com/a/57613619/318752

ijlyttle commented 3 years ago

I tried to configure the proxy using Windows registry - I do not have permission (corporate policy).

Setting the proxy on a per-remote basis: I don't think that would work with cloning, where the remote does not yet exist. If I'm thinking correctly, that would leave me unable to use usethis::create_from_github().

I like the idea of setting the proxy in a global .gitconfig, but I can't get that to work using libgit2. Although the issue (https://github.com/libgit2/libgit2/issues/4164) is a few years old, it is noted as a case covered by git but not by libgit2.

This leaves me stuck, but I think I can bring some other things into better focus, if you think it could help.

What I thought was a Mac/Windows difference is, I think, properly a libgit2 version difference: my Mac uses libgit2 0.28.1, Windows uses 1.1.0.

The behavior I am seeing between the CRAN version and the proxy-branch version, I think, can be explained if my Mac libgit2 was built against libcurl. There is nothing new here, it's just a better-organized version of what I had done before:

| platform | libgit2 version | gert | internal access | external access | comment |
|---|---|---|---|---|---|
| Mac | 0.28.1 | CRAN | yes | yes | relies on libcurl(?), proxy behaves |
| Windows | 1.1.0 | CRAN | yes | no | proxy never used |
| Mac | 0.28.1 | proxy-branch | no | yes (1) | proxy always used |
| Windows | 1.1.0 | proxy-branch | no | yes | proxy always used |

(1) I had to add a trailing slash to the proxy address, which is not conventional. This is fixed in version 0.99: https://github.com/libgit2/libgit2/issues/5255#issuecomment-541528591

This leads to another question, for which I would be happy to file a different issue: would it be useful to alert the user if their libgit2 version is too old?
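A sketch of what such an alert could look like. The function names are hypothetical, and the default version lookup assumes `gert::libgit2_config()` returns a list with a `version` element. The default minimum of "0.99" reflects the trailing-slash fix noted above; it is a parameter so other thresholds can be used.

```r
# TRUE if the libgit2 version is older than `minimum`.
libgit2_too_old <- function(version = as.character(gert::libgit2_config()$version),
                            minimum = "0.99") {
  package_version(version) < package_version(minimum)
}

# Warn (once per call) when libgit2 is older than `minimum`.
warn_if_old_libgit2 <- function(version = as.character(gert::libgit2_config()$version),
                                minimum = "0.99") {
  if (libgit2_too_old(version, minimum)) {
    warning("libgit2 ", version, " is older than ", minimum,
            "; proxy handling may differ from command-line git.", call. = FALSE)
  }
  invisible(version)
}
```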

My plan, for now, is to test some git configurations.

Thanks for your patience with me on this.

jeroen commented 3 years ago

Thanks for the detailed information. First, how did you get libgit2 0.28.1 on Mac? If you use Homebrew, then brew update followed by brew upgrade libgit2 should get you to version 1.1.0. That would be one factor we can eliminate.

ijlyttle commented 3 years ago

I don't remember ever installing anything on the Mac - my guess is that it's the version that came with it (early 2019).

I agree that upgrading my Mac to libgit2 1.1.0 would eliminate a factor, but I would want to make sure that I can downgrade again, as this setup is currently the only one where everything works, and I still need to access both inside and outside.

jeroen commented 3 years ago

macOS does not include libgit2; you must have installed it yourself at some point. Both Homebrew and the gert CRAN binary packages include libgit2 1.1.0, so it would be good if we could test with this version.

ijlyttle commented 3 years ago

Quick update from my end: I have lost access to the Windows machine to which I have remote access. I should have access again by Thursday, when a colleague goes into the office (four states away).

I agree that I need to upgrade my Mac's libgit2 to 1.1.0 - I just need to make sure that I can maintain access both inside and outside. I have been working on some functions that (I hope) will help me use my Mac once I upgrade and use the proxy branch - all being well, I'll update this evening.

ijlyttle commented 3 years ago

libgit2 1.1.0 is now installed on my Mac; repeating the experiment. I have a without_proxy() function that I use to unset the proxy environment variables.

| platform | libgit2 version | gert | call | remote type | success |
|---|---|---|---|---|---|
| Mac | 1.1.0 | CRAN | git_pull() | internal | yes |
| Mac | 1.1.0 | CRAN | without_proxy(git_pull()) | internal | yes |
| Mac | 1.1.0 | CRAN | git_pull() | external | yes |
| Mac | 1.1.0 | CRAN | without_proxy(git_pull()) | external | yes (1) |
| Mac | 1.1.0 | proxy-branch | git_pull() | internal | no (2) |
| Mac | 1.1.0 | proxy-branch | without_proxy(git_pull()) | internal | yes |
| Mac | 1.1.0 | proxy-branch | git_pull() | external | yes |
| Mac | 1.1.0 | proxy-branch | without_proxy(git_pull()) | external | yes (1) |

(1) It surprised me that it worked here. I think there is something about the process I don't understand.

(2) Error:

Error in libgit2::git_remote_fetch : 
  proxy returned unexpected status: 502
Backtrace:
    █
 1. ├─[ gert::git_pull() ] with 1 more call
 3. └─gert:::raise_libgit2_error(...)

This seems an encouraging result, even if it is puzzling to me. I don't have anything set up in git configuration for the proxy variables.

I'll add to the experiment with Windows when I have access.

jeroen commented 3 years ago

Also relevant: https://github.com/libgit2/libgit2/pull/5774

ijlyttle commented 3 years ago

I now have access again to my Windows machine.

| platform | libgit2 version | gert | call | remote type | success |
|---|---|---|---|---|---|
| Windows | 1.1.0 | CRAN | git_pull() | internal | yes |
| Windows | 1.1.0 | CRAN | without_proxy(git_pull()) | internal | yes |
| Windows | 1.1.0 | CRAN | git_pull() | external | no (1) |
| Windows | 1.1.0 | CRAN | without_proxy(git_pull()) | external | no (1, 2) |
| Windows | 1.1.0 | proxy-branch | git_pull() | internal | no (3) |
| Windows | 1.1.0 | proxy-branch | without_proxy(git_pull()) | internal | yes |
| Windows | 1.1.0 | proxy-branch | git_pull() | external | yes |
| Windows | 1.1.0 | proxy-branch | without_proxy(git_pull()) | external | no (1, 2) |

(1) Error:

Error in libgit2::git_remote_fetch : 
  failed to send request: The operation timed out

(2) In my opinion, it should fail here.

(3) Error:

Error in libgit2::git_remote_fetch : request failed with status code: 502

For the sake of completeness:

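# Named list setting every common spelling of the proxy variables to `value`
list_proxy <- function(value) {
  list(
    http_proxy = value,
    HTTP_PROXY = value,
    https_proxy = value,
    HTTPS_PROXY = value
  )
}

# Evaluate `code` with all proxy variables unset
# (withr::with_envvar() unsets a variable whose new value is NA)
without_proxy <- function(code) {
  withr::with_envvar(
    new = list_proxy(NA_character_),
    code = code
  )
}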

There is more going on here and with libgit2 (issue 5774) than I understand or can offer an opinion on. Here's what I feel confident about:

jeroen commented 3 years ago

Also some progress in libgit2 in supporting NO_PROXY: https://github.com/libgit2/libgit2/pull/5796

jeroen commented 3 years ago

Looks like NO_PROXY has landed in libgit2 1.2.0: https://github.com/libgit2/libgit2/pull/6026 I'll update this soon so we can test it.