jupyterhub / mybinder.org-user-guide

Turn a Git repo into a collection of interactive notebooks. This is Binder's user documentation repository.
https://mybinder.readthedocs.io
BSD 3-Clause "New" or "Revised" License

MyBinder incorrectly parses links from URL mybinder.org/v2/git/ #191

Closed (jjur closed this issue 4 years ago)

jjur commented 4 years ago

Hello! I have several repositories running on my own GitLab server on Azure. All MyBinder links recently stopped working. For example, the following link immediately fails with the error below (to reproduce, try any repo from GitLab.com or your own Git server): https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master

Error resolving ref for git:https:/gitlab.com/G5Fr3/analog-clock/master: Unable to run git ls-remote to get the resolved_ref: ssh:Could not resolve hostname https: Name or service not known fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.

When I add an extra %2F, it works perfectly and launches Binder: https://mybinder.org/v2/git/https%3A%2F%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master

All my links used to work in the past, but today all of them throw the same error.
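For anyone reproducing this, here is a minimal sketch of how a `/v2/git/` launch link of the shape shown above can be assembled. `build_git_launch_url` is a hypothetical helper written for this thread, not part of BinderHub; the only thing it relies on is that the repository URL is percent-encoded as a whole and followed by the ref.

```python
from urllib.parse import quote


def build_git_launch_url(repo_url: str, ref: str, hub: str = "https://mybinder.org") -> str:
    """Build a /v2/git/ launch link (hypothetical helper, not a BinderHub API)."""
    # safe="" makes quote() encode ':' and '/' as well, so the whole
    # repository URL travels as a single percent-encoded path segment.
    encoded_repo = quote(repo_url, safe="")
    return f"{hub}/v2/git/{encoded_repo}/{ref}"


print(build_git_launch_url("https://gitlab.com/G5Fr3/analog-clock", "master"))
```

The result has two `%2F` after `https%3A`, which is the form that used to work and is expected to work.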

betatim commented 4 years ago

Thanks for reporting this. Weird that this has stopped working suddenly as I don't think we've updated mybinder.org in the last few days. Will investigate.

betatim commented 4 years ago

https://gke.mybinder.org/v2/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master works for me. This is on one specific cluster in our federation and with two slashes.

The generic link https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master currently also goes to GKE and works. However ~20min ago it didn't work but I didn't note down which cluster it was being sent to.

betatim commented 4 years ago

I explicitly tested turing, gesis and OVH. They work as well. Weird. The last thing I can think of is that we mangle the URL when we redirect from the generic mybinder.org to a specific cluster, but only for one of the clusters.

Just managed to reproduce this when getting sent to Gesis (@bitnik).

Request:

    GET /build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master HTTP/1.1
    Host: mybinder.org
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:74.0) Gecko/20100101 Firefox/74.0
    Accept: text/event-stream
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate, br
    Referer: https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master
    DNT: 1
    Connection: keep-alive
    Pragma: no-cache
    Cache-Control: no-cache

Response:

    HTTP/2 307 Temporary Redirect
    server: nginx/1.13.12
    date: Thu, 26 Mar 2020 10:57:12 GMT
    content-type: text/html; charset=UTF-8
    content-length: 0
    location: https://gesis.mybinder.org/build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master
    access-control-allow-origin: *
    access-control-allow-headers: cache-control
    set-cookie: host="https://gesis.mybinder.org"; Path=/build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master
    strict-transport-security: max-age=15724800; includeSubDomains
    X-Firefox-Spdy: h2

Request:

    GET /build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master HTTP/1.1
    Host: gesis.mybinder.org
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:74.0) Gecko/20100101 Firefox/74.0
    Accept: text/event-stream
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate, br
    Origin: https://mybinder.org
    Referer: https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master
    DNT: 1
    Connection: keep-alive
    Pragma: no-cache
    Cache-Control: no-cache

Response:

    HTTP/2 200 OK
    server: nginx/1.14.0 (Ubuntu)
    date: Thu, 26 Mar 2020 10:57:13 GMT
    content-type: text/event-stream
    access-control-allow-origin: *
    access-control-allow-headers: cache-control
    cache-control: no-cache
    strict-transport-security: max-age=63072000
    X-Firefox-Spdy: h2

With this being shown in the browser window:

[Screenshot 2020-03-26 at 12 14 27]

Do you think it could be something in the request rewriting in nginx that swallows a /?

bitnik commented 4 years ago

@betatim thanks a lot for all the debug information, it made the problem much easier to find. I think it is solved now, https://gesis.mybinder.org/build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master works for me.

The problem was that our nginx configuration specified proxy_pass with a URI:

    location /build/ {
        proxy_pass http://gesisbinder/binder/build/;
    }

When proxy_pass is given a URI, nginx normalizes (decodes) the request URI before passing it on (see documentation). That was the issue, because launch URLs contain percent-encoded parts for the git and gitlab repo providers.
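To make the effect concrete, here is a rough Python model of that normalization. It is a simplification under stated assumptions, not nginx's actual implementation: it only percent-decodes the path and merges adjacent slashes (nginx's `merge_slashes` is assumed to be at its default, `on`). `normalize_like_nginx` is a made-up name.

```python
import re
from urllib.parse import unquote


def normalize_like_nginx(raw_path: str) -> str:
    """Simplified model of URI normalization: decode %XX, then merge slashes."""
    decoded = unquote(raw_path)            # %3A -> ':', %2F -> '/'
    return re.sub(r"/{2,}", "/", decoded)  # '//' -> '/' (merge_slashes assumed on)


raw = "/build/git/https%3A%2F%2Fgitlab.com%2FG5Fr3%2Fanalog-clock/master"
print(normalize_like_nginx(raw))
# /build/git/https:/gitlab.com/G5Fr3/analog-clock/master
```

The `https:/gitlab.com/...` with a single slash is the same mangled form that appears in the error message at the top of this issue.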

Using $request_uri in proxy_pass solved the issue:

    location /build/ {
        proxy_pass http://gesisbinder/binder$request_uri;
    }

betatim commented 4 years ago

@jjur let us know if this works for you as well now. If so please do close the issue :)

jjur commented 4 years ago

I have checked the links and all seem to work again. Thank you for the fast response and for fixing the bug.

JColl88 commented 2 years ago

Thought it worth adding an extra comment since we encountered this issue with our own BinderHub prototype deployment, which could be a common use case.

The error message read:

Error resolving ref for git:https://gitlab.in2p3.fr/escape2020/wp3/eossr/HEAD: Unable to run git ls-remote to get the resolved_ref: ssh: Could not resolve hostname https: Name or service not known

It looks like NGINX modified the URL when proxying the request, so the https protocol was no longer recognised.
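A likely reason the error mentions ssh: git only treats a remote as a URL with a scheme when it contains `://`. Otherwise, if a colon appears before any slash, git reads it as scp-style `host:path` and tries SSH. The sketch below is a rough model of that rule for illustration, not git's real parser (it ignores local paths and other special cases), and both function names are made up.

```python
def looks_like_scp_style_ssh(remote: str) -> bool:
    """Rough model of git's 'host:path' rule (illustrative, not git's code)."""
    if "://" in remote:
        return False  # explicit scheme such as https:// or ssh://
    colon = remote.find(":")
    slash = remote.find("/")
    # scp-style syntax needs a colon with no slash before it
    return colon != -1 and (slash == -1 or colon < slash)


def scp_style_host(remote: str) -> str:
    """Return the part before the first colon, i.e. what SSH would try to resolve."""
    return remote.split(":", 1)[0]


mangled = "https:/gitlab.com/G5Fr3/analog-clock"
print(looks_like_scp_style_ssh(mangled), scp_style_host(mangled))
# True https
```

Under this model, a repository URL that has lost one slash is handed to SSH with `https` as the hostname, which is consistent with "Could not resolve hostname https".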

We had to change the location specified in the NGINX config from

    location /binderhub/ {
        proxy_pass http://<ip>:<port>/binderhub/;
    }

to

    location /binderhub/ {
        proxy_pass http://<ip>:<port>$request_uri;
    }
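For comparison, here is a small Python model of what the upstream BinderHub would receive under each of the two `proxy_pass` forms. It is a sketch under assumptions: the request path below is a guess at what our deployment saw (only the repository URL and `HEAD` come from the error message), normalization is reduced to percent-decoding plus slash merging, and `upstream_uri` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import unquote


def upstream_uri(raw_uri: str, location: str, replacement: Optional[str] = None) -> str:
    """Model the URI sent upstream for the two proxy_pass forms.

    replacement is the URI part of proxy_pass (e.g. "/binderhub/");
    None models `proxy_pass http://...$request_uri;`.
    """
    if replacement is None:
        return raw_uri  # $request_uri: the original, still-encoded request URI
    # proxy_pass with a URI: the request URI is normalized first, then the
    # part matching the location is replaced by the URI from the directive.
    normalized = re.sub(r"/{2,}", "/", unquote(raw_uri))
    if not normalized.startswith(location):
        raise ValueError("request does not match this location")
    return replacement + normalized[len(location):]


# Assumed request path, for illustration only.
raw = "/binderhub/build/git/https%3A%2F%2Fgitlab.in2p3.fr%2Fescape2020%2Fwp3%2Feossr/HEAD"
print(upstream_uri(raw, "/binderhub/", "/binderhub/"))  # decoded, slashes merged
print(upstream_uri(raw, "/binderhub/"))                 # unchanged
```

Note that `$request_uri` passes the full original path, so this form only works as a drop-in replacement when the upstream expects the same prefix as the location (here `/binderhub/`).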