r-lib / gh

Minimalistic GitHub API client in R
https://gh.r-lib.org

`gh()` does not work with download action artifact endpoint #190

Closed cderv closed 8 months ago

cderv commented 8 months ago

This is quite specific, but I found it while debugging the pandoc package.

The issue that we face as users

> gh::gh("https://api.github.com/repos/jgm/pandoc/actions/artifacts/1274425566/zip", .destfile = "test.zip")
Error in `httr2::resp_body_json()`:
! Unexpected content type "application/xml".
• Expecting type "application/json" or suffix "json".
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `httr2::resp_body_json()`:
! Unexpected content type "application/xml".
• Expecting type "application/json" or suffix "json".
---
Backtrace:
    ▆
 1. └─gh::gh(...)
 2.   └─gh:::gh_make_request(req)
 3.     └─gh:::gh_error(resp, error_call = error_call)
 4.       └─httr2::resp_body_json(response)
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.

The download artifact endpoints follow this pattern

It seems httr2 follows the redirect to this Location URL directly, but the redirected request fails.

Testing with curl directly gave hints (the download URL is only valid for 1 minute, so one needs to do the following with an updated URL):

Looking around about this I found an existing issue

The redirection URL does not expect any Authorization header, but currently the one for the GitHub API endpoint is passed through, which makes the download fail.

I don't know if httr2 can avoid forwarding headers on redirection, or can allow a non-redirecting request so that a two-step flow could be done.
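A minimal sketch of what such a two-step flow might look like in httr2, assuming that turning off curl's `followlocation` option makes httr2 return the 302 response instead of following it. The helper names `artifact_redirect_request()` and `download_artifact()` are hypothetical and not part of gh or httr2.

```r
library(httr2)

# Step 1 request: authenticated call to the GitHub API that does NOT follow
# the redirect, so the Location header can be read from the 302 response.
artifact_redirect_request <- function(url, token) {
  request(url) |>
    req_headers(
      Accept = "application/vnd.github.v3+json",
      Authorization = paste("token", token)
    ) |>
    req_options(followlocation = FALSE)
}

# Hypothetical helper: resolve the short-lived download URL, then fetch it
# with a fresh request that carries no Authorization header.
download_artifact <- function(url, token, destfile) {
  resp <- req_perform(artifact_redirect_request(url, token))
  stopifnot(resp_status(resp) == 302)
  location <- resp_header(resp, "location")

  # Step 2: plain request to the storage URL, written straight to disk.
  req_perform(request(location), path = destfile)
  invisible(destfile)
}
```

It could be called as `download_artifact(url, Sys.getenv("GITHUB_PAT"), "test.zip")`, where reading the token from `GITHUB_PAT` is only an assumption for illustration. Because the second request is built from scratch, the credentials never reach the storage host.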

This could be something to fix in gh::gh(), or not. Possibly, no redirect URL from GitHub should receive the initial Authorization header.

And/or it could be an httr2 issue, if httr2 does not offer a way to drop headers on redirection. The Python urllib package has a dedicated function for this purpose, for example (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.add_unredirected_header)

cderv commented 8 months ago

I was curious regarding curl -L working here, and I found this in the manual https://everything.curl.dev/http/redirects#redirecting-to-other-hostnames

Redirecting to other hostnames

When you use curl you may provide credentials like username and password for a particular site, but since an HTTP redirect might move away to a different host curl limits what it sends away to other hosts than the original within the same transfer.

So if you want the credentials to also get sent to the following hostnames even though they are not the same as the original—presumably because you trust them and know that there is no harm in doing that—you can tell curl that it is fine to do so by using the --location-trusted option.

So by default curl will not pass the Authorization header to another host; it will if --location-trusted is added. And this reproduces our issue here:

> curl -L --location-trusted -I -H "Accept: application/vnd.github.v3+json" -H "User-Agent: https://github.com/r-lib/gh" -H "Authorization: token <redacted>" https://api.github.com/repos/jgm/pandoc/actions/artifacts/1274425566/zip -H 'Content-Type: application/json'
HTTP/1.1 302 Found
Server: GitHub.com
Date: Mon, 26 Feb 2024 13:51:52 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 0
Location: https://productionresultssa16.blob.core.windows.net/actions-results/a1733d37-335a-4eb0-969a-8921414446ad/workflow-job-run-2d2b3007-3c5c-5840-9bb0-2b1ea49925f3/artifacts/9f323f957753cd2d616c18b50616ec4af4dabd43d14cb7278f82c8d8fbd47fb7.zip?rscd=attachment%3B+filename%3D%22nightly-windows.zip%22&se=2024-02-26T14%3A01%3A52Z&sig=%2FWdbpyNpXrSHN%2B4N1Km8VtJVcjFnEb6%2Fg3TxVpPeQrs%3D&sp=r&spr=https&sr=b&st=2024-02-26T13%3A51%3A47Z&sv=2021-12-02
x-github-api-version-selected: 2022-11-28
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4997
X-RateLimit-Reset: 1708958890
X-RateLimit-Used: 3
X-RateLimit-Resource: core
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Content-Security-Policy: default-src 'none'
Vary: Accept-Encoding, Accept, X-Requested-With
X-GitHub-Request-Id: 073A:0EB9:7CE7FE4:7E90609:65DC9778

HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: b26371e7-301e-004e-7dba-687879000000
Access-Control-Expose-Headers: Content-Length,Date,Server,x-ms-request-id
Access-Control-Allow-Origin: *
Date: Mon, 26 Feb 2024 13:51:53 GMT

So maybe httr2 should follow the same pattern?

If an issue should be opened there, tell me.

cderv commented 8 months ago

This is down to

CURLOPT_UNRESTRICTED_AUTH was activated by default in the curl R package, which led to the auth header being passed through on redirect.

Using dev curl or setting req_options(unrestricted_auth = FALSE) with current httr2 version makes the request possible.
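For reference, the workaround could look like the sketch below with plain httr2. Only `req_options(unrestricted_auth = FALSE)` comes from the finding above; the helper name `artifact_request()`, the header value format, and the `GITHUB_PAT` environment variable are assumptions for illustration.

```r
library(httr2)

# Hypothetical helper: build an authenticated request whose credentials are
# not forwarded to a different host when a redirect is followed.
artifact_request <- function(url, token) {
  request(url) |>
    req_headers(Authorization = paste("token", token)) |>
    req_options(unrestricted_auth = FALSE)
}

# Usage (needs network access and a valid token):
# artifact_request(
#   "https://api.github.com/repos/jgm/pandoc/actions/artifacts/1274425566/zip",
#   Sys.getenv("GITHUB_PAT")
# ) |>
#   req_perform(path = "test.zip")
```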

This issue with gh::gh() will resolve itself once httr2 imports the new curl version.