rstudio / rsconnect

Publish Shiny Applications, RMarkdown Documents, Jupyter Notebooks, Plumber APIs, and more
http://rstudio.github.io/rsconnect/

writeManifest fails when using Artifactory CRAN mirror #654

Closed devinrkeane closed 1 year ago

devinrkeane commented 1 year ago

I am helping set up Connect in our org and we are migrating internal packages to Artifactory. For context, when using renv I can easily install private packages through the renv.download.headers option without including credentials. Creating the renv.lock file is no trouble.
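As a rough sketch of the renv side mentioned above: the `renv.download.headers` option takes a function that receives a URL and returns a named character vector of headers. The host name, the `ARTIFACTORY_TOKEN` environment variable, and the use of a Bearer token are assumptions for illustration, not details from this setup.

```r
# Hypothetical private host; replace with your own Artifactory URL prefix.
private_prefix <- "https://jfrog.mycompany.io/"

auth_headers <- function(url) {
  # Only attach credentials to requests that go to the private host.
  if (startsWith(url, private_prefix)) {
    # ARTIFACTORY_TOKEN is an assumed environment variable name.
    token <- Sys.getenv("ARTIFACTORY_TOKEN")
    return(c(Authorization = paste("Bearer", token)))
  }
  # No extra headers for public repositories such as RSPM.
  NULL
}

options(renv.download.headers = auth_headers)
```

Reading the token from the environment keeps the credentials out of both the project files and the lock file.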

Our repos include RSPM and COMPANY (our CRAN mirror that includes internal packages), and renv resolves both correctly. For example, the lock file looks like:

    "MASS": {
      "Package": "MASS",
      "Version": "7.3-58.1",
      "Source": "Repository",
      "Repository": "RSPM",
      "Hash": "762e1804143a332333c054759f89a706",
      "Requirements": []
    },
    "privatePackage": {
      "Package": "privatePackage",
      "Version": "0.3.2",
      "Source": "Repository",
      "Repository": "COMPANY",
      "Hash": "65548e5fc6afeb0e24004313ad873419",
      "Requirements": [
        "data.table"
      ]
    }

With the right config you can successfully restore the project.

However, when attempting to recreate this with writeManifest, the process fails and returns something like this:

 "MASS": {
      "Source": "RSPM",
      "Repository": "https://packagemanager.rstudio.com/all/__linux__/focal/latest",
      "description": {
        "Package": "MASS",
        "Priority": "recommended",
        "Version": "7.3-58.1",
        "Date": "2022-07-27",
        "Revision": "$Rev: 3606 $",
        "Depends": "R (>= 3.3.0), grDevices, graphics, stats, utils",
        "Imports": "methods",
        "Suggests": "lattice, nlme, nnet, survival",
        "Authors@R": "c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"),\n                    email = \"ripley@stats.ox.ac.uk\"),\n\t     person(\"Bill\", \"Venables\", role = \"ctb\"),\n\t     person(c(\"Douglas\", \"M.\"), \"Bates\", role = \"ctb\"),\n\t     person(\"Kurt\", \"Hornik\", role = \"trl\",\n                     comment = \"partial port ca 1998\"),\n\t     person(\"Albrecht\", \"Gebhardt\", role = \"trl\",\n                     comment = \"partial port ca 1998\"),\n\t     person(\"David\", \"Firth\", role = \"ctb\"))",
        "Description": "Functions and datasets to support Venables and Ripley,\n  \"Modern Applied Statistics with S\" (4th edition, 2002).",
        "Title": "Support Functions and Datasets for Venables and Ripley's MASS",
        "LazyData": "yes",
        "ByteCompile": "yes",
        "License": "GPL-2 | GPL-3",
        "URL": "http://www.stats.ox.ac.uk/pub/MASS4/",
        "Contact": "<MASS@stats.ox.ac.uk>",
        "NeedsCompilation": "yes",
        "Packaged": "2022-07-27 05:37:13 UTC; ripley",
        "Author": "Brian Ripley [aut, cre, cph],\n  Bill Venables [ctb],\n  Douglas M. Bates [ctb],\n  Kurt Hornik [trl] (partial port ca 1998),\n  Albrecht Gebhardt [trl] (partial port ca 1998),\n  David Firth [ctb]",
        "Maintainer": "Brian Ripley <ripley@stats.ox.ac.uk>",
        "Repository": "RSPM",
        "Date/Publication": "2022-08-03 15:06:59 UTC",
        "Encoding": "UTF-8",
        "Built": "R 4.1.0; x86_64-pc-linux-gnu; 2022-08-04 10:50:08 UTC; unix"
      }
    },
    "privatePackage": {
      "Source": "CRAN",
      "Repository": null,
      "description": {
        "Package": "privatePackage",
        "Title": "my title",
        "Version": "0.3.2",
        "Authors@R": "\n    person(given = \"me\",\n           family = \"me\",\n           role = c(\"aut\", \"cre\"),\n           email = \"me@me.com\",\n           comment = c(ORCID = \"YOUR-ORCID-ID\"))",
        "Description": "Just a plot and a csv.",
        "License": "What license it uses",
        "Encoding": "UTF-8",
        "LazyData": "true",
        "RoxygenNote": "7.2.0",
        "Depends": "R (>= 3.5)",
        "Imports": "data.table",
        "Suggests": "covr, httptest, lintr, styler, testthat (>= 2.1.0), vcr",
        "NeedsCompilation": "no",
        "Packaged": "2022-12-29 23:05:49 UTC; root",
        "Author": "me[aut, cre] (YOUR-ORCID-ID)",
        "Maintainer": "me <me@me.com>",
        "Built": "R 4.1.3; ; 2023-01-09 22:11:31 UTC; unix"
      }
    }

The key differences are "Source": "CRAN" instead of "COMPANY", and "Repository": null instead of "https://jfrog.mycompany.io/artifactory/cran/".

I traced this down to the fact that available.packages tries to check the package repo index and fails because the private repo requires authentication.

I fixed this by setting `options(download.file.method="curl")` in conjunction with adding credentials to a .curlrc file.
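A minimal sketch of that configuration follows. The .curlrc contents are shown as a comment with placeholder credentials; the exact file used here was not posted, so treat that line as an assumption.

```r
# Route download.file(), and therefore available.packages(), through the
# system curl binary so that curl's own config file is honored.
options(download.file.method = "curl")

# Check that a curl binary is on the PATH (empty string if it is not).
curl_path <- Sys.which("curl")

# Assumed ~/.curlrc contents (placeholders, not real credentials):
#   user = "USERNAME:PASSWORD"
```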

Running with this config I'm able to authenticate to our private repo, but get this warning:

Warning: unable to access index for repository https://jfrog.mycompany.io/artifactory/cran/src/contrib:
  Line starting '{ ...' is malformed!

Debugging available.packages, I see that it does not correctly download the PACKAGES index file. During the PACKAGES.rds and PACKAGES.gz attempts it treats the download as successful when it is not, so it skips the step we need: falling back to the plain PACKAGES file. Our Artifactory instance appears to store only the PACKAGES file.

For example, in available.packages the line:

          z <- tryCatch({
            download.file(url = paste0(repos, "/PACKAGES.gz"), 
              destfile = tmpf, method = method, cacheOK = FALSE, 
              quiet = quiet, mode = "wb", ...)
          }, error = identity)

returns 0 and does not raise an error, even though the response was not successful. It does download a tempfile, as it expects to, but when opened the file contains only the HTTP error as JSON:

{
  "errors" : [ {
    "status" : 404,
    "message" : "Could not find resource"
  } ]
}

So this file exists, and the download is therefore treated as "successful". The function then skips the next download.file attempt, which is for the file we really need, and exits.
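One way to spot this condition, sketched here as a diagnostic rather than anything available.packages does itself, is to check whether the downloaded file really starts with the gzip magic bytes. The helper name `is_gzip_file` is hypothetical, and the file contents below are stand-ins.

```r
# Returns TRUE when the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Stand-in for the JSON error body saved in place of PACKAGES.gz.
fake_index <- tempfile(fileext = ".gz")
writeLines(
  '{ "errors" : [ { "status" : 404, "message" : "Could not find resource" } ] }',
  fake_index
)

# A genuinely gzip-compressed index for comparison.
real_index <- tempfile(fileext = ".gz")
con <- gzfile(real_index, "w")
writeLines(c("Package: privatePackage", "Version: 0.3.2"), con)
close(con)

is_gzip_file(fake_index)  # FALSE
is_gzip_file(real_index)  # TRUE
```

A check like this only detects the problem after the fact; it does not change which index file available.packages tries next.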

I'm not sure if this is an rsconnect issue directly. I'm also not sure if it's something we can change in Artifactory, but the fact that the PACKAGES file is one of the 3 possible files to try suggests that Artifactory is OK there. At this point we have to manually edit the manifest.json file to deploy applications, which is obviously not ideal.

Any help here is appreciated.

SamEdwardes commented 1 year ago

Note this issue is also related to: https://github.com/rstudio/packrat/issues/702.

sagerb commented 1 year ago

Great job documenting what you have investigated.

I have a few observations and one approach you can try below. I do not currently have the infrastructure available to validate the approach but with the goal of getting you something to try quickly, I wanted to make this approach available for you.

As for the error file being created when a nonexistent file is requested, you are seeing curl's default behavior. This can be changed by specifying the -f/--fail option on its command line.

(from https://curl.se/docs/manpage.html#-fl)

-f, --fail

(HTTP) Fail fast with no output at all on server errors. This is useful to enable scripts and users to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22.

This method is not fail-safe and there are occasions where non-successful response codes will slip through, especially when authentication is involved (response codes 401 and 407).

For reference, Connect sets the command line options for its curl executions to -L -g -f -s --connect-timeout <value> --stderr -. Additional options can be set within the .curlrc file, located by default within the home directory of the Connect user's account (which defaults to rstudio-connect). You can override the location with the CURL_HOME environment variable on the server.

Of interest to the approach, curl supports the use of .netrc files (when the curl option of --netrc is specified) which can hold the credentials for different hosts.

Using this information above, my suggested approach to try is to first experiment with updating your test environment's .curlrc file to include the --fail option. Then retry your experiment and see if placing the credentials within the .curlrc file get you the expected behavior.

If that works, then you can try removing those credentials from the .curlrc file but also adding the --netrc option into it. Then place your credentials in a .netrc file associated with the host of your Artifactory repo.

Once you get that working in your test environment, you should be able to place those configuration files onto the Connect server within the ~/rstudio-connect subdirectory. Then try a simple deployment and, hopefully, the curl commands being executed will authenticate with your Artifactory repo.

Using this approach, credentials will be able to be hidden on the Connect server and you will not need to add them within your manifest files.
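The two configuration files described above might look roughly like the following. This sketch writes them into a temporary directory so it is safe to run; in practice they would live in the home directory of the account that runs curl. The host, login, and password are placeholders.

```r
# Hypothetical values throughout: host, login, and password are placeholders.
config_dir <- file.path(tempdir(), "curl-config")
dir.create(config_dir, showWarnings = FALSE)

# .curlrc: fail on HTTP errors and look up credentials in .netrc.
curlrc <- file.path(config_dir, ".curlrc")
writeLines(c("--fail", "--netrc"), curlrc)

# .netrc: credentials scoped to the Artifactory host only.
netrc <- file.path(config_dir, ".netrc")
writeLines("machine jfrog.mycompany.io login USERNAME password PASSWORD", netrc)

# Restrict the credentials file to its owner (has no effect on Windows).
Sys.chmod(netrc, mode = "0600")
```

Keeping the credentials in .netrc means they are only sent to the listed machine, while the .curlrc options apply to every curl invocation by that account.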

Again, I haven't been able to verify this here but I look forward to hearing your findings.

devinrkeane commented 1 year ago

This approach seems to be working for us so far, thanks!

Added my uname:pw and --fail to the .curlrc file and now just get a couple of annoying warnings every session locally (code 22s now instead with the new config) but that's it.

Still I think this is a bug in available.packages, unless it should be guaranteed that all the package index files are available. And if that's the case it should warn about this at the least.

Also, per our engineer, for the RSC server we ended up doing the .curlrc with the .netrc configs together. In the .netrc we can configure which specific domains to inject the auth information.

sagerb commented 1 year ago

That's great to hear!

Yes, I agree, proper detection of the failure for the curl command queries is currently dependent on an error being returned from the curl command rather than from the response data. This could be improved or at least documented.

devinrkeane commented 1 year ago

@sagerb Curious if you had any thoughts on another wrinkle related to this that we're running into. It seems Artifactory doesn't follow the CRAN package archive path convention, meaning we can't install earlier versions of packages that only we store (private ones), even though Artifactory keeps them in the archive. This still seems to be an outstanding issue with Artifactory. From what I can tell, renv does account for this, but packrat and/or rsconnect does not.

However, we noticed that if we specify the URL directly, e.g. install.packages("https://jfrog.company.io/cran/src/contrib/Archive/privatePackage/0.3.1/privatePackage_0.3.1.tar.gz"), we can install the package locally. rsconnect then stores this URL in its manifest.json file, adding the RemoteType and RemoteUrl fields like so:

    "privatePackage": {
      "Source": "COMPANY",
      "Repository": "https://jfrog.company.io/artifactory/cran",
      "description": {
        "Package": "privatePackage",
        "Title": "my title",
        "Version": "0.3.1",
        "Authors@R": "\n    person(given = \"me\",\n           family = \"me\",\n           role = c(\"aut\", \"cre\"),\n           email = \"me@me.com\",\n           comment = c(ORCID = \"YOUR-ORCID-ID\"))",
        "Description": "Just a plot and a csv.",
        "License": "What license it uses",
        "Encoding": "UTF-8",
        "LazyData": "true",
        "RoxygenNote": "7.2.0",
        "Depends": "R (>= 3.5)",
        "Imports": "data.table",
        "Suggests": "covr, httptest, lintr, styler, testthat (>= 2.1.0), vcr",
        "NeedsCompilation": "no",
        "Packaged": "2022-12-29 23:05:49 UTC; root",
        "Author": "me[aut, cre] (YOUR-ORCID-ID)",
        "Maintainer": "me <me@me.com>",
        "Built": "R 4.1.3; ; 2023-01-09 22:11:31 UTC; unix",
        "RemoteType": "url",
        "RemoteUrl": "https://jfrog.company.io/cran/src/contrib/Archive/privatePackage/0.3.1/privatePackage_0.3.1.tar.gz"
      }
    }

But the Connect publish process fails because it tries the Artifactory URL in Repository and runs into the wrong archive path pattern. Curious whether that RemoteUrl field could, or should, be used by packrat. Or is it already?
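To make the path mismatch concrete, here is a small sketch that builds both archive URLs. The CRAN layout is the standard `Archive/<pkg>/<pkg>_<version>.tar.gz`; the Artifactory layout is inferred only from the URL quoted above, and both helper names are hypothetical.

```r
# Standard CRAN archive layout: Archive/<pkg>/<pkg>_<version>.tar.gz
cran_archive_url <- function(repo, pkg, version) {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

# Layout inferred from the URL in this thread:
# Archive/<pkg>/<version>/<pkg>_<version>.tar.gz
artifactory_archive_url <- function(repo, pkg, version) {
  sprintf("%s/src/contrib/Archive/%s/%s/%s_%s.tar.gz",
          repo, pkg, version, pkg, version)
}

repo <- "https://jfrog.company.io/cran"
cran_archive_url(repo, "privatePackage", "0.3.1")
artifactory_archive_url(repo, "privatePackage", "0.3.1")
```

A tool that only constructs the first form will get a 404 from a repository that serves the second.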

hadley commented 1 year ago

Having read through this thread, I think we've resolved as much as we can on the rsconnect side. It sounds like the remaining problems are due either to bugs in base R (i.e. available.packages() warns rather than erroring if the repository index isn't available) or to Artifactory not using a fully CRAN-compatible URL structure.