Closed kevinushey closed 1 year ago
@arnauddeblic: do you know if there's a way for me to determine whether a repository URL is associated with a Nexus repository? E.g. is there some file or header I can query at the repository URL to determine that?
I've tried making some changes to support this in https://github.com/rstudio/renv/commit/a5822317dd0438c69752e24565bc0cdf7d88aa19; if you want to test you can try something like:
options(renv.nexus.enabled = TRUE)
renv::install(<package>)
and see if renv is able to find a binary package at the Nexus "fallback" location.
If there's a way for me to query whether a repository is a Nexus repository, then I could eliminate the need to set an R option to opt-in to this behavior.
Dear @kevinushey,
Many thanks for addressing this issue so quickly!
To answer both your questions:
1) Concerning your implementation:
The fallback function is called and a URL is requested - this is a good start.
However, the requested URL is not correct. Based on the original example, with rlang@1.0.4, instead of requesting <repo>/bin/windows/contrib/4.1/rlang_1.0.4.zip, the code requests <repo>/rlang_1.0.4.zip.
Same behavior for source packages: instead of requesting <repo>/src/contrib/rlang_1.0.4.tar.gz, the code requests <repo>/rlang_1.0.4.tar.gz.
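To make the expected layout concrete, here is a minimal sketch of how the full URL could be assembled. The helper name nexus_package_url, its arguments, and the example repository URL are hypothetical and are not part of renv; only the path layout comes from the example above.

```r
# Hypothetical helper (not part of renv): build the URL at which a
# CRAN-like repository is expected to serve one specific package version.
nexus_package_url <- function(repo, package, version,
                              type = c("source", "win.binary"),
                              r_version = "4.1") {
  type <- match.arg(type)
  repo <- sub("/+$", "", repo)  # drop any trailing slashes
  if (type == "source") {
    file <- sprintf("%s_%s.tar.gz", package, version)
    paste(repo, "src/contrib", file, sep = "/")
  } else {
    file <- sprintf("%s_%s.zip", package, version)
    paste(repo, "bin/windows/contrib", r_version, file, sep = "/")
  }
}

# Example with a made-up repository root
nexus_package_url("https://nexus.example.org/repository/r", "rlang", "1.0.4", "win.binary")
```

The R minor version ("4.1" here) is passed explicitly so the sketch stays independent of the running R session.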
2) Concerning the way of querying whether a repository is a Nexus repository:
Since Nexus is a proxy (and cache) system, I'm afraid there is no special file that could help.
The HTTP header could be a solution (at least from what I can observe using my company's installation of Nexus).
When I request the repo using Postman, I get in the response a Server header with value Nexus/3.25.1-04 (OSS).
In this configuration, checking whether the Server header contains nexus (case-insensitively) would provide you with the information.
Please note:
Since we are not sure Nexus will always send this header (companies sometimes change their name or their products' names, you know it better than me ;) ), maybe you could back up this header check with your renv.nexus.enabled option:
if header Server contains nexus or if option renv.nexus.enabled is TRUE, then...
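The combined check could look roughly like the sketch below. The function is_nexus_repo is a hypothetical name, and it assumes the response headers are already available as a named character vector; how renv actually obtains them is not shown here.

```r
# Hypothetical helper (not renv's implementation).
# `headers`: named character vector of HTTP response headers.
# `enabled`: the opt-in option, which always wins when TRUE.
is_nexus_repo <- function(headers,
                          enabled = getOption("renv.nexus.enabled", default = FALSE)) {
  if (isTRUE(enabled)) return(TRUE)
  if (length(headers) == 0) return(FALSE)
  names(headers) <- tolower(names(headers))  # header names are case-insensitive
  server <- unname(headers["server"])
  !is.na(server) && grepl("nexus", server, ignore.case = TRUE)
}

is_nexus_repo(c(Server = "Nexus/3.25.1-04 (OSS)"), enabled = FALSE)
```

Keeping the option as an override means the behavior can still be forced on if a given Nexus installation does not send a recognizable Server header.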
Other question
I have a last remark / question concerning part of the code I have just seen in this latest version of your retrieve.R file.
In the CRAN version renv@0.15.5, retrieving from source was always added to the methods list, even when the pkgType option was not source. With such an algorithm, when the pkgType option was set to binary, retrieving from source was tried if no binary was found. I was quite comfortable with this implementation.
In this new version, I understand that retrieving from source is no longer added to the methods list if the pkgType option is set to binary: srcok <- pkgtype %in% c("both", "source").
I'm not sure I understand the reason for this modification. From what I understand, the pkgType option is supposed to set the preferred installation method. Have you considered using the install.packages.check.source option? Maybe renv should add "retrieve from source" to the methods list, unless install.packages.check.source is explicitly set to no.
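As an illustration of this suggestion only, the decision could be sketched as follows. The function source_allowed is a hypothetical name and this is not renv's actual logic; it simply encodes "allow source unless install.packages.check.source is explicitly no".

```r
# Hypothetical sketch of the suggestion above; not renv's actual logic.
source_allowed <- function(pkgtype = getOption("pkgType"),
                           check_source = getOption("install.packages.check.source")) {
  # Source retrieval is always allowed when pkgType already permits it
  if (pkgtype %in% c("both", "source")) return(TRUE)
  # Binary requested: still fall back to source unless explicitly disabled
  !identical(check_source, "no")
}

source_allowed("binary", check_source = "no")
```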
Should you have any questions, please contact me. Kind regards, Arnaud
Thanks! I've made the changes required (I think) to support the Nexus URLs properly. It might take a bit more iteration to refine but I think we're getting there.
Re: your question on srcok <- pkgtype %in% c("both", "source"); in R, the pkgType option defaults to "both":
> getOption("pkgType")
[1] "both"
and renv tries to respect that choice. In this situation, R (and renv) prefer installing binaries if available, but will fall back to source packages if not.
From what I can see in the R sources, R uses the install.packages.check.source option to allow a fallback to the source repository even if a binary repository was explicitly requested.
Many thanks @kevinushey for your support.
I tried your new implementation:
with pkgType = binary and renv.nexus.enabled option set to TRUE: OK
with pkgType = source and renv.nexus.enabled option set to TRUE: OK
with pkgType = binary and renv.nexus.enabled option unset (default to FALSE): /!\ KO (retrieve was performed from Archive)
with pkgType = source and renv.nexus.enabled option unset (default to FALSE): /!\ KO (retrieve was performed from Archive)
I spent some time debugging and found the problem:
The Nexus server returns a 404 status code when you run curl with the HEAD parameter; see the renv-headers temp file:
HTTP/1.1 404 Not Found
Date: Fri, 16 Sep 2022 19:54:00 GMT
Server: Nexus/3.25.1-04 (OSS)
X-Content-Type-Options: nosniff
Content-Security-Policy: sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation
X-XSS-Protection: 1; mode=block
Pragma: no-cache
Cache-Control: no-cache, no-store, max-age=0, must-revalidate, post-check=0, pre-check=0
Expires: 0
X-Frame-Options: DENY
Content-Type: text/html
Content-Length: 2071
Set-Cookie: e1c2a849e31cf572844da4b9bd2d0f31=bd4f901c2a2c2c04b50478be164e36fa; path=/; HttpOnly
When removing the HEAD parameter from the curl configuration file, the Nexus server returns a 200 status code.
The full page is served; this is very small data when repo is Nexus, since basically the page tells you:
This r group repository is not directly browseable at this URL.
Please use the [browse] or [HTML index] views to inspect the contents of this repository.
HTTP/1.1 200 OK
Date: Fri, 16 Sep 2022 20:10:58 GMT
Server: Nexus/3.25.1-04 (OSS)
X-Content-Type-Options: nosniff
Content-Security-Policy: sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation
X-XSS-Protection: 1; mode=block
Content-Type: text/html
Content-Length: 2403
Set-Cookie: e1c2a849e31cf572844da4b9bd2d0f31=bd4f901c2a2c2c04b50478be164e36fa; path=/; HttpOnly
Cache-control: private
<!DOCTYPE html>
<html lang="en">
<head>
<title>Repository - Nexus Repository Manager</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<!--[if lt IE 9]>
<script>(new Image).src="https://**********************.fr/favicon.ico?3.25.1-04"</script>
<![endif]-->
<link rel="icon" type="image/png" href="https://**********************.fr/favicon-32x32.png?3.25.1-04" sizes="32x32">
<link rel="mask-icon" href="https://**********************.fr/safari-pinned-tab.svg?3.25.1-04" color="#5bbad5">
<link rel="icon" type="image/png" href="https://**********************.fr/favicon-16x16.png?3.25.1-04" sizes="16x16">
<link rel="shortcut icon" href="https://**********************.fr/favicon.ico?3.25.1-04">
<meta name="msapplication-TileImage" content="https://**********************.fr/mstile-144x144.png?3.25.1-04">
<meta name="msapplication-TileColor" content="#00a300">
<link rel="stylesheet" type="text/css" href="https://**********************.fr/static/css/nexus-content.css?3.25.1-04"/>
</head>
<body>
<div class="nexus-header">
<a href="https://**********************.fr">
<div class="product-logo">
<img src="https://**********************.fr/static/images/nexus.png?3.25.1-04" alt="Product logo"/>
</div>
<div class="product-id">
<div class="product-id__line-1">
<span class="product-name">Nexus Repository Manager</span>
</div>
<div class="product-id__line-2">
<span class="product-spec">OSS 3.25.1-04</span>
</div>
</div>
</a>
</div>
<div class="nexus-body">
<div class="content-header">
<img src="https://**********************.fr/static/rapture/resources/icons/x32/database.png?3.25.1-04" alt="Repository image"/>
<span class="title">Repository</span>
<span class="description">r</span>
</div>
<div class="content-body">
<div class="content-section">
<p>
This r group repository is not directly browseable at this URL.
</p>
<p>
Please use the <a href="https://**********************.fr/#browse/browse:r">browse</a>
or <a href="https://**********************.fr/service/rest/repository/browse/r/">HTML index</a>
views to inspect the contents of this repository.
</p>
</div>
</div>
</body>
</html>
I don't know if there is an option other than HEAD to limit the amount of data served when requesting a URL. If there is none, maybe we could store the result of the renv_nexus_enabled() function for a given repo, so that the function is not triggered for every package to be restored.
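One way to store the result per repository is a small in-memory cache, sketched below. The names nexus_cache and nexus_enabled_cached are hypothetical, and probe stands in for whatever function actually performs the web request; this is not how renv implements it.

```r
# Hypothetical per-repository cache (not renv's implementation).
nexus_cache <- new.env(parent = emptyenv())

# `probe` is any function taking a repository URL and returning TRUE/FALSE;
# it is only called the first time a given repository is seen.
nexus_enabled_cached <- function(repo, probe) {
  if (!exists(repo, envir = nexus_cache, inherits = FALSE)) {
    assign(repo, isTRUE(probe(repo)), envir = nexus_cache)
  }
  get(repo, envir = nexus_cache, inherits = FALSE)
}
```

With this approach, restoring many packages from the same repository triggers a single request for that repository within the R session.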
In your implementation, you seem to prefer the curl method for downloads (see function renv_repos_info_impl()).
On the Windows servers of my intensive computing grid, I had to set RENV_DOWNLOAD_FILE_METHOD = wininet in the Renviron.site, since I did not manage to renv::install packages with default settings. Do you think this could be a problem for future renv users using wininet like me? As far as I am concerned, I will set the renv.nexus.enabled option to TRUE on these machines, so it will be OK ;)
Kind regards
Arnaud
Thanks! I think it should be okay to just perform a regular web request at that endpoint; it's unlikely the returned data would be large from any typical CRAN mirror. It would also allow us to use arbitrary downloaders as well (so no need to force the use of curl).
Loose ends should be tied up on the main branch now. Thanks for the feedback; fingers crossed that this gets us over the finish line!
I will give you feedback as soon as I test. Kind regards
Dear @kevinushey,
Thanks for your reply and for the new, improved implementation.
I tested renv@0.15.5-58 with 4 configurations:
pkgType option set to binary, renv.nexus.enabled option unset
pkgType option set to binary, renv.nexus.enabled option set to TRUE
pkgType option set to source, renv.nexus.enabled option unset
pkgType option set to source, renv.nexus.enabled option set to TRUE
on several Windows environments:
using 2 different methods:
Results are OK everywhere, with all 4 configurations, apart from a strange behavior, see below:

| Environment | Method | Status |
|---|---|---|
| Laptop | RStudio, within an RStudio project | OK |
| Laptop | R.exe run in command line, within a project directory | OK |
| VDI | RStudio, within an RStudio project | OK |
| VDI | R.exe run in command line, within a project directory | OK |
| Server | R.exe run in command line, within a project directory | OK - but strange behavior, see below |

When restoring a renv project on Windows Server (using R.exe run from command line):
To make sure, I tested with the renv@0.15.5 CRAN version, and I confirm this strange behavior does not occur with the released version: the empty NULL directory is not created when restoring a renv project using renv@0.15.5.
I don't think it is related to the recent developments dealing with the Nexus issue. Probably another development made between renv@0.15.5 and renv@0.15.5-58?
Should you have any questions, please contact me. Kind regards, Arnaud
P.S. :
Great news -- thanks for taking the time to test.
Do you already have a rough idea of the next CRAN release date, including this new Nexus feature?
I'm hoping to prepare a new release in the coming weeks.
In the meantime, can I consider that "renv.nexus.enabled" is the definitive option name? (I'm currently preparing Renviron.site and Rprofile.site configuration files for deployment in production in my company)
Yes, we can consider the option here stable.
I don't think it is related to the recent developments dealing with the Nexus issue. Probably another development made between renv@0.15.5 and renv@0.15.5-58?
Thanks for the heads up here -- I'll see if I can figure out where this is coming from.
Regarding the NULL directory, it might be helpful if you could also test with code of the following form:
trace(dir.create, quote({
  if (grepl("NULL", path)) { print(rlang::trace_back()) }
}))
(please make sure rlang is also installed)
That might give a hint as to where that directory is coming from.
My only other guess is that this could be related to us setting R_LIBS_USER and R_LIBS_SITE here:
https://github.com/rstudio/renv/blob/630d5effa65f4dc9ce8b523f365f115a05886e87/R/r.R#L10-L13
Maybe something is auto-creating those directories?
Dear @kevinushey, Your last guess is the right one: I did indeed deploy such code on my server:
R_LIBS_SITE = D:/R/R_LIBS_SITE/%p-library/%v
local({
R_LIBS_SITE <- Sys.getenv("R_LIBS_SITE")
if (!dir.exists(R_LIBS_SITE)) {
dir.create(R_LIBS_SITE, recursive = TRUE)
}
})
I added this code since otherwise, from what I understand, R does not take R_LIBS_SITE into account when the corresponding directory does not exist. And I really need to specify a site library.
To be sure this auto-creation is responsible for the NULL directory, I've just added the same kind of parameter in my VDI environment:
R_LIBS_USER = U:/AppData/Local/R/%p-library/%v
local({
R_LIBS_USER <- Sys.getenv("R_LIBS_USER")
if (!dir.exists(R_LIBS_USER)) {
dir.create(R_LIBS_USER, recursive = TRUE)
}
})
This configuration now leads to the same strange behavior on my VDI: a NULL directory is created.
This configuration was not yet set on the VDI when I performed the tests this morning. I planned to do it, since otherwise, from what I understand, R does not take R_LIBS_USER into account when the corresponding directory does not exist. And I really need to specify a user library. This is even more necessary on my VDI configuration, since I deploy hundreds of VDIs, and I need to store user data on a shared network location dedicated to each user (mapped on the U: drive), rather than in C:/Users/... of the VDI.
Do you know if there is another way to auto-create those directories? Otherwise, do you think you can adjust renv behavior, so that it does not create the NULL directory?
Kind regards Arnaud
The R documentation suggests that R_LIBS_USER and R_LIBS_SITE can be set to NULL if you'd like them to be ignored or set as empty; e.g.
And those NULL values get handled by R's built-in base Rprofile, e.g. for Unix:
In this case, I believe renv is doing the right thing; I think you need to validate that R_LIBS_USER and R_LIBS_SITE are not equivalent to NULL before choosing to create them.
Dear @kevinushey, Thanks to your advice, I adjusted my Rprofile.site files as below. It's now OK: the undue directory creation no longer occurs. I think we can consider this issue #1074 as ready to be closed. Many thanks for your help - I really appreciated our collaboration! Kind regards, Arnaud
On server:
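local({
  # Treat an unset variable the same way as the special value "NULL"
  R_LIBS_SITE <- Sys.getenv("R_LIBS_SITE", unset = "NULL")
  # Create the site library only when a real path is configured and missing
  if (R_LIBS_SITE != "NULL" && !dir.exists(R_LIBS_SITE)) {
    dir.create(R_LIBS_SITE, recursive = TRUE)
  }
})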
On VDI:
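local({
  # Treat an unset variable the same way as the special value "NULL"
  R_LIBS_USER <- Sys.getenv("R_LIBS_USER", unset = "NULL")
  # Create the user library only when a real path is configured and missing
  if (R_LIBS_USER != "NULL" && !dir.exists(R_LIBS_USER)) {
    dir.create(R_LIBS_USER, recursive = TRUE)
  }
})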
Great, I'm glad to hear it! Thanks for taking the time to report back.
Many thanks @kevinushey for your very quick and positive answer!
To keep it simple, Nexus (aka Nexus Repository Manager, by Sonatype) is deployed by my IT teams on our company's servers. Nexus aims at achieving two main goals:
In the Rprofile.site we deploy on our machines, the repository is set using the Nexus root URL for the R repository (mirror of CRAN). Thanks to this configuration:
The architecture of Nexus storage is the same as the original CRAN repository. For example:
The main difference with CRAN is that older versions are kept, and still available thanks to Nexus internal storage functionality.
Binary packages
Here is an example for binary package rlang in my Nexus architecture:
As said above, older versions are still available for download, even if they are not explicitly listed in PACKAGES.gz (which only includes the latest version since it is the proxied version of CRAN’s PACKAGES.gz file):
The idea would be to determine the theoretical URL and try such URL when restoring the project, if the requested package is not explicitly listed in PACKAGES.gz. In our example, if the renv.lock file contains the record rlang@1.0.4, we should try URL bin/windows/contrib/4.1/rlang_1.0.4.zip.
Source packages
The same idea could also be used for source packages, in order to take advantage of Nexus storage functionality. Here is an example for source package rlang in my Nexus architecture:
Again, older versions are still available for download, even if they are not explicitly listed in PACKAGES.gz (which only includes the latest version since it is the proxied version of CRAN’s PACKAGES.gz file):
I saw the code you wrote in the retrieve.R file, for the renv_retrieve_repos_archive_path() function. I understand (but am not 100% sure) that when the requested package is neither in the binary PACKAGES.gz file nor in the source PACKAGES.gz, this function:
(I voluntarily ignore the step related to issue #602, to keep this post as simple as possible!)
If the package has been moved to Archive subfolder in CRAN, Nexus will download it and store it with the same architecture, in folder "src/contrib/Archive//".
At the end of the day, the same source package will be duplicated in Nexus storage: both in "src/contrib/" (initial download) and in "src/contrib/Archive//" (second download, when the package is moved to Archive in CRAN):
If we could avoid this, it would prevent unnecessarily increasing the storage volume. For Nexus-like configurations, I would suggest trying first in "src/contrib/", and then in "src/contrib/Archive//".
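The suggested order of attempts for a source package can be sketched as below. The helper source_candidates is a hypothetical name, and the Archive path assumes the usual CRAN layout in which archived tarballs sit in a per-package subfolder; this is an illustration, not renv's code.

```r
# Hypothetical helper: ordered source URLs to try for a package version
# that is not listed in PACKAGES.gz.
source_candidates <- function(repo, package, version) {
  repo <- sub("/+$", "", repo)  # drop any trailing slashes
  file <- sprintf("%s_%s.tar.gz", package, version)
  c(
    # 1. the original location, which Nexus may still hold in its storage
    paste(repo, "src/contrib", file, sep = "/"),
    # 2. the CRAN-style archive location (assumed per-package subfolder)
    paste(repo, "src/contrib/Archive", package, file, sep = "/")
  )
}

source_candidates("https://nexus.example.org/repository/r", "rlang", "1.0.4")
```

Trying the first URL before the second avoids making Nexus download and store a second copy of a tarball it already holds.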
Conclusion
To put it in a nutshell, my suggested sequence would be:
I don't know if repository managers like Nexus are widely used by R users. If you don't want to systematically try older versions, a user-level configuration to trigger steps 2 and 5 could make sense: for instance, using a new option renv.config.retrieve.try.older (or RENV_CONFIG_RETRIEVE_TRY_OLDER as environment variable), defaulting to FALSE.
Should you have any questions, please contact me. And if you prefer that I create a new issue on GitHub, please tell me. Kind regards, Arnaud
Originally posted by @arnauddeblic in https://github.com/rstudio/renv/issues/595#issuecomment-1239255666