Closed screig closed 7 years ago
@screig, it's been a while since I've touched a lot of this code but from what I remember, I'm not sure if we ever implemented any kind of sorting/ordering on the versions in the PACKAGES list. If anything, I think I'm more amazed that you're not seeing all versions of it in there. :smile:
Is that the full contents of your PACKAGES.gz
?
Same with @fjmilens3 , code seems to show that it would have all versions, not just one. If you can share a bit more of your PACKAGES.gz
file that would be awesome @screig
@screig, @DarthHater, here's what's going on:
We're tracking whether we've seen that package before or not in this location (so we only get one entry), but we're not comparing or ordering to get the highest version.
This might be helpful, if accurate:
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/compareVersion.html
http://r-pkgs.had.co.nz/release.html#release-version
Is that the behavior you'd expect in terms of a version comparison, @screig?
/cc @DarthHater
Hi
Complete file That was the complete package file, it should only have two entries.
Expected behaviour I also implemented a repo the old fashioned way with a web server and a shared folder. In this one after I uploaded the file I ran the write_PACKAGES function from the tools library. Here is the R code to do that.
file.copy(from=zip_file, to=repo_path, overwrite = TRUE, recursive = FALSE, copy.mode = TRUE)
tools::write_PACKAGES(dir = repo_path, type = c("win.binary"))
here is what is in this repo contained, the last number of each of these corresponds to a Jenkins build number
and here is the full contents of the package file
Here you can see that the write_PACKAGES function has identified the latest version and set only the latest version in the PACKAGES file. I take this as the gold standard of what it should do. Maybe I shouldn't but in the absence of a detailed doc from CRAN on how it should work I think it is a fair standard?
Thanks again, for taking the time to look at this. Sean
PS here is the code for the write_PACKAGES function.
and a snippet:
.remove_stale_dups <-
function(ap)
{
## Given a matrix from available.packages, return a copy
## with no duplicate packages, being sure to keep the packages
## with highest version number.
## (Also works for data frame package repository dbs.)
pkgs <- ap[ , "Package"]
dup_pkgs <- pkgs[duplicated(pkgs)]
stale_dups <- integer(length(dup_pkgs))
i <- 1L
for (dp in dup_pkgs) {
wh <- which(dp == pkgs)
vers <- package_version(ap[wh, "Version"])
keep_ver <- max(vers)
keep_idx <- which.max(vers == keep_ver) # they might all be max
wh <- wh[-keep_idx]
end_i <- i + length(wh) - 1L
stale_dups[i:end_i] <- wh
i <- end_i + 1L
}
## Possible to have only one package in a repository
if(length(stale_dups)) ap[-stale_dups, , drop = FALSE] else ap
}
@screig to get this right, we need to know exactly how max() works for a lot of cases. If you can help us figure that out, I think that would be a good start!
Well the code for that is non-trivial, the R code calls into C
in this file there is a redirection....
/* Data Summaries */
/* these four are group generic and so need to eval args */
{"sum", do_summary, 0, 1, -1, {PP_FUNCALL, PREC_FN, 0}},
{"min", do_summary, 2, 1, -1, {PP_FUNCALL, PREC_FN, 0}},
{"max", do_summary, 3, 1, -1, {PP_FUNCALL, PREC_FN, 0}},
{"prod", do_summary, 4, 1, -1, {PP_FUNCALL, PREC_FN, 0}},
to this file, where max is evaluated in this function (line 397):
SEXP attribute_hidden do_summary(SEXP call, SEXP op, SEXP args, SEXP env)
{
I think it may be easier if you give me all the use cases u want and I give you the results?
@DarthHater and @jlstephens89, I had a bit of time after we returned from car-shopping and started playing around with this whilst savoring taco boats in the comfort of our cottage. I have not done the kind of rigorous testing I would usually like, but https://github.com/sonatype/nexus-repository-r/pull/18 is here for discussion.
If we go that route (we're going to have to do something like that anyway), we should also be considering how to not regenerate this on each request (it's bad form anyway, and this overhead will just exacerbate the problem). I would suggest we dive in with a volunteering spirit and try to use a similar approach as Yum Hosted so that we can make this actually work correctly.
@screig we just merged this in, maybe take a gander and let us know how it works?
@DarthHater @fjmilens3 Very grateful for your work on this. In terms of testing, does this still stand:
Copy the bundle into
/system/org/sonatype/nexus/plugins/nexus-repository-r/1.0.0/nexus-repository-r-1.0.0.jar
or have you incremented the version number?
Thanks Sean
Right now I left it as 1.0.0, I'll likely increment once we are like yep, this is working, so your feedback would be great @screig :)
So should I just overwrite the jar with the new one, would that do it?
ok it works!
Here is what is in the repo, where the latest version ends 193
Before updating the jar
C:_TEMP>more +2 PACKAGES Depends: R (>= 3.1.0), methods Suggests: testthat, knitr, xtable License: LGPL-3
Package: rYieldLib Version: 1.0.2017.157 Depends: R (>= 3.0.0), rClr, zoo, lubridate Suggests: testthat, knitr, rmarkdown License: Yield Solutions NeedsCompilation: no
after updating and restarting the server
C:_TEMP>more +2 PACKAGES Depends: R (>= 3.1.0), methods Suggests: testthat, knitr, xtable License: LGPL-3
Package: rYieldLib Version: 1.0.2017.193 Depends: R (>= 3.0.0), rClr, zoo, lubridate Suggests: testthat, knitr, rmarkdown License: Yield Solutions NeedsCompilation: no
PS thanks again for taking the time too look and fix this, greatly appreciated!
Glad we could help, @screig. We still want to make some additional changes along these lines before we do another release of the plugin, but you should be okay using this as-is for now, so long as you are aware that it's not necessarily going to work the best at scale (because of the rebuilding of the metadata on each request).
Hi
So I have three packages in my hosted repo (at the moment, Jenkins pumps out a new one on each build) as follows:
curl -O http://**myip**:**myport**/repository/**myreponame**/bin/windows/contrib/3.4/PACKAGES.gz # the PACKAGES.gz file always shows the lowest version of the package not the highest:
should be
Package: rYieldLib Version: 1.0.2017.155