sonatype-nexus-community / nexus-repository-r

R, v data science, much functional programming, doge
Eclipse Public License 1.0
31 stars 17 forks source link

Package file generation does not give the latest version #12

Closed screig closed 7 years ago

screig commented 7 years ago

Hi

So I have three packages in my hosted repo (at the moment, Jenkins pumps out a new one on each build) as follows:

image

curl -O http://**myip**:**myport**/repository/**myreponame**/bin/windows/contrib/3.4/PACKAGES.gz # the PACKAGES.gz file always shows the lowest version of the package not the highest:

Package: rClr
Version: 0.7-2
Depends: R (>= 3.1.0), methods
Suggests: testthat, knitr, xtable
License: LGPL-3

Package: rYieldLib
Version: 1.0.2017.154
Depends: R (>= 3.0.0), rClr, zoo, lubridate
Suggests: testthat, knitr, rmarkdown
NeedsCompilation: no

should be

Package: rYieldLib Version: 1.0.2017.155

fjmilens3 commented 7 years ago

@screig, it's been a while since I've touched a lot of this code but from what I remember, I'm not sure if we ever implemented any kind of sorting/ordering on the versions in the PACKAGES list. If anything, I think I'm more amazed that you're not seeing all versions of it in there. :smile:

Is that the full contents of your PACKAGES.gz?

DarthHater commented 7 years ago

Same with @fjmilens3 , code seems to show that it would have all versions, not just one. If you can share a bit more of your PACKAGES.gz file that would be awesome @screig

fjmilens3 commented 7 years ago

@screig, @DarthHater, here's what's going on:

https://github.com/sonatype/nexus-repository-r/blob/master/src/main/java/org/sonatype/nexus/repository/r/internal/RHostedFacetImpl.java#L89

We're tracking whether we've seen that package before or not in this location (so we only get one entry), but we're not comparing or ordering to get the highest version.

fjmilens3 commented 7 years ago

This might be helpful, if accurate:

https://stat.ethz.ch/R-manual/R-devel/library/utils/html/compareVersion.html

http://r-pkgs.had.co.nz/release.html#release-version

Is that the behavior you'd expect in terms of a version comparison, @screig?

/cc @DarthHater

screig commented 7 years ago

Hi

Complete file That was the complete package file, it should only have two entries.

Expected behaviour I also implemented a repo the old fashioned way with a web server and a shared folder. In this one after I uploaded the file I ran the write_PACKAGES function from the tools library. Here is the R code to do that.

file.copy(from=zip_file, to=repo_path, overwrite = TRUE, recursive = FALSE, copy.mode = TRUE)
tools::write_PACKAGES(dir = repo_path, type = c("win.binary"))

here is what is in this repo contained, the last number of each of these corresponds to a Jenkins build number

image

and here is the full contents of the package file

image

Here you can see that the write_PACKAGES function has identified the latest version and set only the latest version in the PACKAGES file. I take this as the gold standard of what it should do. Maybe I shouldn't but in the absence of a detailed doc from CRAN on how it should work I think it is a fair standard?

Thanks again, for taking the time to look at this. Sean

PS here is the code for the write_PACKAGES function.

and a snippet:

.remove_stale_dups <-
function(ap)
{
    ## Given a matrix from available.packages, return a copy
    ## with no duplicate packages, being sure to keep the packages
    ## with highest version number.
    ## (Also works for data frame package repository dbs.)
    pkgs <- ap[ , "Package"]
    dup_pkgs <- pkgs[duplicated(pkgs)]
    stale_dups <- integer(length(dup_pkgs))
    i <- 1L
    for (dp in dup_pkgs) {
        wh <- which(dp == pkgs)
        vers <- package_version(ap[wh, "Version"])
        keep_ver <- max(vers)
    keep_idx <- which.max(vers == keep_ver) # they might all be max
        wh <- wh[-keep_idx]
        end_i <- i + length(wh) - 1L
        stale_dups[i:end_i] <- wh
        i <- end_i + 1L
    }
    ## Possible to have only one package in a repository
    if(length(stale_dups)) ap[-stale_dups, , drop = FALSE] else ap
}
DarthHater commented 7 years ago

@screig to get this right, we need to know exactly how max() works for a lot of cases. If you can help us figure that out, I think that would be a good start!

screig commented 7 years ago

Well the code for that is non-trivial, the R code calls into C

in this file there is a redirection....

/* Data Summaries */
/* these four are group generic and so need to eval args */
{"sum",     do_summary, 0,  1,  -1, {PP_FUNCALL, PREC_FN,   0}},
{"min",     do_summary, 2,  1,  -1, {PP_FUNCALL, PREC_FN,   0}},
{"max",     do_summary, 3,  1,  -1, {PP_FUNCALL, PREC_FN,   0}},
{"prod",    do_summary, 4,  1,  -1, {PP_FUNCALL, PREC_FN,   0}},

to this file, where max is evaluated in this function (line 397):

SEXP attribute_hidden do_summary(SEXP call, SEXP op, SEXP args, SEXP env)
{

I think it may be easier if you give me all the use cases u want and I give you the results?

fjmilens3 commented 7 years ago

@DarthHater and @jlstephens89, I had a bit of time after we returned from car-shopping and started playing around with this whilst savoring taco boats in the comfort of our cottage. I have not done the kind of rigorous testing I would usually like, but https://github.com/sonatype/nexus-repository-r/pull/18 is here for discussion.

If we go that route (we're going to have to do something like that anyway), we should also be considering how to not regenerate this on each request (it's bad form anyway, and this overhead will just exacerbate the problem). I would suggest we dive in with a volunteering spirit and try to use a similar approach as Yum Hosted so that we can make this actually work correctly.

DarthHater commented 7 years ago

@screig we just merged this in, maybe take a gander and let us know how it works?

screig commented 7 years ago

@DarthHater @fjmilens3 Very grateful for your work on this. In terms of testing, does this still stand:

Copy the bundle into /system/org/sonatype/nexus/plugins/nexus-repository-r/1.0.0/nexus-repository-r-1.0.0.jar

or have you incremented the version number?

Thanks Sean

DarthHater commented 7 years ago

Right now I left it as 1.0.0, I'll likely increment once we are like yep, this is working, so your feedback would be great @screig :)

screig commented 7 years ago

So should I just overwrite the jar with the new one, would that do it?

screig commented 7 years ago

ok it works!

Here is what is in the repo, where the latest version ends 193

image

Before updating the jar

C:_TEMP>more +2 PACKAGES Depends: R (>= 3.1.0), methods Suggests: testthat, knitr, xtable License: LGPL-3

Package: rYieldLib Version: 1.0.2017.157 Depends: R (>= 3.0.0), rClr, zoo, lubridate Suggests: testthat, knitr, rmarkdown License: Yield Solutions NeedsCompilation: no

after updating and restarting the server

C:_TEMP>more +2 PACKAGES Depends: R (>= 3.1.0), methods Suggests: testthat, knitr, xtable License: LGPL-3

Package: rYieldLib Version: 1.0.2017.193 Depends: R (>= 3.0.0), rClr, zoo, lubridate Suggests: testthat, knitr, rmarkdown License: Yield Solutions NeedsCompilation: no

screig commented 7 years ago

PS thanks again for taking the time too look and fix this, greatly appreciated!

fjmilens3 commented 7 years ago

Glad we could help, @screig. We still want to make some additional changes along these lines before we do another release of the plugin, but you should be okay using this as-is for now, so long as you are aware that it's not necessarily going to work the best at scale (because of the rebuilding of the metadata on each request).