ropensci-review-tools / pkgcheck

Check whether a package is ready for submission to rOpenSci's peer-review system
https://docs.ropensci.org/pkgcheck/
18 stars 6 forks source link

Could pkgstats parse dependencies? #72

Closed maelle closed 3 years ago

maelle commented 3 years ago

I.e. each column would be not a string but instead a list to make it easier to retrieve the vector of deps.

maelle commented 3 years ago

also when there is no dependency of a given type could it be a NULL instead of a NA?

mpadge commented 3 years ago

Short answer: No and no. These things have to be returned as single character strings to (easily) enable data from multiple packages to be combined with simple rbind. That could be done with I(as.list(...)), but (1) gets messy; and (2) with 120,000 rows and now > 90 columns, the increase in memory footprint would be considerable. For the same reason of rbind-ability, no values can be empty (NULL or character(0)), but have to have non-zero size, hence NA is necessary. Does that make sense?

maelle commented 3 years ago

yep, so this parsing should happen in pkgcheck, correct?

mpadge commented 3 years ago

Yes, correct. Another reason they're like that is because it's the direct output of data.frame(read.dcf("DESCRIPTION"))$Imports), and pkgstats tries to get information out as efficiently as possible, with no unnecessary post-processing. This is the kind of price such post-processing has to pay:

bench::mark (
             read.dcf("DESCRIPTION")[1,"Imports"],
             data.frame(read.dcf("DESCRIPTION"))$Imports,
             desc::desc_get_deps("."),
             check = FALSE)
#> # A tibble: 3 × 6
#>   expression                                       min   median `itr/sec`
#>   <bch:expr>                                  <bch:tm> <bch:tm>     <dbl>
#> 1 read.dcf("DESCRIPTION")[1, "Imports"]       245.75µs 272.87µs     3513.
#> 2 data.frame(read.dcf("DESCRIPTION"))$Imports 365.14µs 402.42µs     2269.
#> 3 desc::desc_get_deps(".")                      5.43ms   6.78ms      134.
#> # … with 2 more variables: mem_alloc <bch:byt>, gc/sec <dbl>

Created on 2021-09-24 by the reprex package (v2.0.1.9000)

Extracting in current form is over 1,000 times more efficient!

mpadge commented 3 years ago

Moved to pkgcheck because dependencies can indeed be parsed here. The main object returned by pkgcheck currently does not have a list of dependencies, but that would be both good and important to include. @maelle My suggestion will be to add an extra item to the main pgkcheck() output called "description", that has parsed values of the main pkgstats()$desc output:

library (pkgstats)
path <- "/<path>/<to>/lme4_1.1-26.tar.gz"
s <- pkgstats (path)
s$desc
#>   package version                date    license                          urls
#> 1    lme4  1.1-26 2020-12-01 00:50:10 GPL (>= 2) https://github.com/lme4/lme4/
#>                                  bugs aut ctb fnd rev ths trl depends
#> 1 https://github.com/lme4/lme4/issues   4   9   0   0   0   0  Matrix
#>                                                                                       imports
#> 1 graphics, grid, splines, utils, parallel, MASS, lattice, boot, nlme, minqa, nloptr, statmod
#>                                                                                                                        suggests
#> 1 knitr, rmarkdown, PKPDmodels, MEMSS, testthat, ggplot2, mlmRev, optimx, gamm4, pbkrtest, HSAUR3, numDeriv, car, dfoptim, mgcv
#>        linking_to
#> 1 Rcpp, RcppEigen

Created on 2021-09-28 by the reprex package (v2.0.1.9000)

That info is not part of any actual check, but is pretty important meta-information that can and should be stored with the pkgcheck output regardless.

maelle commented 3 years ago

Or it could be a desc object? (not sure it's a good suggestion at all).

mpadge commented 3 years ago

pkgcheck()$desc now gives the same result as desc::desc_get_deps(), but without the version numbers, so just 2 columns of type and package. This information is ignored in summary and print methods, but is in the return object for anyone who wants access to it.