ropensci-review-tools / pkgcheck

Check whether a package is ready for submission to rOpenSci's peer-review system
https://docs.ropensci.org/pkgcheck/
18 stars 6 forks source link

check for Suggests dependencies usage #117

Closed maelle closed 2 years ago

maelle commented 2 years ago

In code, articles, vignettes etc.

https://github.com/ropensci-org/rodev/issues/38

mpadge commented 2 years ago

@maelle Following on from that link, would a check be something like this:

check_suggests <- function() {
  desc_deps <- attachment::att_from_description()
  scripts_deps <- attachment::att_from_rscripts(".")
  namespace_deps <- attachment::att_from_namespace()
  vignette_deps <- attachment::att_from_rmds()
  desc_deps[!desc_deps %in% c(scripts_deps, namespace_deps, vignette_deps)]
}

with a check then

check_pass = length(check_suggests() == 0L)

and a dump of any packages otherwise? I'm unsure of the utility of this, because it's perfectly acceptible to use all sorts of packages in scripts which do not appear in DESC, yet which would appear here. I might need a bit more advice and clarity from you about what you see this check achieving - thanks!

maelle commented 2 years ago

So, say I used to use xml2 in a vignette, and then stopped doing that, but forgot to remove xml2 from DESCRIPTION. I'd like to see some sort of warning.

mpadge commented 2 years ago

But what if you use xml2 in some eval = FALSE chunk in a vignette? Or in some other script file in a vignette directory, or anywhere else that was totally unrelated to package functionality? There might be perfectly valid reasons for that, but any auto-identification like that would simply flag that xml2 was used, and raise a warning. Which would then be inappropriate. Right?

maelle commented 2 years ago
mpadge commented 2 years ago

Regarding Suggests: Okay, I see the point of that, and I see how the code at the top captures exactly that situation. So maybe we could and should implement something along those lines? It would have to be different to that code, especially because att_from_namespace requires all namespaced packages to be loaded, which can't be done here. But i'm not sure that would really be worthwhile? I think maybe more just use pkgstats to dump output like the following, and instruct reviewers (and/or editors?) to examine the output? I think this is a better way of informing a considered check of package dependencies.

library (pkgstats)
packageVersion ("pkgstats")
#> [1] '0.0.3.89'
u <- "https://cran.r-project.org/src/contrib/ade4_1.7-18.tar.gz"
f <- file.path (tempdir (), basename (u))
download.file (u, f)
s <- pkgstats (f)
head (s$external_calls)
#>   tags_line       call          tag            file     kind start end package
#> 1        21         df      as.dudi        R/dudi.R function     1 106   stats
#> 2        22 match.call as.krandtest   R/krandtest.R function     1  63    base
#> 3        22      names as.krandtest   R/krandtest.R function     1  63    base
#> 4        22   colnames as.krandtest   R/krandtest.R function     1  63    base
#> 5        30        rep  bicenter.wt R/bicenter.wt.R function     1  21    base
#> 6        30       ncol  bicenter.wt R/bicenter.wt.R function     1  21    base
sort (table (s$external_calls$package), decreasing = TRUE)
#> 
#>       base       ade4      stats   graphics  grDevices      utils    methods 
#>       6190        939        522        360         31         16          9 
#>   progress    foreach      spdep    splancs   waveslim        ape doParallel 
#>          6          4          4          2          2          1          1 
#>   parallel     pixmap 
#>          1          1

Created on 2022-03-24 by the reprex package (v2.0.1)

The pkgcheck report could include the final table which then directly indicates packages which are only used for one function, and we'd then just explicitly say that pkgstats(path)$external_calls has details on exactly where those function calls are made. What do you think of that?

maelle commented 2 years ago

It sounds good!

If the table is very big should it be a DT html table stored somewhere, to have easy filtering?

mpadge commented 2 years ago

Nope - the tables are enormous, and shouldn't be stored anywhere. They can easily be generated locally. My suggestion is just to include the summary info from table(s$external_calls$package), giving numbers of distinct function calls to each dependency. I'll start a PR and request your review. Thanks!