ropensci-review-tools / pkgcheck

Check whether a package is ready for submission to rOpenSci's peer-review system
https://docs.ropensci.org/pkgcheck/

Detect code duplication with {dupree} #124

Closed: assignUser closed this issue 2 years ago

assignUser commented 2 years ago

{dupree} works quite well, though I am not sure if it is still actively maintained.

mpadge commented 2 years ago

Checking for code duplication is a good idea in principle, but it can only be done safely through subjective, personal review. This package is a good example: it is full of functions that follow the same basic template, yet each has to be implemented independently to provide the individual plug-in tests. Running {dupree} gives this:

setwd ("/data/mega/code/repos/ropensci-review-tools/pkgcheck")
library (dupree)
dupree_package ()
#> # A tibble: 48 × 7
#>    file_a                        file_b      block_a block_b line_a line_b score
#>    <chr>                         <chr>         <int>   <int>  <int>  <int> <dbl>
#>  1 ./R/github.R                  ./R/github…      21      34     99    128 0.756
#>  2 ./R/pkgcheck-methods.R        ./R/summar…      19      14    293     63 0.689
#>  3 ./R/pkgcheck-methods.R        ./R/pkgche…      17      18    224    252 0.590
#>  4 ./R/check-covr.R              ./R/check-…       1       7      2     29 0.484
#>  5 ./R/checks-goodpractice.R     ./R/checks…      42      44    277    393 0.474
#>  6 ./R/pkgcheck-fn.R             ./R/summar…      34      14    215     63 0.453
#>  7 ./R/check-pkgname-available.R ./R/checks…       7      14     29     96 0.420
#>  8 ./R/check-fns-have-exs.R      ./R/checks…       9      14     46     96 0.410
#>  9 ./R/check-fns-have-exs.R      ./R/check-…       9      16     46     29 0.403
#> 10 ./R/checks-goodpractice.R     ./R/checks…      43      44    312    393 0.382
#> # … with 38 more rows

Created on 2022-01-21 by the reprex package (v2.0.1.9000)

The first result shows over 70% duplication (a score of 0.756) between the code blocks starting at these two lines: https://github.com/ropensci-review-tools/pkgcheck/blob/60958ce33dd82f4a8a69229afd5160304599d3d5/R/github.R#L99 and https://github.com/ropensci-review-tools/pkgcheck/blob/60958ce33dd82f4a8a69229afd5160304599d3d5/R/github.R#L128. Any way of deriving whole-package metrics from those scores would rate this package as highly duplicated, yet in this case that is expected and perfectly okay. Code duplication always requires careful, subjective judgement. But thanks for the suggestion regardless.
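One way to use these scores without turning them into a pass/fail metric is to treat them purely as a shortlist for a human reviewer. The sketch below is only an illustration, not part of pkgcheck or {dupree}: the helper `flag_for_review()` and the `threshold` value of 0.5 are hypothetical, and the data frame is filled by hand with the first four rows of the output above (the `file_b` paths are truncated in that output and are kept truncated here).

```r
# Rows copied by hand from the first four lines of the {dupree} output above.
# The file_b paths are truncated in the original output and stay truncated.
dups <- data.frame(
    file_a = c ("./R/github.R", "./R/pkgcheck-methods.R",
                "./R/pkgcheck-methods.R", "./R/check-covr.R"),
    file_b = c ("./R/github…", "./R/summar…", "./R/pkgche…", "./R/check-…"),
    line_a = c (99L, 293L, 224L, 2L),
    line_b = c (128L, 63L, 252L, 29L),
    score = c (0.756, 0.689, 0.590, 0.484),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the pairs at or above an arbitrary threshold
# and sort them so that the most similar pairs are looked at first.
# The threshold is an assumption, not a value recommended by {dupree}.
flag_for_review <- function (dups, threshold = 0.5) {
    flagged <- dups [dups$score >= threshold, , drop = FALSE]
    flagged [order (-flagged$score), , drop = FALSE]
}

flagged <- flag_for_review (dups)
print (flagged [, c ("file_a", "line_a", "file_b", "line_b", "score")])
```

The result is just a list of places to read by hand. Whether any given pair is a problem or an intentional template, as with the plug-in checks here, still has to be decided by the reviewer.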