russHyde / code_as_data

Analysis of code in R dev packages (for a planned talk)

Plan for presentation #1

Open russHyde opened 4 years ago

russHyde commented 4 years ago

Plan for Newcastle satRdays abstract:

Code analysis tools:

How to combine all these things together, similar to the code-maat thing

Probably need to work on the visual representation of projects

russHyde commented 4 years ago

Related to russHyde/dupree#38 .

I ran dupree over a set of ten CRAN packages and it took < 20 secs to analyse each one on my computer.

TODO:

Doing this will almost certainly find some dupree-breakers: packages that either take too long to analyse or break the code parsing.
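A minimal sketch of how dupree-breakers could be recorded rather than stopping the whole run, assuming each package is analysed by a single function call. `safe_analyse`, `analyse_fn` and `timeout_secs` are hypothetical names for illustration; `analyse_fn` stands in for whichever dupree call is actually used.

```r
# Run `analyse_fn` on one package directory and record the outcome.
# `analyse_fn` is a placeholder for the real analysis call; passing it in
# keeps this wrapper independent of any particular dupree function.
safe_analyse <- function(pkg_dir, analyse_fn, timeout_secs = 60) {
  start <- Sys.time()

  # Base R elapsed-time limit: raises an error once the limit is exceeded
  setTimeLimit(elapsed = timeout_secs, transient = TRUE)
  error_msg <- tryCatch(
    {
      analyse_fn(pkg_dir)
      NA_character_
    },
    # Parse failures and timeouts both arrive here as ordinary errors
    error = function(e) conditionMessage(e)
  )
  # Always clear the limit so it cannot affect later code
  setTimeLimit(elapsed = Inf)

  data.frame(
    package = basename(pkg_dir),
    ok = is.na(error_msg),
    seconds = as.numeric(difftime(Sys.time(), start, units = "secs")),
    error = error_msg,
    stringsAsFactors = FALSE
  )
}

# Illustrative run with stand-in analysis functions (not real packages)
results <- rbind(
  safe_analyse("pkgs/pkgA", function(d) nchar(d)),
  safe_analyse("pkgs/pkgB", function(d) stop("parse failure"))
)
```

Rows with `ok == FALSE` are the candidate dupree-breakers; the `error` column shows whether the cause was a parsing problem or the time limit.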

Preferred:

russHyde commented 4 years ago

Want the GitHub repo for each CRAN package listed on https://github.com/ropensci/PackageDevelopment

We can get all CRAN DESCRIPTION data using the following (updated from Julia Silge's blog: https://juliasilge.com/blog/mining-cran-description/):

library(dplyr)
library(tibble)
cran <- tools::CRAN_package_db()
# the returned data frame can have two columns with the same name,
# which as_tibble() rejects; keep only the first of each
cran <- cran[, !duplicated(names(cran))]
# make it a tibble
cran <- as_tibble(cran)
cran
# There are ~5.6k packages that are hosted on GitHub
sum(grepl("github", cran$URL) | grepl("github", cran$BugReports))
#> [1] 5641
cran_gh <- filter(cran, grepl("github", URL) | grepl("github", BugReports))

Note: you could get the GitHub URLs for the packages directly from the rOpenSci markdown, but some of those packages will have been dropped from CRAN by now.

Can get the packages mentioned in the task view from https://github.com/ropensci/PackageDevelopment/blob/master/PackageDevelopment.ctv

Would need to mine it using xml2, for example. Each package is mentioned in the value of a <pkg>...</pkg> tag.

russHyde commented 4 years ago
library(xml2)
# raw task view file from the ropensci repo
xml_path <- "https://raw.githubusercontent.com/ropensci/PackageDevelopment/master/PackageDevelopment.ctv"
xml_data <- read_xml(xml_path)
# each package name is the text of a <pkg> element under <packagelist>
dev_pkgs <- xml_text(xml_find_all(xml_data, "packagelist/pkg"))
russHyde commented 4 years ago
# 113 packages are still on CRAN
length(intersect(dev_pkgs, cran$Package))

# 82 packages are on CRAN and have a github repo
dev_cran_gh <- filter(cran_gh, Package %in% dev_pkgs)
dim(dev_cran_gh)
russHyde commented 4 years ago
# as of today:
dev_cran_gh$Package
 [1] "aoos"           "aprof"          "argparse"       "assertr"        "available"     
 [6] "backports"      "badgecreatr"    "checkmate"      "checkpoint"     "CodeDepends"   
[11] "covr"           "cranly"         "devtools"       "docopt"         "drat"          
[16] "ensurer"        "formatR"        "functools"      "GetoptLong"     "getPass"       
[21] "git2r"          "gitlabr"        "GRANBase"       "gWidgets2"      "htmlwidgets"   
[26] "hunspell"       "import"         "inline"         "js"             "knitr"         
[31] "later"          "lintr"          "log4r"          "logging"        "matlabr"       
[36] "microbenchmark" "miniCRAN"       "mockr"          "optigrab"       "packagedocs"   
[41] "packrat"        "pacman"         "pipeR"          "pkgconfig"      "pkgdown"       
[46] "pkggraph"       "pkgmaker"       "pkgnet"         "prof.tree"      "profmem"       
[51] "profr"          "progress"       "proto"          "purrr"          "R.oo"          
[56] "R6"             "rcmdcheck"      "Rcpp"           "Rd2roxygen"     "RDocumentation"
[61] "Rdpack"         "remotes"        "reticulate"     "rhub"           "RInside"       
[66] "rJava"          "rlang"          "roxygen2"       "rscala"         "RStata"        
[71] "rstudioapi"     "rtype"          "semver"         "shiny"          "skeletor"      
[76] "sys"            "testit"         "testthat"       "unitizer"       "V8"            
[81] "vdiffr"         "withr"
russHyde commented 4 years ago

Note that some of the above have multiple entries in their URL / BugReports fields (seems funny that gitlabr is hosted on GitHub...)
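Since a URL / BugReports field can hold several addresses, one way to get a single repo per package is to pull out the first `github.com/<owner>/<repo>` match. This is a rough sketch only: `extract_github_repo` is a hypothetical helper, the regular expression is an assumption about how these fields are usually written, and the example inputs are made up.

```r
# Return the first "https://github.com/<owner>/<repo>" found in each element
# of `x`, or NA where no GitHub address is present.
extract_github_repo <- function(x) {
  # owner and repo are runs of characters up to the next "/", ",", space, "#" or "?"
  pattern <- "github\\.com/[^/,[:space:]]+/[^/,[:space:]#?]+"
  hit <- regexpr(pattern, x)
  found <- !is.na(hit) & hit > 0

  out <- rep(NA_character_, length(x))
  out[found] <- paste0("https://", regmatches(x, hit))
  # drop a trailing ".git" if the field used a clone URL
  sub("\\.git$", "", out)
}

# Made-up field values in the styles seen in DESCRIPTION files
fields <- c(
  "https://github.com/someuser/somepkg, https://somepkg.example.org",
  "https://github.com/someorg/otherpkg/issues",
  "https://cran.r-project.org"
)
repos <- extract_github_repo(fields)
```

Applied to `dev_cran_gh`, the URL column could be tried first and BugReports used as a fallback wherever the result is NA.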

TODO:

russHyde commented 4 years ago

see branch analyse-dev-tools

russHyde commented 4 years ago

? perhaps split this out into a separate repo since it's bigger than a single-script analysis; could use drake

russHyde commented 4 years ago

Moved this subjob to separate repo: code_as_data