Closed elong0527 closed 4 years ago
Find below a simple script to scrape arbitrary download stats from cranlogs for a CRAN package.
The only dependency is jsonlite, which could be removed as well, since it just transforms the JSON response into a data frame.
As already mentioned, there is a package available that does all of this already: https://cranlogs.r-pkg.org/#rpackage
```r
# Which package (CRAN only)
pack_name <- "DoseFinding"

library(jsonlite)  # fromJSON() parses the cranlogs JSON response

today <- Sys.Date()  # today's date
duration <- 180      # 180 days, roughly 6 months

# Downloads over the last `duration` days
url4 <- paste0("https://cranlogs.r-pkg.org/downloads/total/",
               today - duration, ":", today, "/", pack_name)
input_file4 <- tempfile()
# The flag is the value returned by tryCatch(): 1 = success, 99 = warning, 9 = error.
# (Assigning the flag with `<-` inside a handler would only change a local copy.)
last_month_exists <- tryCatch({download.file(url4, input_file4, quiet = TRUE); 1},
  warning = function(w) 99,
  error = function(e) {cat("error\n"); 9}
)
if (last_month_exists == 1) {
  last_month_data <- format(fromJSON(input_file4), big.mark = ",")
  last_month_text <- paste("The package has been downloaded", last_month_data["downloads"],
                           "times in the last", duration, "days (between",
                           last_month_data["start"], "and", last_month_data["end"], ").")
} else {
  last_month_text <- "No detailed download data is available for this period."
}

# Downloads since the start of cranlogs
from_date <- "2012-10-01"  # cranlogs started in October 2012
to_date <- Sys.Date()      # current date
url5 <- paste0("https://cranlogs.r-pkg.org/downloads/total/",
               from_date, ":", to_date, "/", pack_name)
input_file5 <- tempfile()
total_data_exists <- tryCatch({download.file(url5, input_file5, quiet = TRUE); 1},
  warning = function(w) 99,
  error = function(e) {cat("error\n"); 9}
)
if (total_data_exists == 1) {
  total_data <- format(fromJSON(input_file5), big.mark = ",")
  total_data_text <- paste("The package has been downloaded a total of", total_data["downloads"],
                           "times from the RStudio servers since the beginning of cranlogs in October 2012.")
} else {
  total_data_text <- "No total download data is available."
}

paste(last_month_text, total_data_text)
```
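Since jsonlite is only used to read a single number out of the response, the script above could be condensed into a few small functions that need base R only. The following is a rough sketch, not a tested implementation: the function names (`cranlogs_total_url`, `extract_downloads`, `cran_total_downloads`) are hypothetical, and the regular expression assumes the response contains one numeric `"downloads"` field.

```r
# Build the cranlogs "total downloads" URL for one package and date range.
# (Hypothetical helper; the URL pattern is the one used in the script above.)
cranlogs_total_url <- function(pkg, from, to = Sys.Date()) {
  paste0("https://cranlogs.r-pkg.org/downloads/total/",
         format(as.Date(from)), ":", format(as.Date(to)), "/", pkg)
}

# Extract the download count from the raw JSON text without jsonlite.
# Assumption: the response contains a single field of the form "downloads": <integer>.
extract_downloads <- function(json_text) {
  json_text <- paste(json_text, collapse = "")
  m <- regmatches(json_text, regexpr('"downloads"\\s*:\\s*[0-9]+', json_text))
  if (length(m) == 0) return(NA_real_)
  as.numeric(sub(".*:\\s*", "", m))
}

# Fetch the total; returns NA if the request fails or the field is missing.
cran_total_downloads <- function(pkg, from, to = Sys.Date()) {
  json_text <- tryCatch(
    suppressWarnings(readLines(cranlogs_total_url(pkg, from, to), warn = FALSE)),
    error = function(e) NULL
  )
  if (is.null(json_text)) return(NA_real_)
  extract_downloads(json_text)
}

# Usage (needs network access):
# cran_total_downloads("DoseFinding", from = "2012-10-01")
```

Returning `NA` instead of setting a flag keeps the failure case in the return value, so the caller can decide what text to print.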
For Bioconductor, one can use the above-mentioned function biocDownloadStats().
However, all it does is read in the following table: http://bioconductor.org/packages/stats/bioc/bioc_pkg_stats.tab, as one can see, for example, here: https://rdrr.io/bioc/BiocPkgTools/src/R/biocDownloadStats.R
Consequently, the stats can be derived quite simply:
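```r
# For Bioconductor
library(dplyr)
bioc_downloadstats <- read.table("http://bioconductor.org/packages/stats/bioc/bioc_pkg_stats.tab", sep = "\t", header = TRUE)
# Drop the yearly summary rows (Month == "all") so downloads are not counted twice
bioc_downloadstats <- filter(bioc_downloadstats, Month != "all")
# Total downloads per package
total_stats <- bioc_downloadstats %>% group_by(Package) %>% summarise("Total Downloads" = sum(Nb_of_downloads))
bio_pack <- "limma"
# Downloads of one package: all time, then 2018 only
subset(bioc_downloadstats, Package == bio_pack, select = Nb_of_downloads) %>% sum()
subset(bioc_downloadstats, Package == bio_pack & Year == 2018, select = Nb_of_downloads) %>% sum()
```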
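If dplyr should be avoided too, the same summaries can be computed with base R. Below is a minimal sketch on a small made-up data frame: the rows and the package names `pkgA` and `pkgB` are invented for illustration, and only the column names follow the Bioconductor table used above.

```r
# Made-up rows for illustration only; column names follow the Bioconductor table.
stats <- data.frame(
  Package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  Year = c(2018, 2018, 2019, 2018, 2019),
  Month = c("Jan", "Feb", "Jan", "Jan", "Jan"),
  Nb_of_downloads = c(10, 20, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Total downloads per package (base R counterpart of group_by() + summarise())
total_stats <- aggregate(Nb_of_downloads ~ Package, data = stats, FUN = sum)

# Downloads for one package: all time, then for a single year
pkg <- "pkgA"
total_pkg <- sum(stats$Nb_of_downloads[stats$Package == pkg])
total_pkg_2018 <- sum(stats$Nb_of_downloads[stats$Package == pkg & stats$Year == 2018])
```

With the real table, `stats` would be replaced by the result of `read.table()` on the URL above.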
Thanks @matthiazzz, this is very helpful. It seems like we can reduce our dependency footprint by querying the data from the source files directly.
For the time being, I think it might be in our interest not to worry so much about dependencies. It's probably easiest to focus on the simplest viable mechanisms for now so that we can experiment quickly, and using existing package solutions should keep our code logic more immediately obvious as we get the project off the ground. Inevitably, some of the dependencies will be a bit heavy for the functionality we need, and we can migrate to lighter-weight solutions later.
I'm open to alternative approaches, and deciding when to move from 'experimental' to 'production' is always tricky to judge from the inside. How do others feel? Are there other considerations that might drive us to avoid these dependencies?
A small side note: you can use the syntax below for multi-line R code blocks in GitHub-flavored Markdown.
```r
<your code>
```
@matthiazzz, I have given you write permission for this repo. Would you like to enhance the R function based on your summary here and create a pull request?
I will create another issue for Bioconductor downloads.
We need to define how the number of downloads is measured for each R package.
- dlstats (R package)
- BiocPkgTools::biocDownloadStats()
Discussion point:
Should we consider developing this R package as a low-risk package? (That means we should be careful in choosing which R packages are used to derive the metrics.) This is not a high priority at the current stage.