Open pbreheny opened 8 years ago
Good idea. I don't think it is difficult to implement. You want to help with it? :)
A new SQL (plpgsql) procedure is needed here: https://github.com/metacran/cranlogs.app/blob/master/db/proc.sql
Hmm...well, I'm not sure I know enough SQL/JSON to be of much help. Algorithmically, it would seem to require:
cran_downloads on that list
Steps 2 and 3 are straightforward. Step 1 is clearly possible, but I wouldn't know how to do it through the SQL/JSON interface. Or perhaps there's a more efficient approach than all this?
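The quantile step of this approach can be sketched on the client side in R. This is only a rough sketch, not part of cranlogs: download_percentile() is a hypothetical helper, and it assumes you already have a data frame with one row per package and columns package and count (for example, daily rows from cran_downloads() summed per package). Fetching counts for every CRAN package is left out here; it would likely need the package names from somewhere like utils::available.packages() and some batching of requests.

```r
# Hypothetical helper (not a cranlogs function): percentile rank of each
# package's download total among all packages in `totals`.
# `totals` is assumed to have columns `package` and `count`.
download_percentile <- function(totals, packages = totals$package) {
  stopifnot(all(c("package", "count") %in% names(totals)))
  # Empirical CDF of the per-package totals: the share of packages whose
  # count is less than or equal to a given value.
  cdf <- stats::ecdf(totals$count)
  out <- totals[totals$package %in% packages, c("package", "count")]
  out$percentile <- 100 * cdf(out$count)
  rownames(out) <- NULL
  out
}

# Toy data with made-up numbers (not real CRAN counts).
toy <- data.frame(
  package = c("a", "b", "c", "d", "e"),
  count = c(10, 200, 30, 4000, 50)
)
res <- download_percentile(toy, c("b", "d"))
```

With a result like this, a package whose percentile is 90 or higher could be described as being in the top 10% of downloaded packages, relative to whatever set of packages went into totals.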
EDIT 2021-11-30: Answer to a different question below ... (I've updated it to say fraction instead of quantile)
Since you can get the total download count for all packages by passing packages = NULL ("... for a sum of downloads for all packages."), you could use that for your denominator. Here's the gist:
cran_download_fraction <- function(packages, ...) {
  # Per-package daily counts, and the daily total across all packages
  counts <- cranlogs::cran_downloads(packages = packages, ...)
  total <- cranlogs::cran_downloads(packages = NULL, ...)
  # For each date, divide each package's count by that day's total
  z <- lapply(total$date, FUN = function(.date) {
    x <- subset(counts, date == .date)
    y <- subset(total, date == .date)
    x$fraction <- x$count / y$count
    x[, c("date", "count", "fraction", "package")]
  })
  # Stack the per-date results into one data frame
  z <- do.call(rbind, z)
  rownames(z) <- NULL
  z
}
Example:
pkgs <- c("rlang", "digest")
stats <- cran_download_fraction(pkgs, from = "2021-11-10", to = "2021-11-12")
stats
#> date count fraction package
#> 1 2021-11-10 86060 0.010044005 rlang
#> 2 2021-11-10 36999 0.004318129 digest
#> 3 2021-11-11 86956 0.011273038 rlang
#> 4 2021-11-11 36907 0.004784650 digest
#> 5 2021-11-12 78391 0.011641753 rlang
#> 6 2021-11-12 32555 0.004834704 digest
stats <- cran_download_fraction(pkgs, when = "last-week")
head(stats)
#> date count fraction package
#> 1 2021-11-17 87119 0.011624874 rlang
#> 2 2021-11-17 36247 0.004836681 digest
#> 3 2021-11-18 86853 0.012107869 rlang
#> 4 2021-11-18 37356 0.005207668 digest
#> 5 2021-11-19 72217 0.011277519 rlang
#> 6 2021-11-19 30428 0.004751684 digest
Add argument fraction = FALSE to cran_downloads() and make the above calculations internally. Maybe fraction = TRUE could even be the default?
Limitation: The above only gives the download fraction per day. Anyone who wishes to calculate the download fraction for a longer time period, say, per week or per month, will have to do something else.
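One possible way around that limitation is to sum the daily counts within each period before dividing. The sketch below is an assumption-laden illustration, not part of cranlogs: period_fraction() is a hypothetical helper that takes the two data frames returned by cran_downloads() (per-package rows with date, count, package, and all-package rows with date, count) and a format() string that defines the period.

```r
# Hypothetical helper (not a cranlogs function): download fraction per
# package over periods longer than a day.
# `counts`: data frame with columns date, count, package (daily rows).
# `total`:  data frame with columns date, count (daily all-package rows).
# `period`: format string passed to format(), e.g. "%Y-%m" for months.
period_fraction <- function(counts, total, period = "%Y-%m") {
  counts$period <- format(as.Date(counts$date), period)
  total$period <- format(as.Date(total$date), period)
  # Sum the daily counts within each period first ...
  num <- aggregate(count ~ package + period, data = counts, FUN = sum)
  den <- aggregate(count ~ period, data = total, FUN = sum)
  names(den)[names(den) == "count"] <- "total"
  # ... then divide, so that busier days carry more weight.
  out <- merge(num, den, by = "period")
  out$fraction <- out$count / out$total
  out <- out[order(out$period, out$package),
             c("period", "package", "count", "fraction")]
  rownames(out) <- NULL
  out
}

# Toy data with made-up numbers (not real CRAN counts).
days <- c("2021-10-31", "2021-11-01", "2021-11-02")
toy_counts <- data.frame(
  date = rep(days, times = 2),
  count = c(10, 20, 30, 5, 5, 15),
  package = rep(c("x", "y"), each = 3)
)
toy_total <- data.frame(date = days, count = c(100, 200, 300))
monthly <- period_fraction(toy_counts, toy_total)

# With real data, the inputs would come from something like:
# counts <- cranlogs::cran_downloads(pkgs, from = "2021-10-01", to = "2021-11-30")
# total  <- cranlogs::cran_downloads(packages = NULL, from = "2021-10-01", to = "2021-11-30")
```

Dividing summed counts, rather than averaging the daily fractions, is a design choice: it gives the share of all downloads in the period. A week-based format string such as "%Y-%U" could be passed as period for weekly figures.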
Well, this isn't really returning quantiles (or at least, not what I had in mind). rlang might represent 1.2% of all downloads on 2021-11-17, but I would assume that places it in the 99th percentile among all CRAN packages.
Doh! Fair point. I have no idea what I was thinking. I've updated my comment to say 'fraction' instead of 'quantile'.
Feature request: I'm not sure how much work would be involved in implementing this, but I think it would be very useful to have a function to return percentiles for downloads, in order to be able to say things like "package X is in the top 10% of downloaded packages from CRAN".