r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror

cranlogs::cran_downloads("R") (mostly) double counts from 2023-09-13 through 2023-10-02 #69

Open lindbrook opened 9 months ago

lindbrook commented 9 months ago

Following up on #68, I think there's also some double counting with cranlogs::cran_downloads(packages = "R"). The affected dates do not exactly match those for the package/CRAN download counts, and there are two exceptions: 1) 2023-09-28, where the counts are essentially identical except for a single difference, possibly owing to an os = NA, and 2) 2023-09-30, where the counts from cranlogs::cran_downloads() are three times greater.

Here are the relevant ratios (cranlogs counts divided by Posit log counts, by os and date):

    2023-09-12 2023-09-13 2023-09-14 2023-09-15 2023-09-16 2023-09-17 2023-09-18 2023-09-19
osx          1          2          2          2          2          2          2          2
src          1          2          2          2          2          2          2          2
win          1          2          2          2          2          2          2          2
    2023-09-20 2023-09-21 2023-09-22 2023-09-23 2023-09-24 2023-09-25 2023-09-26 2023-09-27
osx          2          2          2          2          2          2          2          2
src          2          2          2          2          2          2          2          2
win          2          2          2          2          2          2          2          2
    2023-09-28 2023-09-29 2023-09-30 2023-10-01 2023-10-02 2023-10-03
osx   1.000000          2          3          2          2          1
src   1.000801          2          3          2          2          1
win   1.000000          2          3          2          2          1

Here's the code I used:

dates <- seq.Date(as.Date("2023-09-12"), as.Date("2023-10-03"), by = "days")

# counts from cranlogs (r-hub), summed by date and os, then transposed to os x date
rhub.data <- cranlogs::cran_downloads("R", from = min(dates), to = max(dates))
rhub <- t(tapply(rhub.data$count, list(rhub.data$date, rhub.data$os), sum))
rhub <- rhub[-1, ]

# counts from the Posit logs: number of log rows per os, one column per date
posit.data <- lapply(dates, packageRank::fetchRLog)
names(posit.data) <- dates
posit <- lapply(posit.data, function(x) tapply(x$date, x$os, length))
posit <- do.call(cbind, posit)
colnames(posit) <- as.character(dates)

# for the ratios
rhub / posit
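
For anyone who wants to pick out the affected dates without re-downloading the logs, here is a minimal sketch that works from the ratios reported above. The ratio values are transcribed by hand from the table in this issue; the names `ratios`, `tol`, `affected` and `multiplier`, and the 0.01 tolerance, are my own assumptions for illustration and are not part of cranlogs or packageRank.

```r
# Ratios transcribed from the table above (rows: os, columns: dates).
dates <- seq.Date(as.Date("2023-09-12"), as.Date("2023-10-03"), by = "days")

osx <- c(1, rep(2, 15), 1.000000, 2, 3, 2, 2, 1)
src <- c(1, rep(2, 15), 1.000801, 2, 3, 2, 2, 1)
win <- c(1, rep(2, 15), 1.000000, 2, 3, 2, 2, 1)

ratios <- rbind(osx, src, win)
colnames(ratios) <- as.character(dates)

# Assumed tolerance: treat ratios within 0.01 of 1 as "not inflated", so the
# small src discrepancy on 2023-09-28 is not flagged.
tol <- 0.01

# A date is flagged if any os has a ratio that departs from 1 by more than tol.
affected <- apply(ratios, 2, function(x) any(abs(x - 1) > tol))
affected.dates <- names(affected)[affected]

# Approximate inflation factor for each flagged date (2 = doubled, 3 = tripled).
multiplier <- round(colMeans(ratios))[affected]

range(affected.dates)
table(multiplier)
```

With the real data, `ratios` could be replaced by the `rhub / posit` matrix computed above, provided the rows and columns of the two matrices line up.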