r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror
https://r-hub.github.io/cranlogs/

R downloads before 2015 #41

Open lindbrook opened 5 years ago

lindbrook commented 5 years ago

I was trying to download the entire history of R downloads (‘cranlogs’ versions 2.1.1 and 2.1.1.9000, R version 3.6.1, macOS 10.14.6) and noticed that there appears to be a problem with the years before 2015.

I think there are two types of problem: 1) valid logs that are not being "read", and 2) invalid logs (logs with no observations).

valid logs

a) The code below computes the number of R downloads from 1 January 2015 to yesterday, and from 31 December 2014 to yesterday:

# logs from 1 January 2015 to yesterday
test1 <- cranlogs::cran_downloads("R", from = "2015-01-01", to = Sys.Date() - 1)

# logs from 31 December 2014 to yesterday
test2 <- cranlogs::cran_downloads("R", from = "2014-12-31", to = Sys.Date() - 1)

As you can see, both have the same number of observations and the data for 31 December 2014 appears to be missing:

# same number of observations (rows)
identical(nrow(test1), nrow(test2))

# no data for 31 December 2014
head(test2[order(test2$date), ])
tail(test2[order(test2$date), ])
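To list exactly which days are absent, rather than eyeballing head() and tail(), the returned dates can be compared with the full calendar sequence for the requested range. This is a minimal sketch that assumes only that the result has a date column (as used above); missing_dates() is a hypothetical helper, not part of cranlogs.

```r
# Hypothetical helper (not part of cranlogs): return the calendar days in
# [from, to] that have no rows in a cran_downloads() result.
missing_dates <- function(downloads, from, to) {
  expected <- seq(as.Date(from), as.Date(to), by = "day")
  observed <- unique(as.Date(downloads$date))
  expected[!expected %in% observed]
}

# Usage with the call from this report (requires network access):
# test2 <- cranlogs::cran_downloads("R", from = "2014-12-31", to = Sys.Date() - 1)
# missing_dates(test2, "2014-12-31", Sys.Date() - 1)
```

If the problem is as described, the usage call would be expected to include 2014-12-31 among the missing days.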

If you download the logs manually, you'll see that the log for 31 December 2014 looks OK and that its format (as shown by str()) appears to be the same as that of 1 January 2015:

http://cran-logs.rstudio.com/2014/2014-12-31-r.csv.gz
http://cran-logs.rstudio.com/2015/2015-01-01-r.csv.gz
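The manual check can be scripted. This is a minimal sketch, assuming the daily R logs keep the URL pattern of the two links above (http://cran-logs.rstudio.com/<year>/<date>-r.csv.gz); r_log_url() and read_r_log() are hypothetical helpers, and read.csv() is relied on to decompress the .gz file transparently.

```r
# Hypothetical helper: build the URL of the daily R-download log for `date`,
# following the pattern of the URLs listed above.
r_log_url <- function(date) {
  date <- as.Date(date)
  sprintf("http://cran-logs.rstudio.com/%s/%s-r.csv.gz",
          format(date, "%Y"), format(date, "%Y-%m-%d"))
}

# Hypothetical helper: download one log to a temporary file and read it.
# read.csv() opens gzip-compressed files directly, so no explicit gunzip step.
read_r_log <- function(date) {
  path <- tempfile(fileext = ".csv.gz")
  on.exit(unlink(path))
  utils::download.file(r_log_url(date), path, mode = "wb", quiet = TRUE)
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Usage (requires network access): compare the structure of the two logs.
# str(read_r_log("2014-12-31"))
# str(read_r_log("2015-01-01"))
```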

b) The code below computes the number of R downloads for the last week of 2014:

test3 <- cranlogs::cran_downloads("R", from = "2014-12-25", to = "2014-12-31")

Again, if you manually download the individual logs, they look fine:

http://cran-logs.rstudio.com/2014/2014-12-31-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-30-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-29-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-28-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-27-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-26-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-25-r.csv.gz
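Instead of copying the seven URLs by hand, they can be generated from the date range and each log's row count checked in one pass. This is a sketch that assumes the URL pattern of the links above; count_log_rows() is a hypothetical helper, not part of cranlogs.

```r
# Generate the daily-log URLs for 25-31 December 2014, assuming the URL
# pattern of the links above.
dates <- seq(as.Date("2014-12-25"), as.Date("2014-12-31"), by = "day")
urls <- sprintf("http://cran-logs.rstudio.com/%s/%s-r.csv.gz",
                format(dates, "%Y"), format(dates, "%Y-%m-%d"))

# Hypothetical helper: download one log and return its number of data rows.
count_log_rows <- function(url) {
  path <- tempfile(fileext = ".csv.gz")
  on.exit(unlink(path))
  utils::download.file(url, path, mode = "wb", quiet = TRUE)
  nrow(utils::read.csv(path))
}

# Usage (requires network access): rows per day in the raw logs, to set
# against the empty result of cran_downloads() for the same range.
# setNames(vapply(urls, count_log_rows, integer(1L)), format(dates))
```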

c) For what it's worth, here's the code for the calendar years 2012, 2013 and 2014:

test4 <- cranlogs::cran_downloads("R", from = "2014-01-01", to = "2014-12-31")
test5 <- cranlogs::cran_downloads("R", from = "2013-01-01", to = "2013-12-31")

# Note: the first log for 2012 is dated 01 October 2012
test6 <- cranlogs::cran_downloads("R", from = "2012-10-01", to = "2012-12-31")

# number of rows is 0 for each year:
vapply(list(test4, test5, test6), nrow, integer(1L))

invalid logs

A look at the first two logs, for 01 and 02 October 2012, shows that there are also logs that appear to have no entries (rows). I'm not sure how widespread this is, but 2012 seems particularly problematic.

http://cran-logs.rstudio.com/2012/2012-10-01-r.csv.gz
http://cran-logs.rstudio.com/2012/2012-10-02-r.csv.gz
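To gauge how widespread the empty logs are, each downloaded file can be tested for data rows. This is a minimal sketch; log_is_empty() is a hypothetical helper, and it treats a file with at most one line (a header only, or nothing at all) as empty, since the exact layout of the empty logs is not known here.

```r
# Hypothetical helper: TRUE if the log at `path` has no data rows.
# gzfile() reads gzip-compressed files and also plain, uncompressed ones.
log_is_empty <- function(path) {
  lines <- readLines(gzfile(path))
  length(lines) <= 1L  # header only, or completely empty
}

# Usage (requires network access):
# path <- tempfile(fileext = ".csv.gz")
# download.file("http://cran-logs.rstudio.com/2012/2012-10-01-r.csv.gz",
#               path, mode = "wb")
# log_is_empty(path)
```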