Open lindbrook opened 2 years ago
For what it's worth, I've patched this in packageRank::cranDownload() using fixCranlogs(). When any of the 8 days are queried, the function recomputes the counts using a stored copy of those 8 days' logs (an R list object named "rstudio.logs").
This is for posterity's sake but I hope it'll be fixed.
For eight days at the end of 2012 and the beginning of 2013, cranlogs::cran_downloads() returns counts that are double or even triple what they should be. I'm fairly confident of this conclusion because my numbers come from downloading the logs directly from RStudio and counting the number of log entries.
The code for my analysis:
The ratios between the two sets of counts are generally whole numbers. This leads me to believe that there may be computational errors in 'cranlogs'.
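For anyone who wants to repeat the whole-number check, here is a minimal sketch of the idea. The helper name `ratio_check()` and the counts passed to it are made up for illustration; they are not the figures from my analysis, and this is not code from 'cranlogs' or 'packageRank'.

```r
# Hypothetical helper: compare the cranlogs count with the count obtained
# by tallying log entries, and flag ratios that are whole numbers.
ratio_check <- function(cranlogs_count, log_count, tol = 1e-8) {
  ratio <- cranlogs_count / log_count
  data.frame(
    cranlogs = cranlogs_count,
    logs = log_count,
    ratio = ratio,
    # TRUE when the ratio is within `tol` of an integer
    whole = abs(ratio - round(ratio)) < tol
  )
}

# Made-up counts for illustration only.
ratio_check(cranlogs_count = c(200, 300, 250), log_count = c(100, 100, 100))
```

A whole-number ratio of 2 or 3 is what you would expect if the same entries were counted two or three times; a fractional ratio suggests something else is going on, or a mix of cases.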
1) I'm not sure what's going on with "2012-10-06".
2) I believe that the problems with "2012-10-07", "2012-10-08" and "2012-10-11" stem from the fact that the logs for those dates are actually duplicated in the RStudio logs.
This overcounting makes sense because, as you wrote in issue #54, you rely on the data in the files and not the filenames/URLs. By doing so, you may have ended up double counting.
3) I haven't sorted out what's going on with the 4 remaining dates ("2012-12-26", "2012-12-27", "2012-12-28", "2013-01-01") but I'm guessing it has something to do with the fact that they surround the 3 missing/lost RStudio logs ("2012-12-29", "2012-12-30", "2012-12-31").
Note that the ratios for the three December dates are not whole numbers. However, I did a sanity check using the top six packages for each of the three days; they all returned whole number multiples. If useful, I can provide more details.
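To illustrate the double counting I suspect in item 2, here is a toy sketch with invented data. The file names, the entries, and the helpers `count_by_date()` and `drop_duplicate_logs()` are all hypothetical; I don't know how 'cranlogs' actually ingests the files, so treat this only as a picture of the mechanism.

```r
# Toy entries standing in for the contents of one daily log (invented data).
log_a <- data.frame(date = "2012-10-07", package = c("x", "y", "x"))
# A second file under a different name but with identical contents.
log_b <- log_a

# Hypothetical file names; only the contents matter below.
files <- list(file_1 = log_a, file_2 = log_b)

# Counting by the date recorded inside the data, ignoring file names.
count_by_date <- function(logs) {
  all_rows <- do.call(rbind, logs)
  table(all_rows$date)
}

# Drop files whose contents exactly match an earlier file.
drop_duplicate_logs <- function(logs) {
  keys <- vapply(logs, function(x) {
    # Collapse each file's rows into a single string used as a content key.
    paste(do.call(paste, c(x, sep = "\r")), collapse = "\n")
  }, character(1))
  logs[!duplicated(keys)]
}

count_by_date(files)                       # duplicated file counted twice
count_by_date(drop_duplicate_logs(files))  # duplicated file counted once
```

With the duplicate left in, the count for the date is exactly twice the true count, which is consistent with the whole-number ratios noted above.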