Open lindbrook opened 4 years ago
Maybe related to this: the download counts for 16th and 17th of Jan. are 0 as well.
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-01-12",
to = Sys.Date()
)
#> date count package
#> 1 2020-01-12 23692 ggplot2
#> 2 2020-01-13 41793 ggplot2
#> 3 2020-01-14 42412 ggplot2
#> 4 2020-01-15 40575 ggplot2
#> 5 2020-01-16 0 ggplot2
#> 6 2020-01-17 0 ggplot2
#> 7 2020-01-18 19643 ggplot2
#> 8 2020-01-19 0 ggplot2
Created on 2020-01-19 by the reprex package (v0.3.0.9001)
I have fixed most of these, except for the ones in 2012, for which my parser fails, so I'll need to take a closer look at these.
These three days are really missing: the 2012-12-29 file contains the data for 2012-12-26, etc., but from 2013-01-01 onward the file names actually refer to the correct day. So these three days are lost forever. IDK if we should document this somewhere or do something else about it.
Use a warning() to flag those dates in cranlogs::cran_downloads()?
Yeah, possibly.
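A minimal sketch of what such a warning could look like, written as a stand-alone helper rather than as a change to cranlogs itself. The name `warn_lost_dates()` is hypothetical, and the set of lost dates (the last three days of 2012) is an assumption taken from the discussion in this thread.

```r
# Assumption from this thread: logs for the last three days of 2012 are lost.
lost_dates <- as.Date(c("2012-12-29", "2012-12-30", "2012-12-31"))

# Hypothetical helper: warn if the requested range includes any lost date.
# Returns the affected dates invisibly so callers can inspect them.
warn_lost_dates <- function(from, to) {
  requested <- seq(as.Date(from), as.Date(to), by = "day")
  affected <- requested[requested %in% lost_dates]
  if (length(affected) > 0) {
    warning(
      "No log data exists for: ",
      paste(format(affected), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(affected)
}
```

Something like `warn_lost_dates(from, to)` could be called before the actual request, so the zero counts for those days come with an explanation.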
Logs for 2012, which start on Oct 1, need some TLC. They are fixable but the last three days of 2012 do indeed seem to be lost.
1) Logs between "2012-10-01" and "2012-10-10" are OK.
2) Logs between "2012-10-16" and "2012-12-31" are offset by -3 days. If you look at the log for "2012-10-16" you get "2012-10-13"; if you look at the log for "2012-12-31" you get "2012-12-28".
3) Logs from "2012-10-11" through "2012-10-15" have three duplicates: Oct 7, Oct 8 and Oct 11. This probably requires some juggling.
Nominal Actual
11 ----- 07
12 ----- 11
13 ----- 08
14 ----- 12
15 ----- 11
Details later if you want.
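For reference, the three cases above can be sketched as a lookup from the nominal (file name) date to the date actually found in the log. This is only an illustration of the mapping described in this comment; the function name `actual_log_date()` is hypothetical and nothing here touches the real logs.

```r
# Hypothetical helper: map a 2012 file-name date to the date its contents
# reportedly cover, following the three cases listed above.
actual_log_date <- function(nominal) {
  nominal <- as.Date(nominal)
  actual <- nominal  # case 1: dates outside the affected ranges are unchanged

  # Case 2: 2012-10-16 through 2012-12-31 are offset by -3 days.
  shifted <- nominal >= as.Date("2012-10-16") & nominal <= as.Date("2012-12-31")
  actual[shifted] <- nominal[shifted] - 3

  # Case 3: 2012-10-11 through 2012-10-15 follow the Nominal/Actual table.
  juggled <- c(
    "2012-10-11" = "2012-10-07",
    "2012-10-12" = "2012-10-11",
    "2012-10-13" = "2012-10-08",
    "2012-10-14" = "2012-10-12",
    "2012-10-15" = "2012-10-11"
  )
  key <- format(nominal)
  hit <- key %in% names(juggled)
  actual[hit] <- as.Date(unname(juggled[key[hit]]), format = "%Y-%m-%d")

  actual
}

actual_log_date(c("2012-10-05", "2012-10-12", "2012-10-16", "2012-12-31"))
```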
FWIW, with 'packageRank' 0.3.0.9026, you can check these with:
unique(packageRank::packageLog(date = "2012-10-11")$date)
Thanks! I don't think there is much to fix, I don't actually use the filenames when updating the db, only the data in the files.
That's interesting (is that part of the code on GitHub?). I actually make use of the filenames. So I can "fix" it on my end. But do you think this would be something worth informing RStudio about?
Yeah, it is here: https://github.com/r-hub/cranlogs.app/blob/master/db/update.sh
These logs are gone, I am pretty sure, so there is nothing anyone can do about these three days.
I meant updating the filenames so they point to the correct log file.
Ah, I see. I am not sure if it is worth changing it. People might have their own workarounds already, and then we'll break them.
Then, it's probably worth just noting the missing days in the README/webpage.
I am just posting it here because I wonder if the low download count has anything to do with the date being 29th Feb!
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-02-25",
to = Sys.Date()
)
#> date count package
#> 1 2020-02-25 42860 ggplot2
#> 2 2020-02-26 44631 ggplot2
#> 3 2020-02-27 42154 ggplot2
#> 4 2020-02-28 34426 ggplot2
#> 5 2020-02-29 5554 ggplot2
#> 6 2020-03-01 0 ggplot2
My guess is that part of the reason is that scripts used to do automated downloads may not have accounted for the leap day. That said, the last available leap day, in 2016, wasn't particularly unusual:
plot(cranlogs::cran_downloads(from = "2016-02-01", to = "2016-02-29"), type = "o")
plot(packageRank::cranDownloads(from = "2016-02", to = "2016-02"))
Also source for R v3.6.3 was released on 2020-02-29.
FWIW, also affected downloads of R:
Wow. This is probably an oversimplification, but maybe there weren't that many automated downloads back in 2016.
IDK if the release date has anything to do with it, but that's easy to check, the other release dates are these: https://rversions.r-pkg.org/r-versions
"2016-02-29" was a monday
2020-03-26 also had 0 downloads.
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-03-25",
to = Sys.Date()
)
#> date count package
#> 1 2020-03-25 63129 ggplot2
#> 2 2020-03-26 0 ggplot2
#> 3 2020-03-27 63344 ggplot2
Download counts are also 0 for 2nd and 3rd of April.
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-03-31",
to = Sys.Date()
)
#> date count package
#> 1 2020-03-31 66205 ggplot2
#> 2 2020-04-01 65428 ggplot2
#> 3 2020-04-02 0 ggplot2
#> 4 2020-04-03 0 ggplot2
#> 5 2020-04-04 50522 ggplot2
2020-04-20 also had 0 downloads.
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-04-18",
to = Sys.Date()
)
#> date count package
#> 1 2020-04-18 52350 ggplot2
#> 2 2020-04-19 48923 ggplot2
#> 3 2020-04-20 0 ggplot2
#> 4 2020-04-21 63808 ggplot2
#> 5 2020-04-22 0 ggplot2
Created on 2020-04-22 by the reprex package (v0.3.0.9001)
Haven't seen any downloads for the last week.
r$> cran_downloads(from="2020-06-01", package="dplyr")
date count package
1 2020-06-01 50366 dplyr
2 2020-06-02 52765 dplyr
3 2020-06-03 52948 dplyr
4 2020-06-04 50348 dplyr
5 2020-06-05 47053 dplyr
6 2020-06-06 31556 dplyr
7 2020-06-07 32620 dplyr
8 2020-06-08 51816 dplyr
9 2020-06-09 51841 dplyr
10 2020-06-10 49710 dplyr
11 2020-06-11 48361 dplyr
12 2020-06-12 44394 dplyr
13 2020-06-13 29262 dplyr
14 2020-06-14 29947 dplyr
15 2020-06-15 48074 dplyr
16 2020-06-16 47806 dplyr
17 2020-06-17 45596 dplyr
18 2020-06-18 43152 dplyr
19 2020-06-19 37575 dplyr
20 2020-06-20 0 dplyr
21 2020-06-21 0 dplyr
22 2020-06-22 0 dplyr
23 2020-06-23 0 dplyr
24 2020-06-24 0 dplyr
25 2020-06-25 0 dplyr
Seems to have updated; when I rerun the above command, I get more days filled in. Still missing the most recent 2 days though.
r$> cran_downloads(from="2020-06-01", package="dplyr")
date count package
. . .
21 2020-06-21 23537 dplyr
22 2020-06-22 41854 dplyr
23 2020-06-23 44296 dplyr
24 2020-06-24 0 dplyr
25 2020-06-25 0 dplyr
The log for the current day (e.g. 2020-06-25) isn't available until the next day (e.g. 2020-06-26).
Regarding the 24th, I think they're moving servers/services so my understanding is that they've been manually running the script of late (time zones may come into play as well).
FWIW, if you really want the latest counts, you can fetch the logs directly (http://cran-logs.rstudio.com/) or use packages/functions that do so.
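A rough sketch of the direct route, in case it helps. It assumes the daily files live at `<base>/<year>/<date>.csv.gz` and that each row is one download with a `package` column; both are assumptions about the log site, not something cranlogs documents. The helper names `log_url()` and `count_package()` are made up for this example, and the row count may not match what cranlogs reports.

```r
# Assumed URL layout: <base>/<year>/<date>.csv.gz
log_url <- function(date, base = "http://cran-logs.rstudio.com") {
  date <- as.Date(date)
  sprintf("%s/%s/%s.csv.gz", base, format(date, "%Y"), format(date))
}

# Count the rows for one package in a log already read into a data frame.
# Assumes a `package` column with one row per download.
count_package <- function(log, pkg) {
  sum(log$package == pkg, na.rm = TRUE)
}

# Not run here (needs network access and downloads a large file):
# tmp <- tempfile(fileext = ".csv.gz")
# download.file(log_url("2020-06-24"), tmp, mode = "wb")
# log <- read.csv(tmp)  # read.csv() can read gzip-compressed files directly
# count_package(log, "dplyr")
```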
Logs are getting hung up again:
. . .
21 2020-06-21 23537 dplyr
22 2020-06-22 41854 dplyr
23 2020-06-23 44296 dplyr
24 2020-06-24 42407 dplyr
25 2020-06-25 42091 dplyr
26 2020-06-26 37934 dplyr
27 2020-06-27 0 dplyr
28 2020-06-28 0 dplyr
29 2020-06-29 0 dplyr
30 2020-06-30 0 dplyr
It's weird that the service is so patchy. I'd have thought it's just a daily cron job or something, so that updates "just work".
Logs seem to be hung up again:
date count package
1 2020-08-12 48458 ggplot2
2 2020-08-13 49645 ggplot2
3 2020-08-14 44313 ggplot2
4 2020-08-15 35502 ggplot2
5 2020-08-16 39237 ggplot2
6 2020-08-17 0 ggplot2
7 2020-08-18 0 ggplot2
8 2020-08-19 0 ggplot2
9 2020-08-20 0 ggplot2
No download count for 2020-10-03:
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2020-09-26",
to = Sys.Date()
)
#> date count package
#> 1 2020-09-26 43607 ggplot2
#> 2 2020-09-27 45068 ggplot2
#> 3 2020-09-28 60917 ggplot2
#> 4 2020-09-29 63517 ggplot2
#> 5 2020-09-30 64071 ggplot2
#> 6 2020-10-01 60625 ggplot2
#> 7 2020-10-02 56791 ggplot2
#> 8 2020-10-03 0 ggplot2
#> 9 2020-10-04 43545 ggplot2
Created on 2020-10-06 by the reprex package (v0.3.0.9001)
There are five days in 2020 that cranlogs::cran_downloads() still reports as having zero downloads:
days <- c("2020-03-26", "2020-04-02", "2020-04-03", "2020-04-20", "2020-10-03")
out <- lapply(days, function(x) cranlogs::cran_downloads(from = x, to = x))
do.call(rbind, out)
# date count
# 1 2020-03-26 0
# 2 2020-04-02 0
# 3 2020-04-03 0
# 4 2020-04-20 0
# 5 2020-10-03 0
Would it be possible to fix these?
The count is also 0 for 2021-11-20.
cranlogs::cran_downloads(
packages = "ggplot2",
from = "2021-11-18",
to = "2021-11-22"
)
#> date count package
#> 1 2021-11-18 115004 ggplot2
#> 2 2021-11-19 106105 ggplot2
#> 3 2021-11-20 0 ggplot2
#> 4 2021-11-21 86233 ggplot2
#> 5 2021-11-22 110980 ggplot2
Created on 2021-11-27 by the reprex package (v2.0.1)
FWIW, the RStudio logs were posted "late" that day. When that happens, 'cranlogs' will return a zero count.
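One way to tell these cases apart on the user side is to separate zeros that are followed by a day with data (the log was probably missed) from zeros at the end of the range (the log may simply not be published yet). A minimal sketch, using the counts quoted above; `classify_zeros()` is a hypothetical helper, not part of cranlogs.

```r
# Counts quoted in this thread for ggplot2.
x <- data.frame(
  date = as.Date("2021-11-18") + 0:4,
  count = c(115004, 106105, 0, 86233, 110980)
)

# Hypothetical helper: split zero-count days into
#   interior: a later day has a non-zero count, so the log was likely missed
#   trailing: no later day has data, so the log may not be available yet
classify_zeros <- function(df) {
  df <- df[order(df$date), ]
  nonzero <- which(df$count > 0)
  last_nonzero <- if (length(nonzero) > 0) max(nonzero) else 0L
  idx <- seq_len(nrow(df))
  zero <- df$count == 0
  list(
    interior = df$date[zero & idx < last_nonzero],
    trailing = df$date[zero & idx > last_nonzero]
  )
}

classify_zeros(x)
```

With the data above, 2021-11-20 lands in `interior`, which matches the "posted late" explanation rather than a log that is still pending.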
There are 43 days when cranlogs::cran_downloads() reports that there were zero package downloads. I've checked a couple of logs at http://cran-logs.rstudio.com/; they seem to disagree.
I'm guessing these will be fixed when you update the DB script (#45).
FWIW