r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror
https://r-hub.github.io/cranlogs/

Days with no CRAN downloads #54

Open lindbrook opened 4 years ago

lindbrook commented 4 years ago

There are 43 days when cranlogs::cran_downloads() reports that there were zero package downloads. I've checked a couple of logs at http://cran-logs.rstudio.com/; they seem to disagree.

dates <- as.Date(c("2018-01-05", "2018-02-09", "2018-02-10", "2018-02-23",
  "2018-02-24", "2018-05-06", "2018-05-12", "2018-05-19", "2018-05-27",
  "2018-07-07", "2018-07-08", "2018-07-28", "2018-08-31", "2018-10-21",
  "2017-01-12", "2017-07-16", "2017-09-01", "2017-09-02", "2016-02-03",
  "2016-06-02", "2016-06-12", "2016-07-12", "2016-07-24", "2016-08-04",
  "2016-08-11", "2016-08-13", "2016-08-14" ,"2016-08-20", "2016-09-02",
  "2016-09-09", "2015-08-23", "2015-09-07", "2015-09-09", "2015-10-18",
  "2015-10-26", "2015-10-31", "2015-11-01", "2015-11-15", "2014-01-01",
  "2014-11-17", "2012-12-29", "2012-12-30", "2012-12-31"))

dates <- sort(dates)

zero_downloads <- lapply(dates, function(x) {
  cranlogs::cran_downloads(from = x, to = x)
})

zero_downloads <- do.call(rbind, zero_downloads)

I'm guessing these will be fixed when you update the DB script (#45).

FWIW

IndrajeetPatil commented 4 years ago

Maybe related to this: the download counts for 16th and 17th of Jan. are 0 as well.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-01-12",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-01-12 23692 ggplot2
#> 2 2020-01-13 41793 ggplot2
#> 3 2020-01-14 42412 ggplot2
#> 4 2020-01-15 40575 ggplot2
#> 5 2020-01-16     0 ggplot2
#> 6 2020-01-17     0 ggplot2
#> 7 2020-01-18 19643 ggplot2
#> 8 2020-01-19     0 ggplot2

Created on 2020-01-19 by the reprex package (v0.3.0.9001)

gaborcsardi commented 4 years ago

I have fixed most of these, except for the ones in 2012, for which my parser fails, so I'll need to take a closer look at these.

gaborcsardi commented 4 years ago

These three days are really missing, because the 2012-12-29 file contains the data for 2012-12-26, etc., but then from 2013-01-01 the file names actually refer to the correct day. So these three days are lost forever. IDK if we should document this somewhere or do something else about it.

lindbrook commented 4 years ago

Use a warning() to flag those dates in cranlogs::cran_downloads()?

gaborcsardi commented 4 years ago

Yeah, possibly.
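A minimal sketch of that suggestion, written as a caller-side check rather than a change to cranlogs itself. The names `lost_dates` and `check_lost_dates()` are hypothetical; the three dates are the ones identified above as unrecoverable.

```r
# Dates the thread identifies as unrecoverable.
lost_dates <- as.Date(c("2012-12-29", "2012-12-30", "2012-12-31"))

# Hypothetical helper (not part of cranlogs): returns the requested dates
# that fall on a known-lost day, and warns if there are any.
check_lost_dates <- function(from, to, lost = lost_dates) {
  requested <- seq(as.Date(from), as.Date(to), by = "day")
  hit <- requested[requested %in% lost]
  if (length(hit) > 0) {
    warning("No log data exists for: ",
            paste(format(hit), collapse = ", "),
            "; a count of 0 on these days means missing data.",
            call. = FALSE)
  }
  invisible(hit)
}
```

Calling `check_lost_dates(from, to)` before `cranlogs::cran_downloads(from = from, to = to)` would flag the affected range without touching the returned data.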

lindbrook commented 4 years ago

Logs for 2012, which start on Oct 1, need some TLC. They are fixable but the last three days of 2012 do indeed seem to be lost.

1) Logs between "2012-10-01" and "2012-10-10" are OK.

2) Logs between "2012-10-16" and "2012-12-31" are offset by -3 days. If you look at the log for "2012-10-16" you get "2012-10-13"; if you look at the log for "2012-12-31" you get "2012-12-28".

3) Logs from "2012-10-11" through "2012-10-15" have three duplicates: Oct 7, Oct 8 and Oct 11. This probably requires some juggling.

Nominal   Actual
11 ----- 07
12 ----- 11
13 ----- 08
14 ----- 12
15 ----- 11

Details later if you want.

FWIW, with 'packageRank' 0.3.0.9026, you can check these with:

unique(packageRank::packageLog(date = "2012-10-11")$date)
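The three cases above can be sketched as a single lookup from the date in a 2012 file name to the date of the data it reportedly contains. This is only an illustration of the mapping as described in this comment: `irregular` and `actual_log_date()` are hypothetical names, and the values are copied from the table above rather than re-verified against the logs.

```r
# Nominal-to-actual mapping for the irregular stretch, copied from the
# table in the comment above (all dates are October 2012).
irregular <- c("2012-10-11" = "2012-10-07",
               "2012-10-12" = "2012-10-11",
               "2012-10-13" = "2012-10-08",
               "2012-10-14" = "2012-10-12",
               "2012-10-15" = "2012-10-11")

# Hypothetical helper: given the date in a 2012 log file name, return the
# date of the data the file reportedly contains.
actual_log_date <- function(nominal) {
  nominal <- as.Date(nominal)
  stopifnot(length(nominal) == 1,
            nominal >= as.Date("2012-10-01"),
            nominal <= as.Date("2012-12-31"))
  key <- format(nominal)
  if (key %in% names(irregular)) {
    as.Date(irregular[[key]])      # case 3: shuffled and duplicated days
  } else if (nominal >= as.Date("2012-10-16")) {
    nominal - 3                    # case 2: constant offset of -3 days
  } else {
    nominal                        # case 1: file name matches contents
  }
}

actual_log_date("2012-12-31")
```

Under this mapping no file name resolves to 2012-12-29, 2012-12-30 or 2012-12-31, which matches the conclusion that those three days are lost.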

gaborcsardi commented 4 years ago

Thanks! I don't think there is much to fix, I don't actually use the filenames when updating the db, only the data in the files.

lindbrook commented 4 years ago

That's interesting (is that part of the code on GitHub?). I actually make use of the filenames. So I can "fix" it on my end. But do you think this would be something worth informing RStudio about?

gaborcsardi commented 4 years ago

Yeah, it is here: https://github.com/r-hub/cranlogs.app/blob/master/db/update.sh

These logs are gone, I am pretty sure, so there is nothing anyone can do about these three days.

lindbrook commented 4 years ago

I meant updating the filenames so they point to the correct log file.

gaborcsardi commented 4 years ago

Ah, I see. I am not sure if it is worth changing it. People might have their own workarounds already, and then we'll break them.

lindbrook commented 4 years ago

Then, it's probably worth just noting the missing days in the README/webpage.

IndrajeetPatil commented 4 years ago

I am just posting it here because I wonder if the low download count has anything to do with the date being 29th Feb!

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-02-25",
  to = Sys.Date()
)

#>         date count package
#> 1 2020-02-25 42860 ggplot2
#> 2 2020-02-26 44631 ggplot2
#> 3 2020-02-27 42154 ggplot2
#> 4 2020-02-28 34426 ggplot2
#> 5 2020-02-29  5554 ggplot2
#> 6 2020-03-01     0 ggplot2

lindbrook commented 4 years ago

My guess is that part of the reason is that scripts used to do automated downloads may not have accounted for the leap day. That said, the last available leap day, in 2016, wasn't particularly unusual:

plot(cranlogs::cran_downloads(from = "2016-02-01", to = "2016-02-29"), type = "o")
plot(packageRank::cranDownloads(from = "2016-02", to = "2016-02"))

lindbrook commented 4 years ago

Also, the source for R v3.6.3 was released on 2020-02-29.

lindbrook commented 4 years ago

FWIW, this also affected downloads of R. [plot: r_downloads]

gaborcsardi commented 4 years ago

Wow. This is probably an oversimplification, but maybe there weren't that many automated downloads back in 2016.

IDK if the release date has anything to do with it, but that's easy to check, the other release dates are these: https://rversions.r-pkg.org/r-versions

lindbrook commented 4 years ago

"2016-02-29" was a Monday. [plots: pkgs2016, r2016]

IndrajeetPatil commented 4 years ago

2020-03-26 also had 0 downloads.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-03-25",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-03-25 63129 ggplot2
#> 2 2020-03-26     0 ggplot2
#> 3 2020-03-27 63344 ggplot2

IndrajeetPatil commented 4 years ago

Download counts are also 0 for 2nd and 3rd of April.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-03-31",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-03-31 66205 ggplot2
#> 2 2020-04-01 65428 ggplot2
#> 3 2020-04-02     0 ggplot2
#> 4 2020-04-03     0 ggplot2
#> 5 2020-04-04 50522 ggplot2

IndrajeetPatil commented 4 years ago

2020-04-20 also had 0 downloads.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-04-18",
  to = Sys.Date()
)
#>         date count package
#> 1 2020-04-18 52350 ggplot2
#> 2 2020-04-19 48923 ggplot2
#> 3 2020-04-20     0 ggplot2
#> 4 2020-04-21 63808 ggplot2
#> 5 2020-04-22     0 ggplot2

Created on 2020-04-22 by the reprex package (v0.3.0.9001)

hongooi73 commented 4 years ago

Haven't seen any downloads for the last week.

r$> cran_downloads(from="2020-06-01", package="dplyr")
         date count package
1  2020-06-01 50366   dplyr
2  2020-06-02 52765   dplyr
3  2020-06-03 52948   dplyr
4  2020-06-04 50348   dplyr
5  2020-06-05 47053   dplyr
6  2020-06-06 31556   dplyr
7  2020-06-07 32620   dplyr
8  2020-06-08 51816   dplyr
9  2020-06-09 51841   dplyr
10 2020-06-10 49710   dplyr
11 2020-06-11 48361   dplyr
12 2020-06-12 44394   dplyr
13 2020-06-13 29262   dplyr
14 2020-06-14 29947   dplyr
15 2020-06-15 48074   dplyr
16 2020-06-16 47806   dplyr
17 2020-06-17 45596   dplyr
18 2020-06-18 43152   dplyr
19 2020-06-19 37575   dplyr
20 2020-06-20     0   dplyr
21 2020-06-21     0   dplyr
22 2020-06-22     0   dplyr
23 2020-06-23     0   dplyr
24 2020-06-24     0   dplyr
25 2020-06-25     0   dplyr

hongooi73 commented 4 years ago

Seems to have updated; when I rerun the above command, I get more days filled in. Still missing the most recent 2 days though.

r$> cran_downloads(from="2020-06-01", package="dplyr")
         date count package
. . .
21 2020-06-21 23537   dplyr
22 2020-06-22 41854   dplyr
23 2020-06-23 44296   dplyr
24 2020-06-24     0   dplyr
25 2020-06-25     0   dplyr

lindbrook commented 4 years ago

The log for the current day (e.g. 2020-06-25) isn't available until the next day (e.g. 2020-06-26).

Regarding the 24th, I think they're moving servers/services so my understanding is that they've been manually running the script of late (time zones may come into play as well).

FWIW, if you really want the latest counts, you can fetch the logs directly (http://cran-logs.rstudio.com/) or use packages/functions that do so.
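As a rough illustration of fetching a log directly, the sketch below builds the URL for one day and counts rows for a package. It assumes the daily files live at `<base>/<year>/<date>.csv.gz` and contain a `package` column; `log_url()` and `count_from_log()` are hypothetical helper names, not functions from cranlogs or packageRank.

```r
# Build the URL of one daily log, assuming the layout
# <base>/<year>/<date>.csv.gz on the log site.
log_url <- function(date, base = "http://cran-logs.rstudio.com") {
  date <- as.Date(date)
  sprintf("%s/%s/%s.csv.gz", base, format(date, "%Y"), format(date))
}

# Download one day's log and count the rows for a package.
# Needs network access; assumes the log has a `package` column.
count_from_log <- function(date, package) {
  tmp <- tempfile(fileext = ".csv.gz")
  on.exit(unlink(tmp))
  utils::download.file(log_url(date), tmp, quiet = TRUE, mode = "wb")
  log <- utils::read.csv(gzfile(tmp), stringsAsFactors = FALSE)
  sum(log$package == package, na.rm = TRUE)
}

log_url("2020-06-24")
```

A call such as `count_from_log("2020-06-24", "dplyr")` downloads the whole day's log, which can be large, so this is better suited to spot checks than to long date ranges.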

hongooi73 commented 4 years ago

Logs are getting hung up again:

. . .
21 2020-06-21 23537   dplyr
22 2020-06-22 41854   dplyr
23 2020-06-23 44296   dplyr
24 2020-06-24 42407   dplyr
25 2020-06-25 42091   dplyr
26 2020-06-26 37934   dplyr
27 2020-06-27     0   dplyr
28 2020-06-28     0   dplyr
29 2020-06-29     0   dplyr
30 2020-06-30     0   dplyr

It's weird that the service is so patchy. I'd have thought it's just a daily cron job or something, so that updates "just work".

nbarrowman commented 4 years ago

Logs seem to be hung up again:

        date count package
1 2020-08-12 48458 ggplot2
2 2020-08-13 49645 ggplot2
3 2020-08-14 44313 ggplot2
4 2020-08-15 35502 ggplot2
5 2020-08-16 39237 ggplot2
6 2020-08-17     0 ggplot2
7 2020-08-18     0 ggplot2
8 2020-08-19     0 ggplot2
9 2020-08-20     0 ggplot2

IndrajeetPatil commented 4 years ago

No download count for 2020-10-03:

    cranlogs::cran_downloads(
      packages = "ggplot2",
      from = "2020-09-26",
      to = Sys.Date()
    )
    #>          date count package
    #> 1  2020-09-26 43607 ggplot2
    #> 2  2020-09-27 45068 ggplot2
    #> 3  2020-09-28 60917 ggplot2
    #> 4  2020-09-29 63517 ggplot2
    #> 5  2020-09-30 64071 ggplot2
    #> 6  2020-10-01 60625 ggplot2
    #> 7  2020-10-02 56791 ggplot2
    #> 8  2020-10-03     0 ggplot2
    #> 9  2020-10-04 43545 ggplot2

Created on 2020-10-06 by the reprex package (v0.3.0.9001)

lindbrook commented 3 years ago

There are five days in 2020 that cranlogs::cran_downloads() still reports as having zero downloads:

days <- c("2020-03-26", "2020-04-02", "2020-04-03", "2020-04-20", "2020-10-03")
out <- lapply(days, function(x) cranlogs::cran_downloads(from = x, to = x))
do.call(rbind, out)

#         date count
# 1 2020-03-26     0
# 2 2020-04-02     0
# 3 2020-04-03     0
# 4 2020-04-20     0
# 5 2020-10-03     0

Would it be possible to fix these?

IndrajeetPatil commented 2 years ago

The count is 0 also for 2021-11-20.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2021-11-18",
  to = "2021-11-22"
)
#>         date  count package
#> 1 2021-11-18 115004 ggplot2
#> 2 2021-11-19 106105 ggplot2
#> 3 2021-11-20      0 ggplot2
#> 4 2021-11-21  86233 ggplot2
#> 5 2021-11-22 110980 ggplot2

Created on 2021-11-27 by the reprex package (v2.0.1)

lindbrook commented 2 years ago

FWIW, the RStudio logs were posted "late" that day. When that happens, 'cranlogs' will return a zero count.
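Given that a late or missing log shows up as a zero, one caller-side workaround is to treat zeros as missing before plotting or summing. This is a sketch under the assumption that the package (or the all-package total) is downloaded every day, so a true zero is implausible; `zeros_to_na()` is a hypothetical helper, and the example rows are taken from the output quoted above.

```r
# Hypothetical helper: recode zero counts as NA so that a day with no
# loaded log is not mistaken for a day with no downloads.
zeros_to_na <- function(downloads) {
  downloads$count[which(downloads$count == 0)] <- NA
  downloads
}

# Rows copied from the ggplot2 output earlier in the thread.
example <- data.frame(
  date = as.Date(c("2021-11-19", "2021-11-20", "2021-11-21")),
  count = c(106105, 0, 86233),
  package = "ggplot2"
)

cleaned <- zeros_to_na(example)
cleaned
```

For rarely downloaded packages a zero can be genuine, so this recoding should not be applied blindly.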