r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror
https://r-hub.github.io/cranlogs/

limit on number of packages as argument to cran_downloads #56

Open adfi opened 4 years ago

adfi commented 4 years ago

Hi,

I tried to get download counts for 8000 packages and ran into an HTTP 414 (Request-URI Too Long) error. After some trial and error, it seems the limit is 905 packages, reproducible with the following code:

cran_downloads(package = rep('cranlogs', 906))

I can split up the requests but it would be nicer to have that done by the package. Also the limit is not documented. Let me know if I'm doing something the package wasn't intended for.

gaborcsardi commented 4 years ago

Well, that's the URL length limit I guess, because the package names are sent in the URL. We could have a POST API, and then there is no limit.
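The numbers reported above can be checked with a little arithmetic. The sketch below computes the length of the package list alone, assuming the names are joined with commas in the URL (an assumption; the thread only says the names are sent in the URL). The helper name `joined_length` is hypothetical, and the base URL is not counted.

```r
# Length of the comma-joined package list for n copies of one name.
# Assumption: names are joined with "," in the URL; the base URL is not counted.
joined_length <- function(name, n) {
  nchar(paste(rep(name, n), collapse = ","))
}

joined_length("cranlogs", 905)  # last size reported to work
joined_length("cranlogs", 906)  # first size reported to fail
```

Both values land just under 8192 characters (8144 and 8153), which would be consistent with a limit of roughly 8 KB on the request line. The exact server setting is not stated in this thread. It also means the cutoff depends on the total length of the names, not on their count.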

adfi commented 4 years ago

So where does the change need to happen? In cranlogs.app?

gaborcsardi commented 4 years ago

Everywhere. Frankly, it is simpler to return all packages: if you want 8000, you might as well get all of them. :)

bschilder commented 1 year ago

@gaborcsardi This could be done within cranlogs by submitting the list of packages in batches, right?

bschilder commented 1 year ago

We would need to know what the maximum batch size can be (i.e., at what point the URI gets too long, on average):

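# Placeholder batch size: HTTP 414 was reported above at 906 names, so stay well below that
batch_size <- 500
v <- rownames(utils::available.packages())
# Assign consecutive package names to batches of at most batch_size
batches <- split(v, ceiling(seq_along(v) / batch_size))
cran <- lapply(seq_along(batches), function(i) {
    message("Batch: ", i, " / ", length(batches))
    # One request per batch keeps each URL below the length limit
    cranlogs::cran_downloads(
        packages = batches[[i]],
        from = "1990-01-01",
        to = Sys.Date() - 1
    )
}) |>
    data.table::rbindlist(fill = TRUE)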

Should be an easy fix. Happy to make a PR.
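Since the limit applies to the URL length and not to the number of packages, one alternative to a fixed batch size is to batch by character budget. The following is a minimal sketch of that idea, not the implementation from any PR: `split_by_url_budget` and its `max_chars` default are hypothetical, and it assumes the names are joined with commas in the URL.

```r
# Greedily group package names so that each comma-joined batch stays within
# `max_chars` characters. `max_chars` is a hypothetical budget, not a
# documented server limit.
split_by_url_budget <- function(pkgs, max_chars = 6000) {
  stopifnot(is.character(pkgs), max_chars >= max(nchar(pkgs), 0))
  batch_id <- integer(length(pkgs))
  current <- 1L
  used <- 0L
  for (i in seq_along(pkgs)) {
    # A name costs its own length plus one comma when it is not first in a batch
    cost <- nchar(pkgs[i]) + (used > 0L)
    if (used + cost > max_chars) {
      # Start a new batch; the name is now first, so no comma is needed
      current <- current + 1L
      used <- 0L
      cost <- nchar(pkgs[i])
    }
    used <- used + cost
    batch_id[i] <- current
  }
  unname(split(pkgs, batch_id))
}

batches <- split_by_url_budget(rep("cranlogs", 2000), max_chars = 6000)
lengths(batches)
```

Each element of `batches` could then be passed to `cranlogs::cran_downloads(packages = ...)` and the results combined, as in the loop above. The budget should leave room for the base URL and the date range, which this sketch does not count.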

bschilder commented 1 year ago

@adfi, I agree this should be handled internally by the package, or at least the limitation should be documented.

bschilder commented 1 year ago

Done here, @adfi: https://github.com/r-hub/cranlogs/pull/67