r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror
https://r-hub.github.io/cranlogs/

Fixed: limit on number of packages + add `rworkflows` #67

Open bschilder opened 1 year ago

bschilder commented 1 year ago

All you need to do on your end, @gaborcsardi, is add a GitHub token as a repository secret named `PAT_GITHUB` to the cranlogs repo.

Best, Brian

bschilder commented 1 year ago

~~Heads up @gaborcsardi, cranlogs.r-pkg.org seems to be down today, so none of the functions will work until it is back up: https://cranlogs.r-pkg.org/~~

*(screenshot of the outage, 2022-12-20)*

Site is back up now.

bschilder commented 1 year ago

@gaborcsardi have you (or any other maintainers of cranlogs) had a chance to look this over?

Thanks, Brian

gaborcsardi commented 1 year ago

Thanks for the PR, but I am not going to merge this, sorry.

I prefer using our workflows instead of rworkflows.

As for the parallelization, I prefer not to use parallel for HTTP.

bschilder commented 1 year ago

> Thanks for the PR, but I am not going to merge this, sorry.

Ok, I suppose that's up to you, but would you mind providing some justification for the benefit of the users who requested these features? e.g. #56

> I prefer using our workflows instead of rworkflows.

It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.

*(screenshot of the failing workflow runs, 2023-01-03)*

> As for the parallelization, I prefer not to use parallel for HTTP.

Is there a technical reason for this? I'd be curious to know myself! Similarly, is there an alternative implementation that would suit this purpose better?

If you do not wish to support parallelization at all, you have some other options available to you (in order of preference):

  1. Simply replace `parallel::mclapply` with `lapply`. This will at least resolve the bug described in #56 (see the sketch after this list).
  2. Add some documentation or warnings to notify users that cranlogs is currently unable to request info for more than ~X packages at a time.
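A minimal sketch of what option 1 could look like (the names `get_all_downloads` and `fetch_batch` are illustrative, not cranlogs' actual internals):

```r
# Hypothetical serial fallback: `fetch_batch` stands in for whatever helper
# performs the HTTP request for one batch of package names.
get_all_downloads <- function(batches, fetch_batch) {
  # Before: parallel::mclapply(batches, fetch_batch, mc.cores = n_cores)
  # After: a plain lapply() runs the same loop serially, sidestepping the
  # fork-related crashes on macOS and the lack of forking on Windows.
  lapply(batches, fetch_batch)
}
```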

Happy to adjust the PR as needed, just let me know what you'd like to accomplish with this package.

gaborcsardi commented 1 year ago

> It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.

I prefer using `usethis::use_tidy_github_actions()`.
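For anyone unfamiliar, a minimal sketch of that one-time setup (assuming `usethis` is installed):

```r
# Run once from the package root; this writes the tidyverse team's standard
# GitHub Actions workflows (R CMD check, pkgdown, test coverage) into
# .github/workflows/.
usethis::use_tidy_github_actions()
```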

> Is there a technical reason for this?

Yes, several. Using multiple processes for concurrent HTTP is an anti-pattern. `mclapply()` is also unsafe on macOS (search for "fork without exec on macOS"), and it does not work on Windows. Here is a good implementation of parallel HTTP with curl: https://github.com/r-lib/rcmdcheck/blob/0cb1073db067d59f2659e600678758f25e00a512/R/http.R
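For readers following along, here is a stripped-down sketch of that approach using curl's multi interface, which runs many transfers concurrently on a single process (simplified in the spirit of the linked rcmdcheck code, not copied from it):

```r
library(curl)

# Fetch several URLs concurrently from one R process: curl multiplexes the
# transfers over a single event loop, so no forked workers are needed.
fetch_all <- function(urls) {
  pool <- new_pool(host_con = 6)  # cap simultaneous connections per host
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    local({
      idx <- i  # freeze the loop index for the callbacks below
      multi_add(
        new_handle(url = urls[[idx]]),
        done = function(res) results[[idx]] <<- rawToChar(res$content),
        fail = function(msg) results[[idx]] <<- simpleError(msg),
        pool = pool
      )
    })
  }
  multi_run(pool = pool)  # blocks until every transfer succeeds or fails
  results
}
```

Because all transfers share one event loop, this scales to many requests without the per-process overhead or the platform caveats of `parallel`.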

However, in general, I am not sure if I want to provide a quick way for people to "rank" all CRAN packages according to downloads. I don't really want people to think that this is an indication of package quality.

bschilder commented 1 year ago

> > It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.
>
> I prefer using `usethis::use_tidy_github_actions()`.

Haven't come across this one! I'll check it out, though. As you may already be aware, there's also `biocthis`, just FYI.

Even so, if you're no longer using Travis, perhaps it's time to retire it from here?

> > Is there a technical reason for this?
>
> Yes, several. Using multiple processes for concurrent HTTP is an anti-pattern. `mclapply()` is also unsafe on macOS (search for "fork without exec on macOS"), and it does not work on Windows. Here is a good implementation of parallel HTTP with curl: https://github.com/r-lib/rcmdcheck/blob/0cb1073db067d59f2659e600678758f25e00a512/R/http.R

This is great information to have! Thanks for the explanation.

> However, in general, I am not sure if I want to provide a quick way for people to "rank" all CRAN packages according to downloads. I don't really want people to think that this is an indication of package quality.

Hm, I hadn't thought of it that way. I personally don't agree with this reasoning, as I think it's best to give users the complete data and let them decide how to use it responsibly (perhaps with a disclaimer in the documentation). For example, in my case I'm using the downloads to assess package usage, but I wouldn't think to use them as a metric of quality (I'm not sure how many other users would?).
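For what it's worth, here is the kind of usage-oriented query I have in mind, using the existing `cran_downloads()` API (package names purely illustrative):

```r
library(cranlogs)

# Daily download counts over the last month; returns a data frame with
# columns date, count, and package.
dl <- cran_downloads(packages = c("ggplot2", "data.table"), when = "last-month")

# Aggregate per package: a rough measure of usage, not of quality.
aggregate(count ~ package, data = dl, FUN = sum)
```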

Would you mind updating the thread in #56 to let the other users know of your decision either way? Thanks!