r-lib / downlit

Syntax Highlighting and Automatic Linking
https://downlit.r-lib.org
Other
90 stars 22 forks source link

Check 'repos' repositories and CRAN for URL if package isn't installed #108

Closed ARawles closed 3 years ago

ARawles commented 3 years ago

This PR fixes #106.

Previously, packages needed to be installed for downlit to find the URL field. With this PR, downlit will go through 3 stages to try and find the URL, only proceeding to the next one if the previous one was unsuccessful:

  1. Check for a local install and DESCRIPTION file
  2. Check the user's 'repos' option (excluding any entries like 'cran') and check the PACKAGES file. This does mean that the PACKAGES file will need to include the URL field, which is not included by default, for this stage to be successful.
  3. Check CRAN, using the tools::CRAN_package_db() function, which includes more meta-data than the utils::available.packages() function.

A test case has been written to check that URLS for packages that are not installed are found correctly, however it currently relies on the BMRSr package not being installed. Although it's unlikely that this package would ever be installed when these tests are being run as part of a CI process, it's not perfect.

Still left to consider is the possibility of allowing the user to register a custom function that could be used to find a PACKAGE url (see Issue for discussion).

ARawles commented 3 years ago

Because of the use of CRAN_package_db() function from the tools package, this requires the minimum R version to be changed to 3.4 (instead of 3.2).

ARawles commented 3 years ago

One thing I wanted to get your thoughts on @hadley is the time penalty associated with checking CRAN every time downlit can't find the local package - here's a comparison of the current speed compared to the PR version:

downlit 0.2.1.9000

> microbenchmark::microbenchmark(
+     # local package
+     package_urls("dplyr"),
+     # non-installed package
+     package_urls("BMRSr"), times = 10)
Unit: microseconds
                  expr    min       lq      mean   median       uq      max neval
 package_urls("dplyr") 2792.2 2809.402 2878.3715 2834.352 2958.201 3065.801    10
 package_urls("BMRSr")  199.7  203.102  250.0211  240.251  259.201  404.601    10

downlit PR

> microbenchmark::microbenchmark(
+     # local package
+ package_urls("dplyr"),
+     # custom-repo package
+ package_urls("customrepopackage"),
+     # non-installed package
+ package_urls("BMRSr"),
+ times = 10)
Unit: milliseconds
                  expr         min        lq       mean    median          uq       max neval
 package_urls("dplyr")    2.876101    3.2662    5.65371    3.9897    4.470101   22.3326    10
 package_urls("customrepopackage") 4056.489800 4153.0438 4224.55182 4199.6713 4292.924902 4434.3823    10
 package_urls("BMRSr") 5720.558001 5786.0249 5911.00847 5876.3085 5950.955101 6240.9524    10

We're looking at about 4 seconds for packages from a custom repo and about 6 seconds for package from CRAN. I don't imagine that there will actually be that many packages that will get this far (i.e. I think most of them will be installed), but I think it's still a consideration. What do you think?

ARawles commented 3 years ago

Following the introduction of the memoise package, the benchmarks are looking a lot more healthy:

> results <- microbenchmark::microbenchmark(
+     package_urls("mycustomrepo"), times = 5
+ )
> results
Unit: microseconds
                         expr   min    lq   mean median      uq     max neval
 package_urls("mycustomrepo") 381.1 440.6 661506  629.2 11362.3 3294717     5
> results$time
[1] 3294716900   11362300     629200     440600     381100

The first run is taking us about 3 seconds but then everything after that is basically instantaneous.

ARawles commented 3 years ago

I've also had a go at adding the ability to register a custom function that will be used when no URL can be found for the package (on CRAN or otherwise).

It's a first attempt to any comments/improvements are welcome 👍

hadley commented 3 years ago

I've reverted the custom function registration out because I think that belongs in a separate PR, and I can merge your earlier work without more discussion.

hadley commented 3 years ago

@ARawles I've tweaked your code quite a bit, but I figured out how to add a test with a custom repo, so I'm fairly confident that I haven't broken anything. If you want to have a go at the custom function in another PR, you should be able to cherry-pick the commits of of this branch; LMK if you need any hints on how to do that.