r-lib / urlchecker

Run CRAN URL checks from older versions of R
https://urlchecker.r-lib.org/
GNU General Public License v3.0

Add sanity check for page anchors #22

Open AshesITR opened 2 years ago

AshesITR commented 2 years ago

It would be nice if not only the URLs were checked, but also the existence of page anchors in the returned page, if those are present.

e.g. for https://style.tidyverse.org/functions.html#long-lines-1

Check that

  1. The HTTP Status of https://style.tidyverse.org/functions.html is 200
  2. The HTML returned for https://style.tidyverse.org/functions.html has a match for the XPath //*[@id = 'long-lines-1']
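
To illustrate the second check, here is a minimal sketch in Python (standard library only), not urlchecker or base R code. The names `anchor_exists` and `_IdCollector` are hypothetical. It assumes the page HTML has already been downloaded, mirrors the XPath above by looking only at `id` attributes, and does not percent-decode the fragment.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _IdCollector(HTMLParser):
    """Collect every non-empty id attribute found in the document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def anchor_exists(url, html):
    """Return True if the fragment of `url` matches an id in `html`.

    A URL without a fragment trivially passes, since there is no
    anchor to look for.
    """
    _, fragment = urldefrag(url)
    if not fragment:
        return True
    parser = _IdCollector()
    parser.feed(html)
    parser.close()
    return fragment in parser.ids
```

For example, `anchor_exists("https://style.tidyverse.org/functions.html#long-lines-1", page_html)` would be true only if some element in `page_html` carries `id="long-lines-1"`.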

gaborcsardi commented 2 years ago

AFAIR, we are not actually downloading the target URLs, unless the HEAD request we perform fails.
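
The HEAD-first behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the code urlchecker or base R actually uses; `check_url` and its `open_fn` parameter are hypothetical names. The point is that a successful HEAD request yields no body, so an anchor check would need an extra download.

```python
import urllib.error
import urllib.request


def check_url(url, open_fn=urllib.request.urlopen, timeout=10):
    """Return (status, body); body is None when HEAD alone succeeded.

    `open_fn` defaults to urllib.request.urlopen and can be replaced
    by a stub so the logic is testable without network access.
    """
    head = urllib.request.Request(url, method="HEAD")
    try:
        with open_fn(head, timeout=timeout) as resp:
            # HEAD worked: the status is known, but no HTML was fetched.
            return resp.status, None
    except urllib.error.URLError:
        # HTTPError is a subclass of URLError, so both land here.
        pass

    # Fall back to a full GET, which does download the page.
    get = urllib.request.Request(url, method="GET")
    with open_fn(get, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Under this sketch, checking an anchor would mean issuing the GET even when the HEAD request succeeds, which roughly doubles the requests for URLs with fragments.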

Another thing is that we try to follow what base R does; we actually use the same code. So maybe you could suggest this feature for base R, to see if they want to have it.