r-lib / roxygen2

Generate R package documentation from inline R comments
https://roxygen2.r-lib.org
Other
594 stars 233 forks source link

Catch URL formatting issues in DESCRIPTION? #1420

Open cboettig opened 2 years ago

cboettig commented 2 years ago

Some packages have historically used annotations on the URL field, such as curl:

URL: https://github.com/jeroen/curl (devel) 
    https://curl.se/libcurl/ (upstream)
BugReports: https://github.com/jeroen/curl/issues

(https://github.com/jeroen/curl/blob/e7502d27c7cff17baa0930c3327f84ae4918aa24/DESCRIPTION#L22-L23)

This syntax is no longer acceptable on CRAN, resulting in warnings such as:

   Found the following (possibly) invalid URLs:
     URL: https://docs.ropensci.org/gbifdb/ (website)
https://github.com/ropensci/gbifdb
       From: man/gbifdb-package.Rd
       Status: Error
       Message: URL using bad/illegal format or missing URL
       URL contains spaces
     Spaces in an http[s] URL should probably be replaced by %20

but will not be caught by a standard R CMD check (on R <= 4.2.0) or by urlchecker. Would it be reasonable to add such a check to urlchecker? (The correct syntax apparently cannot have non-URLs in the comma-separated list I think).

gaborcsardi commented 2 years ago

I suspect that the issue is that roxygen2 does not parse the urls in DESCRIPTION correctly, and it creates invalid URLs in the package manual page. Do you have a package that reproduces this?

cboettig commented 2 years ago

@gaborcsardi sure, you can see this happen, e.g. in the curl package I linked above.

gaborcsardi commented 2 years ago

I doubt that this is happening to the curl package, though. Or do you have a way to reproduce it with curl?

The error you show is not coming from DESCRIPTION, but from man/gbifdb-package.Rd, and it is the result of roxygen2 writing bad links in that file, I think.

cboettig commented 2 years ago

Hi Gabor, my apologies for being unclear.

I think the way the URL specification is done in the curl package is invalid due to the annotations, just like the gbifdb example. However, as you say, the curl package does not throw the same warning on check_win_devel(). The warning shown from gbifdb is a consequence of an incorrectly formed \url tag being created in the DESCRIPTION by roxygen for default package documentation: https://github.com/ropensci/gbifdb/blob/212b34d6649f155bddec605e41d04f249ce3d4e8/man/gbifdb-package.Rd#L14

curl package-level documentation page is not generated in this way, and does not include the link. But this is not I think a bug in roxygen or desc, but just a consequence of an invalid URL string in the DESCRIPTION. For instance, you will note that inside RStudio, on the package list for the curl package, the URL that RStudio links is broken: https://github.com/jeroen/curl%20(devel)%20https://curl.se/libcurl/%0A(upstream) .

To me, this suggests that adding non-url annotations in the URL field is invalid, as suggested in the R exts manual:

The ‘URL’ field may give a list of URLs separated by commas or whitespace, for example the homepage of the author or a page where additional material describing the software can be found. These URLs are converted to active hyperlinks in CRAN package listings. See Specifying URLs.

and so I think it would be better if we had a way to catch these cases?

Thanks for all you do! Obviously you know these details way better than me, so perhaps the annotation idea is fine and this really should be a bug report against desc or whatever is parsing the URLs in the DESCRIPTION, so totally happy to defer to you.

gaborcsardi commented 2 years ago

Yes, this makes sense, but this is not something that we can easily fix in this package I am afraid. urlchecker uses base R's code to extract the URLs from the package, and that code can actually handle these URL fields well. Which also suggests that they are OK (for R-core and CRAN at least).

So I think a we'd better fix this in roxygen2, and RStudio.

jennybc commented 2 years ago

I came through this repo for other reasons, but this caught my eye. I've also wrestled with these non-URL annotations in the URL field, because usethis parses that field for various purposes.

There's a lot of info here: https://github.com/r-lib/desc/issues/56

I think desc already knows how to cope (see above) and therefore this is a roxygen2 problem, as @gaborcsardi says, when it builds package-level help using info found in DESCRIPTION.

gaborcsardi commented 2 years ago

In practice, the curl URLs are handled correctly by the CRAN web pages: https://cran.rstudio.com/web/packages/curl/, the desc package, etc. and they have been around for a while, so I don't think we should ban them at this point.

I'll fix it in roxygen2, it already uses desc, so it should be pretty easy.

cboettig commented 2 years ago

Thanks @gaborcsardi & @jennybc ! Does this need to be reported separately for the RStudio (server) IDE? It is still creating broken links in the package window there:

image

gaborcsardi commented 2 years ago

@cboettig Yes, please report it at rstudio/rstudio. Thanks!

hadley commented 1 year ago

Sounds like this was resolved via a fix to desc?

jennybc commented 1 year ago

No, I suspect this still needs a fix in roxygen2, but the smarts already in desc will make it less painful to fix.