Open cderv opened 2 years ago
I've made a change to show only one URL if multiple are provided. This does not fully fulfill the original request, but perhaps the OP would be happy enough.
It may be a little tricky to fulfill the original request because we will have to tell which entry in `citation(auto = TRUE)` is the package citation and not duplicate it with the entry generated from `citation(auto = FALSE)`.
Oh I see the logic now. I previously missed it.
We don't get the results I expected because part of the CITATION file processing is filtered out: https://github.com/yihui/knitr/blob/cab26efb9df00b9612d67f0d88b82a767b6177e8/R/citation.R#L109-L114
That is why keeping only the first URL, as we do in the CITATION file, has no effect. The content from DESCRIPTION takes precedence in the function.
The change you've made offers that, but the URL is still set in `note =` and not in `url =`.
Maybe that is not so important, but I was curious about why we got this difference in the first place.
Now I understand that the CRAN URL will be used if multiple URLs are provided in the field https://github.com/yihui/knitr/blob/cab26efb9df00b9612d67f0d88b82a767b6177e8/R/citation.R#L75-L80 so we'll always have this difference for CRAN packages.
Maybe that is not worth changing.
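To make the rule in the linked lines easier to follow, here is a minimal sketch of it in isolation. It assumes `meta` is a plain list standing in for the result of `packageDescription()`, and `uses_cran_url()` is a hypothetical helper name, not something that exists in knitr.

```r
# Hypothetical helper mirroring the check discussed above: the CRAN URL is
# kept unless the package provides exactly one URL of its own.
uses_cran_url = function(meta) {
  if (!identical(meta$Repository, 'CRAN')) return(FALSE)
  # no URL field at all: nothing to prefer over the CRAN URL
  if (is.null(meta$URL)) return(TRUE)
  # a comma or a space suggests that several URLs were provided
  grepl('[, ]', meta$URL)
}

# Made-up metadata for illustration only
single = list(Repository = 'CRAN', URL = 'https://pkg.example.org')
multi  = list(Repository = 'CRAN', URL = 'https://a.example.org, https://b.example.org')

uses_cran_url(single)
uses_cran_url(multi)
```

With a single URL the package's own URL wins; with several, the CRAN one is used, which is the difference described above.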
> Now I understand that the CRAN URL will be used if multiple URLs are provided in the field

@yihui Why do we use the CRAN URL here instead of the first URL provided?
If we keep this rule, we may also need to support RSPM as a repo, so that the URL is changed to the CRAN one.
Currently, we don't get the same information for the same package installed from CRAN or RSPM:
> withr::with_temp_libpaths({
+ install.packages("trackdown", lib = .libPaths()[1], repos = "https://cran.rstudio.com")
+ knitr::write_bib("trackdown")
+ install.packages("trackdown", lib = .libPaths()[1], repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
+ knitr::write_bib("trackdown")
+ })
@Manual{R-trackdown,
title = {trackdown: Collaborative Writing and Editing of R Markdown (or Sweave)
Documents in Google Drive},
author = {Emily Kothe and Claudio {Zandonella Callegher} and Filippo Gambarota and Janosch Linkersdörfer and Mathew Ling},
year = {2021},
note = {R package version 1.0.0},
url = {https://CRAN.R-project.org/package=trackdown},
}
@Manual{R-trackdown,
title = {trackdown: Collaborative Writing and Editing of R Markdown (or Sweave)
Documents in Google Drive},
author = {Emily Kothe and Claudio {Zandonella Callegher} and Filippo Gambarota and Janosch Linkersdörfer and Mathew Ling},
year = {2021},
note = {https://github.com/claudiozandonella/trackdown/},
}
This is because:
- `citation("trackdown", auto = FALSE)` will not create a URL field. It will only do so for the CRAN repo when multiple URLs are given.
- `citation("trackdown", auto = TRUE)` will, but the whole citation is ignored when merged with the previous one (it is filtered out by the check `isTRUE(grepl("R package version", cite$note))`).
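The filtering check quoted above can be tried on its own. This is only a sketch: the entries are plain lists standing in for citation entries, their contents are made up, and `is_pkg_entry()` is a hypothetical name.

```r
# Stand-ins for citation entries (real ones would be bibentry objects)
entries = list(
  list(title = 'pkg: An Example Package', note = 'R package version 1.0.0'),
  list(title = 'pkg: An Example Package', note = 'https://github.com/user/pkg/'),
  list(title = 'An Article About pkg',    note = NULL)
)

# Same test as in the check above: TRUE only when the note carries the
# auto-generated "R package version" text
is_pkg_entry = function(cite) isTRUE(grepl('R package version', cite$note))

# Entries that would survive the filter
kept = Filter(Negate(is_pkg_entry), entries)
length(kept)
```

Note that an entry whose `note` holds a URL instead of the version string is not recognised as the package entry by this test.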
I am not sure which is the best solution for this function to work for more packages but I believe we could just also support RSPM with CRAN to set the URL field to the CRAN page of the package.
diff --git a/R/citation.R b/R/citation.R
index e22f6b27..a002bb4c 100644
--- a/R/citation.R
+++ b/R/citation.R
@@ -73,7 +73,7 @@ write_bib = function(
cite = citation(pkg, auto = if (pkg != 'base') {
meta = packageDescription(pkg, lib.loc = lib.loc)
# don't use the CRAN URL if the package has provided its own URL
- if (identical(meta$Repository, 'CRAN') && !is.null(meta$URL)) {
+ if (meta$Repository %in% c('CRAN', 'RSPM') && !is.null(meta$URL)) {
# however, the package may have provided multiple URLs, in which case we
# still use the CRAN URL
if (!grepl('[, ]', meta$URL)) meta$Repository = NULL
A simple fix that would also set a URL when RSPM is used.
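One detail worth checking in a change like this: when a package has no `Repository` field at all (for example a locally built one), `meta$Repository %in% ...` gives a zero-length logical, which is not a usable condition for `if ()`. A possible guard, sketched with plain lists in place of `packageDescription()` output and a hypothetical helper name, is to wrap the test in `isTRUE()`:

```r
repos = c('CRAN', 'RSPM')

# Hypothetical helper: TRUE only when a Repository field exists and matches
from_known_repo = function(meta) isTRUE(meta$Repository %in% repos)

from_known_repo(list(Repository = 'RSPM'))
from_known_repo(list(Repository = 'CRAN'))
# no Repository field: NULL %in% repos is logical(0), isTRUE() turns it into FALSE
from_known_repo(list(URL = 'https://pkg.example.org'))
```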
> + if (meta$Repository %in% c('CRAN', 'RSPM') && !is.null(meta$URL)) {

Sure. We can certainly do that.
> Why do we use here the CRAN url instead of the first URL provided?

I don't remember exactly, but it's probably because it's not robust to split multiple URLs by commas: one URL can contain a comma. Perhaps splitting by `", "` (comma followed by space) is safe enough. We can do that, too.
It may also be because I wanted to support the "canonical" CRAN URL when there are multiple URLs, if I must pick one URL.
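A rough sketch of the "comma followed by space" idea, in case it helps the discussion. `first_url()` is a hypothetical helper and the URLs are made up; the pattern also accepts a newline after the comma, on the assumption that a DESCRIPTION field may be wrapped.

```r
# Hypothetical helper: first URL of a DESCRIPTION-style URL field, splitting
# only on a comma followed by whitespace so that commas inside a URL survive
first_url = function(url) {
  if (is.null(url)) return(NULL)
  strsplit(url, ',\\s+')[[1]][1]
}

first_url('https://example.org/a,b, https://example.org/other')
first_url('https://example.org/only')
```

The first call keeps `https://example.org/a,b` intact, which a split on a bare comma would have cut in two.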
I've just had a discussion about the URL that `citation()` provides. It started here and continued offline. There were two issues:
- When DESCRIPTION lists more than one URL, `citation()` doesn't know which of them to use, so it was using a `note` entry, and overwriting the package version information that was recorded there. This bug has been fixed in R-devel, but it still affects `write_bib()` in earlier versions.
- For packages from CRAN or Bioconductor, `citation()` uses canonical URLs from those sites rather than the URL provided by DESCRIPTION. I called that a bug, and after some back and forth I agreed that sometimes the DESCRIPTION URL is right, and sometimes the canonical one is right. Basically if you want the reader to find out more about the package, use the first one; if you want to know exactly what code was used in the paper, use the second. (Package version numbers aren't guaranteed to uniquely identify the package code in general, but on CRAN and BioC they do.)

BTW, a reason not to use the first URL when two are provided is that the definition of what kind of thing should go in the DESCRIPTION file URL field is very loose. It is not at all required that the first one is about the package, it might be the author's home page.
I've got a patch to `write_bib()` that works around the bug in `citation()` so that version info isn't lost in the two-URL case. I'll put in a PR for that soon. Would you also be interested in adding an argument to `write_bib()` to choose which kind of URL to use?
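To illustrate what such a choice could look like, here is a purely hypothetical sketch. Neither `choose_url()` nor the `prefer` argument is part of `write_bib()`; `meta` is a plain list standing in for `packageDescription()` output, and the package is assumed to come from CRAN.

```r
# Hypothetical sketch: pick either the URL field from DESCRIPTION (returned
# as-is) or the canonical CRAN URL of the package
choose_url = function(meta, prefer = c('package', 'canonical')) {
  prefer = match.arg(prefer)
  canonical = sprintf('https://CRAN.R-project.org/package=%s', meta$Package)
  # fall back to the canonical URL when DESCRIPTION provides none
  if (prefer == 'canonical' || is.null(meta$URL)) return(canonical)
  meta$URL
}

meta = list(Package = 'trackdown', URL = 'https://github.com/claudiozandonella/trackdown/')
choose_url(meta)
choose_url(meta, prefer = 'canonical')
```

The first form suits readers who want to learn about the package; the second points at the exact released code.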
Thanks, Duncan! Just a small addition: The `citation()` functionality will also be extended soon in R-devel to recognize packages installed via `remotes::install_github()` and adapt the "note" and "url" fields correspondingly (indicating the exact commit that was installed). Kurt will incorporate a patch that he has discussed briefly with Gabor.
> Would you also be interested in adding an argument to `write_bib()` to choose which kind of URL to use?

@dmurdoch Yes. Thanks!
@yihui: OK, I'll include both changes in the PR. I'll wait until I see the `citation()` changes for GitHub before submitting it, so they'll be covered too.
This is done now in #2264. I think this fixes other issues here too, though I'm not completely sure about the RSPM intention.
This would allow special handling, such as keeping only the first URL when several are provided, to be applied.
For example, we have this in bookdown, where we use a CITATION file to implement some specific logic: https://github.com/rstudio/bookdown/blob/0de8f113fd1f8f9d8d140b754c5120eca1e27a0a/inst/CITATION#L12
Currently, `write_bib()` overrides the `auto=` argument of `utils::citation()` with some content from the DESCRIPTION file. This leads to different output from the two functions.
This came up in https://github.com/yihui/rmarkdown-cookbook/pull/348#issuecomment-876484275 Handling the CITATION file when it exists would possibly also work for dev packages not yet on CRAN.
Opening to track the idea and see if we have other use cases for this.
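As a starting point for the idea, detecting whether a package ships a CITATION file could look roughly like this. `has_citation_file()` is a hypothetical helper; it relies on `system.file()` returning an empty string when nothing is found.

```r
# Hypothetical helper: TRUE when the installed package ships a CITATION file
has_citation_file = function(pkg, lib.loc = NULL) {
  nzchar(system.file('CITATION', package = pkg, lib.loc = lib.loc))
}

has_citation_file('base')
# a package that is not installed simply gives FALSE
has_citation_file('aPackageThatIsNotInstalled123')
```

When this is `TRUE`, `utils::citation(pkg, auto = FALSE)` would use the CITATION file instead of the entry generated from DESCRIPTION.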