ourresearch / citeas-api

Get the scholarly citation for any research product: software, preprint, paper, or dataset
https://citeas.org
MIT License

"name" of CRAN packages #38

Closed bastistician closed 4 years ago

bastistician commented 5 years ago

The "name" returned by the citeas API depends on the steps taken to find a proper citation. This means that the name is sometimes derived from the title of a publication (BibTeX reference), sometimes from the DESCRIPTION file as "Package: Title" (and there are certainly other routes as well). I suggest that the "name" for CRAN packages always be of the latter form, similar to what the CRAN package web page uses as its heading.

Always returning a name of the form "Package: Title" would be much more consistent and also more reliable. For example, the name returned for knitr and forecast is "R" instead of "knitr: A General-Purpose Package for Dynamic Report Generation in R" and "forecast: Forecasting Functions for Time Series and Linear Models", respectively. Such issues could probably be fixed in the BibTeX parser. However, the title of the publication will almost always be different from the title which the authors have chosen for their package as given in the DESCRIPTION, which IMHO is the one to use as the "name".

caseydm commented 5 years ago

Oh, I see. I think I misunderstood you earlier regarding the description. Right now the title is pulled from the BibTeX entry in the CRAN citation file. What you are talking about is following the GitHub URL and reading from the DESCRIPTION file found in the package contents. Correct?

So for tidyverse we would be reading this: https://github.com/tidyverse/tidyverse/blob/master/DESCRIPTION to get tidyverse: Easily Install and Load the 'Tidyverse'

bastistician commented 5 years ago

The problem is that you cannot reliably pull the name and title of a CRAN package from its citation page. But you can easily pull these metadata from the package's DESCRIPTION file, which is stored on CRAN, for example at https://CRAN.R-project.org/web/packages/forecast/DESCRIPTION for the forecast package. So no need to resort to a GitHub repo here.
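A minimal Python sketch of this route, assuming the DESCRIPTION file uses the usual "Field: value" layout with indented continuation lines. The URL template is generalized from the forecast link above, and `parse_dcf`, `cran_name`, and `fetch_description` are hypothetical helper names, not part of citeas:

```python
import urllib.request

# Assumed URL pattern, generalized from the forecast example in this thread.
DESCRIPTION_URL = "https://CRAN.R-project.org/web/packages/{package}/DESCRIPTION"


def parse_dcf(text):
    """Parse 'Field: value' text into a dict.

    Lines that start with whitespace are treated as continuations
    of the previous field and joined with a single space.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def cran_name(fields):
    """Build the 'Package: Title' heading from parsed DESCRIPTION fields."""
    return f"{fields['Package']}: {fields['Title']}"


def fetch_description(package):
    """Download the raw DESCRIPTION text for a CRAN package (needs network access)."""
    url = DESCRIPTION_URL.format(package=package)
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Shortened sample for illustration only; a real DESCRIPTION has many more fields.
SAMPLE_DESCRIPTION = (
    "Package: forecast\n"
    "Title: Forecasting Functions for Time Series and\n"
    "    Linear Models\n"
)
```

With the sample text, `cran_name(parse_dcf(SAMPLE_DESCRIPTION))` gives the same "Package: Title" form as the CRAN web page heading; `cran_name(parse_dcf(fetch_description("forecast")))` would do the same against the live file.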

caseydm commented 5 years ago

Hi @bastistician. I made some updates today so the software now pulls the CRAN package name and title from the DESCRIPTION file.

One side effect is that authors are now pulled from the DESCRIPTION file as well. Pulling the authors from a separate source is a bit more complex, but it can be done. Please take a look and let me know what you think.

bastistician commented 5 years ago

Your updates seem to fix the issue reported here. However, as you say, the solution is far from ideal because the generated citation is no longer appropriate in most cases:

  1. If a package on CRAN has a CITATION file, citeas should really use that information to generate the desired citation. It seems that CITATION files are now ignored.

  2. As a final resort, a citation can be generated from the DESCRIPTION file similar to what citation(package, auto = TRUE) returns in R. This should only consider full authors ("aut" role), not contributors ("ctb") or other parties.

BTW, for non-R parsers of R package DESCRIPTION files, it might be easier to extract the author list from the plain-text Author field rather than from the Authors@R field:
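As an illustration only (a sketch, not code from citeas), the plain-text route could look like this in Python. It assumes the Author field lists entries as "Name [role, role] (comment)" separated by commas, which is common but not guaranteed; the names in the sample are invented, and `split_top_level` and `full_authors` are hypothetical helpers:

```python
import re


def split_top_level(text):
    """Split on commas that are not inside [...] or (...)."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


# Name, then an optional [roles] block, then an optional (comment) block.
ENTRY = re.compile(
    r"^(?P<name>[^\[(]+?)\s*(?:\[(?P<roles>[^\]]*)\])?\s*(?:\(.*\))?$"
)


def full_authors(author_field):
    """Return names whose role list includes 'aut'.

    Entries without any role annotation are skipped in this sketch;
    a real parser might prefer to treat them as authors.
    """
    authors = []
    for entry in split_top_level(author_field):
        match = ENTRY.match(entry)
        if not match:
            continue
        roles = [r.strip() for r in (match.group("roles") or "").split(",")]
        if "aut" in roles:
            authors.append(match.group("name").strip())
    return authors


# Invented sample, not taken from any real package.
SAMPLE_AUTHOR = (
    "Jane Doe [aut, cre], John Smith [ctb], Example Org [cph, fnd], "
    "Ann Lee [aut] (<https://orcid.org/0000-0000-0000-0000>)"
)
```

Here `full_authors(SAMPLE_AUTHOR)` keeps Jane Doe and Ann Lee and drops the contributor and the copyright holder, matching the "aut"-only rule in point 2.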

caseydm commented 5 years ago

OK, I will pull some of the other fields, such as authors, from the citation file. That's one way the software needs to be improved: it is sometimes better to combine sources rather than stop at the first valid source found.
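A rough sketch of what combining sources per field could look like, as opposed to stopping at the first valid source. The source labels (`"citation"`, `"description"`), the field names, and `merge_metadata` are assumptions made for illustration and do not reflect the actual citeas data model; the author name is invented:

```python
def merge_metadata(sources, field_priority):
    """Pick each field from the first source that has a non-empty value.

    sources: dict mapping a source label to a dict of extracted fields.
    field_priority: dict mapping a field name to an ordered list of
        source labels to try for that field.
    """
    merged = {}
    for field, order in field_priority.items():
        for label in order:
            value = sources.get(label, {}).get(field)
            if value:
                merged[field] = value
                break
    return merged


# Illustrative input: the name comes out wrong from the BibTeX route ("R"),
# while the DESCRIPTION route has the proper "Package: Title" form.
sources = {
    "citation": {"name": "R", "authors": ["Jane Doe"]},
    "description": {
        "name": "knitr: A General-Purpose Package for Dynamic Report Generation in R",
    },
}

# Prefer DESCRIPTION for the name, but the CITATION file for authors.
field_priority = {
    "name": ["description", "citation"],
    "authors": ["citation", "description"],
}

merged = merge_metadata(sources, field_priority)
```

The per-field priority list keeps the policy (which source wins for which field) separate from the extraction code, so the order can change without touching the parsers.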

caseydm commented 5 years ago

Hello. I want to let you know that I reverted the software to pull primarily from the citation. Mixing the CITATION and DESCRIPTION data is a good idea, but it requires some larger changes to the code base, and there are some smaller things that need to be fixed first. I'll try to revisit this in a couple of months.