ourresearch / citeas-api

Get the scholarly citation for any research product: software, preprint, paper, or dataset
https://citeas.org
MIT License

R packages on CRAN: prefer CITATION over in-text DOIs #26

Closed. bastistician closed this issue 5 years ago.

bastistician commented 6 years ago

I'm referring to http://citeas.org/cite/CRAN.R-project.org/package=surveillance as an example. Here, the algorithm stops when it finds a DOI in the description text of the package (it returns the first one of those). However, the standard way for R package authors to provide a desired citation format is via a CITATION file, and that one is ignored in the above example. IMHO, if such a file exists for a package on CRAN, it should be preferred over other references. This would also mimic how R's citation() operates. What do you think?
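A minimal Python sketch of the preference order proposed here. It assumes a hypothetical `PackageSources` record that a resolver has already filled in; this is not CiteAs's actual code or data model. The CITATION file is consulted first. The DESCRIPTION-DOI step is kept only as an illustrative fallback: R's citation() goes straight from the CITATION file to an entry generated from the DESCRIPTION fields, so a real implementation might drop that step entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PackageSources:
    """Hypothetical bundle of what a resolver found for one CRAN package."""
    citation_entries: list[str] = field(default_factory=list)  # refs parsed from the CITATION file
    description_dois: list[str] = field(default_factory=list)  # DOIs seen in the DESCRIPTION text


def choose_source(src: PackageSources) -> tuple[str, Optional[str]]:
    """Return (source_label, reference) following the proposed preference order."""
    # 1. An author-provided CITATION file wins, mirroring R's citation().
    if src.citation_entries:
        return ("CITATION", src.citation_entries[0])
    # 2. Hypothetical fallback: a DOI mentioned in the DESCRIPTION text.
    if src.description_dois:
        return ("DESCRIPTION DOI", src.description_dois[0])
    # 3. Otherwise the caller would build a reference from the DESCRIPTION fields.
    return ("auto-generated", None)
```

With a CITATION entry present, the DOI found in the description text is never reached, which is the behavior this issue asks for.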

bastistician commented 6 years ago

For comparison, see http://citeas.org/cite/CRAN.R-project.org/package=emdi. It seems that if a package has an associated GitHub repository, the CITATION file is checked first, and if this fails, a standard citation is eventually generated from the DESCRIPTION, ignoring a DOI in its text (which is actually desired in this case).

rkillick commented 6 years ago

My package changepoint (https://cran.r-project.org/web/packages/changepoint/) also fails the checks. I have a CITATION file, and not only does the algorithm ignore it (understandable, as it uses an "If you have used this ... cite this ..." form), it also lists the authors in the wrong order. The correct order is the one given in the Author field.

bastistician commented 6 years ago

It should be mentioned that in your case, the CITATION file is found via the GitHub repo but eventually discarded ("Looking in the CITATION file, we didn't find R CITATION format"), because citeas seems to search only for citEntry() references, whereas your CITATION file uses the more modern bibentry() declaration. So there are separate issues at several steps of the algorithm.
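One way to avoid discarding modern CITATION files is to accept both constructors when deciding whether a file is "in R CITATION format". The Python sketch below is a plain text check, not an R parser: the helper names are hypothetical, and it would also match the constructor names inside R comments or strings.

```python
import re

# Constructor calls that can declare a reference in an R CITATION file:
# the older citEntry() and the more modern bibentry().
_ENTRY_CALL = re.compile(r"\b(citEntry|bibentry)\s*\(")


def find_entry_calls(citation_source: str) -> list[str]:
    """Return the entry constructors found in the file, in file order."""
    return [m.group(1) for m in _ENTRY_CALL.finditer(citation_source)]


def looks_like_r_citation(citation_source: str) -> bool:
    """True if at least one citEntry() or bibentry() call is present."""
    return bool(find_entry_calls(citation_source))
```

A check that only looked for `citEntry(` would return False for a file that declares its references with `bibentry(`, which matches the failure described above.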

rkillick commented 6 years ago

It is a different issue, but the same solution (mimicking R's citation() function) would fix my problem too. I wasn't sure this warranted opening a new issue, but I'm happy to if that is preferred.

caseydm commented 5 years ago

Hello. I refactored the CRAN library step to ingest the BibTeX entry shown on CRAN citation file pages (such as this one). The package changepoint now shows the authors in the correct order, matching what is found on the CRAN website.

For @bastistician, I'm still working to pull the title of your project from CRAN as well, as it is currently pulling from CrossRef. Once I do that I'll update this issue. Please take a look and let me know if this is getting closer to what you expected. Thanks!
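As a rough illustration of ingesting a BibTeX entry while keeping the author order from the source, here is a small Python sketch. It assumes the entry text has already been fetched; `bibtex_field` and `split_bibtex_authors` are hypothetical helpers, and the parsing is deliberately simplified (brace-delimited values only, no quoted values, no `@string` macros, no handling of "and" inside braces).

```python
import re
from typing import Optional


def bibtex_field(entry: str, name: str) -> Optional[str]:
    """Return the brace-delimited value of `name`, honoring nested braces."""
    m = re.search(r"\b" + re.escape(name) + r"\s*=\s*\{", entry, re.IGNORECASE)
    if not m:
        return None
    depth = 1
    for i in range(m.end(), len(entry)):
        if entry[i] == "{":
            depth += 1
        elif entry[i] == "}":
            depth -= 1
            if depth == 0:  # matching close brace of the field value
                return entry[m.end():i]
    return None  # unbalanced braces


def split_bibtex_authors(author_field: str) -> list[str]:
    """Split on BibTeX's ' and ' separator, preserving the listed order."""
    parts = re.split(r"\s+and\s+", author_field.strip())
    return [" ".join(p.split()) for p in parts if p.strip()]
```

Because the split preserves input order, the authors come out exactly as listed in the CITATION-derived BibTeX, rather than being reordered by a later step.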

rkillick commented 5 years ago

Thanks, I can confirm the changepoint citation now displays correctly. Thanks for taking the time to correct this.

bastistician commented 5 years ago

Thank you for the efforts, @caseydm! The result is much better now, as it prefers information from the CITATION file. For surveillance, citeas correctly finds one of the DOIs referenced therein.

A difficulty with surveillance is that the intended citation actually depends on which part of the package has been used. Of course, this is not something citeas could know. I think citeas is designed to output a single reference, so returning the first of multiple matches is a reasonable solution. It would be nice if citeas indicated when a result is not unique and referred to secondary matches (#23).
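A minimal Python sketch of the suggestion: keep returning the first match as the answer, but carry the remaining matches along so a page could flag a non-unique result. `CitationResult` and `pick_with_alternatives` are hypothetical names, not part of CiteAs.

```python
from dataclasses import dataclass, field


@dataclass
class CitationResult:
    """Hypothetical result type: one primary reference plus any alternatives."""
    primary: str
    secondary: list[str] = field(default_factory=list)

    @property
    def is_unique(self) -> bool:
        # True when nothing else matched, so no "see also" hint is needed.
        return not self.secondary


def pick_with_alternatives(candidates: list[str]) -> CitationResult:
    """Keep the first match as primary; report the rest instead of dropping them."""
    if not candidates:
        raise ValueError("no citation candidates found")
    # dict.fromkeys drops exact duplicates while preserving first-seen order.
    ordered = list(dict.fromkeys(candidates))
    return CitationResult(primary=ordered[0], secondary=ordered[1:])
```

The primary answer is unchanged from "return the first match", so the single-reference design stays intact; only the extra information is new.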

bastistician commented 5 years ago

> For @bastistician, I'm still working to pull the title of your project from CRAN as well, as it is currently pulling from CrossRef.

This would be great! I think for R packages one should ideally display Package: Title as the title on the generated web page, where "Package" and "Title" are from the corresponding fields of the package description. This is the format used on the CRAN website as well as what citation("Package", auto=TRUE) generates in R.

An example where citeas currently gives the desired title format is http://citeas.org/cite/CRAN.R-project.org/package=tidyverse ("tidyverse: Easily Install and Load the 'Tidyverse'"), since it generates the reference by parsing the description anyway. In contrast, http://citeas.org/cite/CRAN.R-project.org/package=changepoint should use "changepoint: Methods for Changepoint Detection" and http://citeas.org/cite/CRAN.R-project.org/package=surveillance should use "surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena".
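The "Package: Title" format can be sketched in Python as a small reader for the DESCRIPTION file's DCF layout plus a string join. This is a simplified, hypothetical helper (one record only, continuation lines folded with a single space), not how CiteAs or R's read.dcf() are implemented.

```python
def parse_dcf(text: str) -> dict[str, str]:
    """Parse a single-record DCF file such as an R package DESCRIPTION."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines would separate records; one record is assumed
        if line[0] in " \t" and current is not None:
            # Continuation line: fold it into the previous field's value.
            fields[current] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def package_title(description_text: str) -> str:
    """Build the "Package: Title" display title from DESCRIPTION fields."""
    fields = parse_dcf(description_text)
    return f"{fields['Package']}: {fields['Title']}"
```

For a DESCRIPTION whose Title wraps onto a second line, the continuation is joined back, giving "surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena".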

caseydm commented 5 years ago

Hi @bastistician, sorry for the long gap between replies. For some reason I did not receive an alert when you responded to the thread.

> It would be nice if citeas gave some indication if a result is not unique and refer to secondary matches (#23).

I'll add this to the to-do list and see what I can do.

That's a great tip on the Package: Title issue. Like you said, CiteAs currently uses the package name and description, but it looks like for some CRAN packages we need to use name and title. I'll try to adapt the code to that and will follow up here.

caseydm commented 5 years ago

Hi @bastistician, I just pushed some updates, and citeas is now pulling the titles correctly from the CRAN CITATION files. Let me know if you see anything incorrect. I'm going to go ahead and close this issue.

cboettig commented 5 years ago

Sorry to ping a closed thread, but it appears that CiteAs returns the citation to the package, not the citation from the CITATION file that an R user would see when running citation(). For example:

Lang, D., & Wainwright, P.. (2018). rfishbase: R Interface to 'FishBase'. R package version 3.0.0. Retrieved from https://CRAN.R-project.org/package=rfishbase

(Note that author parsing is incorrect as well!)

Whereas citation("rfishbase") gives:

C. Boettiger, D. T. Lang and P. C. Wainwright. "rfishbase: exploring, manipulating and visualizing FishBase data from R". In: Journal of Fish Biology 81.6 (Nov. 2012), pp. 2030-2039. DOI: 10.1111/j.1095-8649.2012.03464.x

(Let me know if you'd prefer this as new/separate issues too)
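The doubled period in "Wainwright, P.." is the kind of artifact that appears when a formatter appends a terminal period to a string that already ends with an initial. Below is a hedged Python sketch of an APA-style author list that keeps the input order and avoids it. `apa_authors` is hypothetical, expects (family, given) pairs, and does not handle hyphenated given names or name particles; the example names come from the citation("rfishbase") output above.

```python
def initials(given: str) -> str:
    """'D. T.' or 'Duncan Temple' -> 'D. T.' (simplified: whitespace-separated parts)."""
    return " ".join(part[0].upper() + "." for part in given.split())


def apa_authors(names: list[tuple[str, str]]) -> str:
    """Format (family, given) pairs in order as 'Family, I., ..., & Family, I.'"""
    parts = [f"{family}, {initials(given)}" for family, given in names]
    if not parts:
        return ""
    if len(parts) == 1:
        text = parts[0]
    else:
        # APA puts a comma before the ampersand, even with two authors.
        text = ", ".join(parts[:-1]) + ", & " + parts[-1]
    # Add a terminal period only if the last initial did not already supply one.
    return text if text.endswith(".") else text + "."
```

For the three rfishbase authors this gives "Boettiger, C., Lang, D. T., & Wainwright, P. C." with a single final period.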

caseydm commented 5 years ago

Hello. I believe that when this error occurred, we were pulling CRAN citation data from the DESCRIPTION file. Just a couple of days ago I switched back to the proper approach of using the CITATION file, so the current citation is built from the BibTeX data in the CITATION file. Please take another look and let me know if you still see an issue. If so, I will reopen this.

cboettig commented 5 years ago

Nice! Looks good to me.