ROpenSci Review - Documentation

After downloading the data, my first attempts to load the file into R failed:

library(refnet)
my_data <- references_read(data = "wos_ropensci.txt")

## Error in references_read(data = "wos_ropensci.txt"): ERROR:  The specified file or directory does not contain any 
##          Web of Knowledge or ISI Export Format records!

It took me a while (and many manual downloads from the WoS) to realize that the dir parameter needs to be set to FALSE when I want to load just one file.

my_data <- references_read(data = "wos_ropensci.txt", dir = FALSE)

I feel that the average R user will not be as patient when appropriate starting instructions are missing. My main request as a reviewer is therefore to improve the high-level documentation and to provide a sample dataset to play with.

I suggest expanding the README and presenting an overview and some details in a refnet-package.Rd file, which is currently missing, so that users can type ?refnet-package for help.
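One common way to get such a help page is to let roxygen2 generate it from a package-level documentation block. The following is only a minimal sketch: the file name and all descriptive text are placeholders, not taken from the package.

```r
# R/refnet-package.R (hypothetical file name)

#' refnet: placeholder title for the package overview
#'
#' Placeholder description: a short overview of the workflow, with pointers
#' to the main entry points such as references_read().
#'
#' @keywords internal
"_PACKAGE"
```

With roxygen2, re-running the documentation step on a block like this should produce the refnet-package.Rd file, so that ?refnet-package opens the overview.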

Runnable documentation

Although the long-form documentation nicely explains the motivation and the workflow, it seems that the vignette does not process code chunks with functions from the package. I suggest adding executable examples that demonstrate to users what can be done with the package. It would also be helpful to include the R Markdown file used to generate README.md, with at least one runnable example.

[x] It seems that the Authors@R documentation in Writing R Extensions is misleading, because in this package "Developer" is also used as the family name in the author names field, which probably explains why the author and maintainer fields differ. Family names need to be added to the Authors@R vector in place of "Developer".
[x] missing_addresses.csv needs to be added to .Rbuildignore, or removed if it is not needed.
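As an illustration of the Authors@R point, here is a minimal sketch of how the field can be built with utils::person(). The names and e-mail address are hypothetical placeholders, not the package's real authors.

```r
# Hypothetical names and e-mail address, for illustration only.
authors <- c(
  utils::person(given = "Jane", family = "Doe",
                email = "jane.doe@example.org",
                role = c("aut", "cre")),   # "cre" marks the maintainer
  utils::person(given = "John", family = "Smith",
                role = "aut")
)

# Show given and family names as they would feed into the Author field
format(authors, include = c("given", "family"))
```

In DESCRIPTION, the same c(person(...), person(...)) expression goes into the Authors@R field. The Author and Maintainer fields are then derived from it, with the maintainer being the person who has the "cre" role.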

In the documentation, the brief "About" refers to Thomson Reuters as the company behind the Web of Science. Ownership changed recently to Clarivate Analytics.

[x] To improve the documentation of external functions, helpful roxygen2 tags like @importFrom and @inheritParams should be considered.
[x] While testing data export functionalities, files are written into the testthat folder. I would suggest avoiding this behaviour by calling unlink() after the tests. Here's an example of how to use unlink() from the rio package.
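The cleanup pattern from the last point can be sketched in base R as follows. write_example() is a hypothetical stand-in for one of the package's export functions, not an actual refnet function.

```r
# Hypothetical stand-in for an export function that writes a .csv file
write_example <- function(data, path) {
  utils::write.csv(data, path, row.names = FALSE)
  invisible(path)
}

# Write into the session's temporary directory rather than tests/testthat
out_file <- tempfile(fileext = ".csv")
write_example(data.frame(id = 1:3), out_file)

# The check the test is actually interested in
stopifnot(file.exists(out_file))

# Remove the file once the checks are done so nothing is left behind
unlink(out_file)
```

Inside a test_that() block, the unlink() call could also be registered with on.exit() so that the file is removed even when an expectation fails.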

reviewer 2

[ ] In describing the package, it is mentioned that the processed datasets can be exported in tidy formats for more in-depth analyses with other packages. It is not mentioned which export format is useful for other packages. Perhaps this is self-explanatory (the csv outputs provided), but it would be good to specify.
[x] could not find test data 'example_data.txt'
[ ] refnet_fig1.jpg in Appendix 1 is very low res
[ ] For WoS export: in the text it is mentioned that both .txt and .ciw formats can be processed, but the worked example in Appendix 1 only shows how to export as .txt. It would be good to harmonize this, and perhaps explain in the text that .ciw is the format for Endnote export
[ ] In my tests, export does not need to be via Marked list, but also works via download menu in search results, either as Endnote export (.ciw) or as 'Other file formats', with 'full record' and 'plain text' selected. This works much faster than via Marked lists.
[x] In references_read(), what is the default for dir=T/F? args(references_read) says it's TRUE
[x] -- typo in example a): .txr
[x] -- c) is not a separate example
[x] -- the remark in Appendix 2 about fields that are only included when all.fields=T is set in references_read() should also appear in the main text describing references_read()
[x] testing all.fields=T results in an error: unused argument (all.fields=T). args(references_read) reveals it should be include_all=FALSE
[x] Function authors_clean(): no csv file saved. Function contains argument write_out_data = FALSE. Tried TRUE => 2 files saved (authors_review, authors_prelim)
- [x] Function also contains argument sim_score (value 0.88) - this is not explained in the documentation (it is mentioned for authors_refine where it has a NULL value)
[x] In documentation under 2.2.2, -- reference is made to Appendix 2, this should be Appendix 3. -- it is stated: 'Users that prefer to manually review the results of the disambiguation can do so with the “authors” object and .csv files' -> unclear which of the 2 csv files (prelim or review) should be taken (I assume review from the documentation of the next step. Also: 'authors' object is unclear)
- [x] "Corrections made to the “review” file are merged into the “preview” file"-> should be "prelim"
- [x] In Appendix 3:
-- the explanation of author name disambiguation is informative and useful. It does have some spelling and style issues; these are not critical, but it could do with a careful edit
-- kudos for encouraging authors to sign up for (and use!) ORCID!
-- not covered: sim_score
-- layout of table 2 (2 columns) is mangled
-- in table 2, similarity is listed as NA, but it has a value in my test data
Example for authors_georef is incomplete: example_georef <-authors_georef(------,------,-----)

Explanation of arguments is incomplete: function (data, address_column = "address", filename_root = "", write_out_missing = TRUE, retry_limit = 10)

-> address_column has value
-> retry_limit not discussed

In documentation: not clear which geocoding application is used when (sequentially?).
http://www.datasciencetoolkit.org/ and/or
https://developers.google.com/maps/documentation/.

In documentation, it is stated: 'an output/file of references that refnet was unable to georeference, which the user can review, manually correct, and import back into the file of georeferenced author locations'
-> file seems to contain all lines (with and without lat/long resolved)
-> unclear how 'import back into file' should be performed

ropensci / refsplitr

ROpenSci Review - Documentation #61
