ropensci / software-review

rOpenSci Software Peer Review.

Submission: refnet #256

Closed aurielfournier closed 4 years ago

aurielfournier commented 5 years ago

Summary

Package: refnet
Type: Package
Title: Thomson Reuters Web of Knowledge/Science and ISI Reference Data Tools
Version: 0.6
Date: 2018-08-26
Authors@R: c(person("Auriel M.V. Fournier", "Developer", role = c("aut"),
                     email = "aurielfournier@gmail.com"),
              person("Forrest R. Stevens", "Developer", role = "aut"),
              person("Matthew E. Boone", "Developer", role = "aut"),
              person("Emilio Bruna", "Developer", role=c("aut","cre"), 
              email="embruna@ufl.edu"))
Maintainer: Emilio Bruna <embruna@ufl.edu>
Description: This function reads Thomson Reuters Web of Knowledge/Science and ISI format reference data files into an R friendly data format and can optionally write the converted data to a friendly CSV format.
License: GPL-3
Imports: maptools, maps, rworldmap, RecordLinkage, Matrix, igraph, network, sna, Hmisc, ggplot2, stringi, stringr, ggmap, Rdpack, tidyr, dplyr, tibble
RoxygenNote: 6.1.0
RdMacros: Rdpack
Suggests: testthat, utils
VignetteBuilder: utils
Encoding: UTF-8

Data extraction and munging, since it takes data from one format, and transforms it into something that is useful, and also matches up records among authors.

[Note, the link for the package fit, does not lead to that page anymore, and I couldn't find anything about package fit in the linked policies]

Scientists interested in studying the networks of a particular author, subject area or journal.

https://github.com/ropensci/onboarding/issues/247

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Heather Piwowar @hpiwowar

sckott commented 5 years ago

Thanks very much for your submission @aurielfournier - we're discussing now and will get back to you soon

maelle commented 5 years ago

Thanks for your submission @aurielfournier! I see the package doesn't have any test and doesn't have continuous integration yet. I suggest we put the submission on hold while you sort that, unless you can and want to add this within a week or so? There is some guidance in this guide, and I am happy to answer any question here or via Slack!

I was also looking at the dependencies, there are many of them in DESCRIPTION and

maelle commented 5 years ago

@aurielfournier for info I've just added the holding label, please update this thread once you have had time to work on the package, and ask me any question.

aurielfournier commented 5 years ago

Thanks @maelle, I and my co-authors are working on it, but it's taken a bit longer than we expected. Appreciate your patience!

maelle commented 5 years ago

No problem, and happy to help if I/we can!

aurielfournier commented 5 years ago

@maelle package now has tests and continuous integration.

I removed stringi from the DESCRIPTION file.

I have fixed the issues from the NAMESPACE file.

Huge thanks to my coauthor @birderboone for doing the heavy lifting to get this over the finish line!

I think we are ready for review now. If you have any other things that need to be addressed let me know.

Thanks!

maelle commented 5 years ago

:wave: @aurielfournier @birderboone! Awesome, thanks to both of you! A few comments before I do the last editor checks:

aurielfournier commented 5 years ago

Hi @maelle

moved the badge

Addressed the Travis warnings/notes

I closed the two open issues. thanks for pointing out the milestones, I had forgotten about that.

I added the CoC and contribution guides.

Thank you so much for all the links and tips on how to address these issues, it is greatly appreciated.

Build is now passing!!

maelle commented 5 years ago

Yay, green badge! Can you also add a coverage badge? Run usethis::use_coverage("codecov") which will give you stuff to add to the Travis config file, browse codecov maybe (I can't remember) and give you the code to paste in the README to get a badge.

aurielfournier commented 5 years ago

Done! Sorry I missed that.

maelle commented 5 years ago

Thank you! A few more things before I search for reviewers (then they and you have less work :wink:).


Editor comments

✖ write short and simple functions. These
    functions have high cyclomatic complexity:authors_clean
    (68).

Maybe you can split it in several helper functions?

  ✖ omit "Date" in DESCRIPTION. It is not required
    and it gets invalid quite often. A build date will be
    added to the package when you perform `R CMD build` on it.
  ✖ add a "URL" field to DESCRIPTION. It helps users
    find information about your package online. If your
    package does not have a homepage, add an URL to GitHub, or
    the CRAN package package page.
  ✖ add a "BugReports" field to DESCRIPTION, and
    point it to a bug tracker. Many online code hosting
    services provide bug trackers for free,
    https://github.com, https://gitlab.com, etc.

Run usethis::use_github_links().

  ✖ avoid long code lines, it is bad for
    readability. Also, many people prefer editor windows that
    are about 80 characters wide. Try make your lines shorter
    than 80 characters

    R\authors_clean.R:24:1
    R\authors_clean.R:26:1
    R\authors_clean.R:29:1
    R\authors_clean.R:35:1
    R\authors_clean.R:36:1
    ... and 162 more lines

It's the complicated function, one more reason to try and simplify it?


  ✖ omit trailing semicolons from code lines. They
    are not needed and most R coding standards forbid them

    R\authors_refine.R:20:198

  ✖ avoid sapply(), it is not type safe. It might
    return a vector, or a list, depending on the input data.
    Consider using vapply() instead.

    R\authors_clean.R:97:17
    R\authors_clean.R:132:19
    R\authors_clean.R:152:19
    R\authors_clean.R:186:17
    R\authors_clean.R:223:22
    ... and 14 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error
    prone and result 1:0 if the expression on the right hand
    side is zero. Use seq_len() or seq_along() instead.

    R\authors_clean.R:56:15
    R\authors_clean.R:71:79
    R\authors_clean.R:188:15
    R\authors_clean.R:209:21
    R\authors_clean.R:422:12
    ... and 9 more lines

  ✖ avoid 'T' and 'F', as they are just variables
    which are set to the logicals 'TRUE' and 'FALSE' by
    default, but are not reserved words and hence can be
    overwritten by the user.  Hence, one should always use
    'TRUE' and 'FALSE' for the logicals.

    R/authors_clean.R:NA:NA
    R/authors_clean.R:NA:NA
    R/authors_clean.R:NA:NA
    R/authors_clean.R:NA:NA
    R/authors_clean.R:NA:NA
    ... and 38 more lines
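
To illustrate the sapply() and 1:length() items above, here is a minimal base R sketch. The `authors` vector is made up for illustration and is not taken from the package.

```r
# Hypothetical input: an empty character vector, as might occur when a
# record has no authors.
authors <- character(0)

# sapply() changes its return type with the input: for empty input it
# returns an empty list rather than an integer vector.
unsafe <- sapply(authors, nchar)

# vapply() always returns the declared type, here integer.
safe <- vapply(authors, nchar, integer(1))

# 1:length(x) counts down from 1 to 0 when x is empty, so a loop over it
# would run twice; seq_along(x) is empty, so the loop body is skipped.
bad_index <- 1:length(authors)
good_index <- seq_along(authors)
```

Code that expects `unsafe` to be a vector would break on the empty case, whereas `safe` keeps its type whatever the input length.
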
[![](https://badges.ropensci.org/256_status.svg)](https://github.com/ropensci/onboarding/issues/256)

It'll turn green when your package is approved.


Reviewers: @njahn82 @bmkramer Due date: 2018-12-12

aurielfournier commented 5 years ago

Hi @maelle

We are going to pause, and redo authors_clean() to be simpler/broken down into several functions. This will probably take ~ 1 week.

Thanks!

maelle commented 5 years ago

Ok, thank you!

aurielfournier commented 5 years ago

Alright! After some fighting with Travis the past 24 hours, we are good to go.

@birderboone split up authors_clean into three smaller internal functions, that should make review of the code easier. We've also addressed the other comments from @maelle

If I missed something, let me know.

Thanks!

maelle commented 5 years ago

Thanks @aurielfournier and @birderboone!

A few more things from goodpractice to tackle before I look for reviewers

It is good practice to

  ✖ add a "BugReports"
    field to DESCRIPTION, and point
    it to a bug tracker. Many
    online code hosting services
    provide bug trackers for free,
    https://github.com,
    https://gitlab.com, etc.

Simply run usethis::use_github_links()

  ✖ use '<-' for
    assignment instead of '='. '<-'
    is the standard, and R users
    and developers are used it and
    it is easier to read your code
    for them if you use '<-'.

    tests\testthat\test_authors_match.R:4:5

The styler package might help.


  ✖ avoid long code
    lines, it is bad for
    readability. Also, many people
    prefer editor windows that are
    about 80 characters wide. Try
    make your lines shorter than 80
    characters

    R\authors_address.R:12:1
    R\authors_address.R:14:1
    R\authors_address.R:17:1
    R\authors_address.R:20:1
    R\authors_address.R:41:1
    ... and 174 more lines

  ✖ avoid sapply(), it is
    not type safe. It might return
    a vector, or a list, depending
    on the input data. Consider
    using vapply() instead.

    R\plot_net_address.R:32:26
    R\plot_net_address.R:33:26
    tests\testthat\test_references_read.R:10:17

  ✖ avoid 1:length(...),
    1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...)
    expressions. They are error
    prone and result 1:0 if the
    expression on the right hand
    side is zero. Use seq_len() or
    seq_along() instead.

    R\authors_georef.R:55:25
    R\authors_georef.R:71:15
    R\authors_georef.R:113:17
    R\plot_net_address.R:34:35
    R\plot_net_address.R:123:22
    ... and 1 more lines

  ✖ fix this R CMD check
    NOTE: Namespaces in Imports
    field not imported from:
    'Rdpack' 'maps' 'stringr' All
    declared Imports should be
    used.

  ✖ avoid 'T' and 'F', as
    they are just variables which
    are set to the logicals 'TRUE'
    and 'FALSE' by default, but are
    not reserved words and hence
    can be overwritten by the user.
    Hence, one should always use
    'TRUE' and 'FALSE' for the
    logicals.

    R/authors_address.R:NA:NA
    R/authors_address.R:NA:NA
    R/authors_address.R:NA:NA
    R/authors_georef.R:NA:NA
    R/authors_georef.R:NA:NA
    ... and 15 more lines
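
The 'T' and 'F' item is easy to overlook, so here is a small sketch of how it can go wrong. The function `check_flag()` is hypothetical and only serves to show the reassignment.

```r
# T and F are ordinary variables that default to TRUE and FALSE, so they
# can be reassigned; TRUE and FALSE are reserved words and cannot.
check_flag <- function() {
  T <- 0          # a local reassignment, e.g. a variable named T for "time"
  isTRUE(T)       # T is no longer TRUE here
}

flag_with_T <- check_flag()
flag_with_TRUE <- isTRUE(TRUE)
```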

And from me: could you please add a coverage badge? usethis::use_coverage() should help you with that.

Thanks in advance and thanks for all your work until now! 😸

aurielfournier commented 5 years ago

Hi @maelle

Thanks as always for your kind patience. It is greatly appreciated.

I've addressed all of the above, and I finally downloaded goodpractice for myself to check things.

The one issue that I don't totally understand, but that isn't throwing an issue in goodpractice, is this one:

fix this R CMD check NOTE: Namespaces in Imports field not imported from: 'Rdpack' 'maps' 'stringr' All declared Imports should be used.

I thought that meant that I needed to remove Rdpack, maps and stringr from the DESCRIPTION file. So I did, but then the build failed, and it did not pass unless I put Rdpack back in.
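
One likely explanation, offered as an assumption rather than a diagnosis of this repository: Rdpack is needed at build time because DESCRIPTION declares `RdMacros: Rdpack`, yet R CMD check only treats a package in Imports as used when NAMESPACE imports something from it. The Rdpack documentation suggests importing its `reprompt` function for this reason. With roxygen2 that could look like the sketch below; where the tag is placed is a free choice.

```r
# Placed in any R/ file of the package, for example a package-level
# documentation file. roxygen2 turns the tag into
# importFrom(Rdpack, reprompt) in NAMESPACE.
#' @importFrom Rdpack reprompt
NULL
```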

But otherwise I think we're ok. :D

maelle commented 5 years ago

Thanks a lot @njahn82 @bmkramer for accepting to review this package! 😺 Your reviews are due on 2018-12-12.

As a reminder, our reviewer guide can be found here and the review template here.

maelle commented 5 years ago

:wave: @njahn82 @bmkramer! Friendly reminder that your reviews are due in two days, on 2018-12-12. 😺

njahn82 commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 6 hours


Review Comments

This is a specific package used for manipulating and analyzing authorship data from the Web of Science (WoS), a large toll-access literature and citation database indexing articles from around 12,000 academic journals. The package imports local files that need to be manually downloaded from the database. This is a quite common workflow when re-using WoS data, because API access is very costly and limited.

I was very excited to see that refnet addresses the problem of author disambiguation and affiliation extraction using WoS data. As a data analyst for scholarly communication at a research library, I sometimes create co-authorship networks. For this task, I often use Web of Science data. It is very laborious to parse the different text strings representing authors and institutions and to disambiguate them. I especially like that refnet supports a workflow where automatic and manual cleaning steps are supported.

Unfortunately, I had a hard time getting started with the package, because it took me a while to find information about which WoS data export format was needed, and how to load the data into R using the package.

After downloading the data, my first attempts loading the file into R failed:

library(refnet)
my_data <- references_read(data = "wos_ropensci.txt")
## Error in references_read(data = "wos_ropensci.txt"): ERROR:  The specified file or directory does not contain any 
##          Web of Knowledge or ISI Export Format records!

It took me a while (and many manual downloads from the WoS) to realize, that the param dir needs to be set to FALSE when I want to load just one file.

my_data <- references_read(data = "wos_ropensci.txt", dir = FALSE)
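
The confusion above arises because the caller has to say whether `data` is a file or a directory. As a rough sketch of an alternative (not how references_read() is implemented), the input could be inspected with base R. The helper name `list_input_files()` and the file contents are hypothetical.

```r
# Hypothetical helper: return the files to read, whether `path` points to
# a single export file or to a directory of them.
list_input_files <- function(path) {
  if (dir.exists(path)) {
    # A directory: collect every .txt or .ciw file inside it.
    list.files(path, pattern = "\\.(txt|ciw)$", full.names = TRUE)
  } else if (file.exists(path)) {
    # A single file: use it as is.
    path
  } else {
    stop("No file or directory found at: ", path, call. = FALSE)
  }
}

# Usage with a temporary directory holding one placeholder export file.
tmp_dir <- tempfile("wos_")
dir.create(tmp_dir)
tmp_file <- file.path(tmp_dir, "wos_ropensci.txt")
writeLines("placeholder", tmp_file)

from_dir <- list_input_files(tmp_dir)
from_file <- list_input_files(tmp_file)
```

With a check like this, the same call works for both kinds of input and a `dir` argument becomes optional.
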

I feel that the average R user is not as patient when appropriate starting instructions are missing. My main request as reviewer would be therefore to improve high-level documentation, as well as to provide a sample dataset to play with.

I suggest expanding the README and to present an overview and some details in a refnet-package.Rd file, which is currently missing, so that users can type ?refnet-package for help.

Here are some other observations and suggestions that might be helpful for improving the package.

Runnable documentation

Although the long-form documentation nicely explains the motivation and the workflow, it seems that the vignette does not process code chunks with functions from the package. I would suggest to add executable examples to successfully demonstrate to the users what can be done with the package. It would also be helpful to include an Rmarkdown file used to generate README.md with at least one runnable example.

Installation and Building

Installed easily, but it did not pass a full R CMD check with --as-cran. There were two Errors and two Notes:

Two Errors

Conflicting package names (submitted: refnet, existing: RefNet [https://bioconductor.org/packages/3.7/bioc])

https://www.bioconductor.org/packages/release/bioc/html/RefNet.html

Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
    |                                                                      |   0%
    |                                                                            
    |======================================================================| 100%[1] "Now processing all references files"
  [1] "Now processing all references files"

    |                                                                            
    |                                                                      |   0%
    |                                                                            
    |======================================================================| 100%══ testthat results  ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
  OK: 66 SKIPPED: 0 FAILED: 2
  1. Failure: Net plots work (@test_plots.R#95) 
  2. Failure: Net plots work (@test_plots.R#96)

Two Notes

* checking DESCRIPTION meta-information ... NOTE
Maintainer field differs from that derived from Authors@R
  Maintainer: ‘Emilio Bruna <embruna@ufl.edu>’
  Authors@R:  ‘Emilio Bruna Developer <embruna@ufl.edu>’

* checking top-level files ... NOTE
Non-standard file/directory found at top level:
  ‘missing_addresses.csv’

As described in the Check logs, there is a conflicting Bioconductor package with the same name: RefNet. To comply with rOpenSci and CRAN, a new package name is needed. A name where also the database name is included could help users that want to work with Web of Science data files to discover the package.

The package has a test suite for main functions, which succeeds in RStudio, but not when checking the package bundle (see R CMD check output)

It seems that the Authors@R documentation in Writing R Extensions is misleading, because in this package "Developer" is used as the family name in the author names field, which probably explains why the author and maintainer fields differ. Family names need to be supplied in the Authors@R vector in place of "Developer".
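
As a minimal sketch of the fix, utils::person() takes the given name and the family name as separate arguments. The name and email come from the DESCRIPTION above; only the split into given and family is new.

```r
# person() takes the given name first and the family name second.
maintainer <- person(
  given = "Emilio",
  family = "Bruna",
  role = c("aut", "cre"),
  email = "embruna@ufl.edu"
)

# This is the "Name <email>" form that should match the Maintainer field.
maintainer_string <- format(maintainer, include = c("given", "family", "email"))
```

With "Developer" in the family slot, the derived string gains an extra word, which is what the NOTE reports.
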

missing_addresses.csv needs to be added to .Rbuildignore, or removed if it is not needed.

In the Documentation, the brief "About" refers to Thomson Reuters as company behind the Web of Science. Ownership changed recently to Clarivate Analytics.

Tests

The package uses automatic testing, which is great. Tests could be expanded to cover more functionalities. For instance, authors_georef() does not check geo-coding using Google Maps.

While testing data export functionalities, files are written into the testthat folder. I would suggest to avoid this behaviour using unlink() after the tests. Here's an example how to use unlink from the rio package.
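
A minimal base R sketch of that clean-up, assuming the test only needs to confirm that a CSV can be written and read back. The data frame is a placeholder, not package data.

```r
# Write test output to a temporary location rather than the testthat folder.
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(author = "A. Author", n = 1L), out_file, row.names = FALSE)

# Checks on the written file go here.
written <- read.csv(out_file)

# Remove the file once the checks are done.
unlink(out_file)
```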

Functions

Main functions have many lines, which makes it very hard to follow what is going on. It would be great, if these functions could be split into smaller units.

references_read() seems to contain a lot of repeated code to import data as data.frame. I wonder if the WoS csv export file format could be used instead of the Plain Text format? When the data is rectangular, the readr package has great functionalities to strip out whitespace, which takes much room in the function, and to define colClasses while loading files into R.

When importing data with references_read(), values in many columns end with a line break \n.

Some console messages are invoked by using the print() method (see https://github.com/embruna/refnet/search?l=R&q=print+%2A.R). To enable user-friendly suppression, message() and warning() can be used instead.
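
To show the difference, here is a short sketch with a hypothetical function, `process_files()`, that reports progress through message().

```r
# Hypothetical function that reports progress with message() instead of print().
process_files <- function(n_files) {
  message("Now processing ", n_files, " references files")
  invisible(n_files)
}

# Users who want quiet output can switch the message off,
# which is not possible for text emitted with print().
quiet_result <- suppressMessages(process_files(3))
```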

There are various issues when checking the code syntax with lintr::lint() that need to be addressed.

Documentation of functions can be improved by making more use of roxygen2 tags. Not all functions have examples. Internal functions should be tagged with @noRd to avoid that they are added to the manual.

Use of functions from other packages

The use of functions from other packages could be made more explicit to the users. In many cases, it is not possible to interact with them.

authors_georef(), for example, uses ggmap::geocode to retrieve geo-coding information. Since a couple of weeks, however, keyless access to Google Maps Platform has been deprecated. Information about how to pass API keys to the function to make geocode work would be very helpful.

Functions used to visualize the networks make use of ggplot2. It would be great to interact with its functionalities when calling the refnet functions.

To improve documentation of external functions, helpful roxygen2 tags like @importFrom and @inheritParams should be considered.

Maintainability

Overall, it seems that the package has quite a history, and I welcome updating it. However, because of the ambiguity of author names and addresses in general, and the complicated WoS data format in particular, I wonder if more focus would improve the maintainability of the package.

One strategy could be the usage of tidyverse packages and functions. At least, they would help to dry out code for loading the data and string manipulation in a tidy way. Of course, the package would have to start with importing rectangular data and not the field tags format, which is now used.

Another would be the focus on parsing and transforming the authorship data including affiliations stored in the C1 field. Developing functions used to visualize the networks, however, could be discontinued in favor of long-form documentations, and in favor of data formats supported by Social Network Analysis packages and software.

I think that's it from me! Happy to help further with the process!

maelle commented 5 years ago

Thanks a lot for your review @njahn82! 😺

bmkramer commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Comments: DOIs missing from a number of references; 'Larivière' spelled as 'Lariviare' on two occasions; figure not displaying correctly

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 6 hours


Review Comments

General:

I really like the ability of this package to extract author and address information from WoS records. The main functionalities of the package (importing results, author name parsing and disambiguation, and georeferencing) worked reasonably well for me. The visualization functions I could not get to work.

Information on functionality provided in vignette is detailed and complete; it would be good though to duplicate some of this (e.g. the examples) in Readme.md to get people started on how to use the package.

I have focused my review on functionality of the package. Comments below are based on performing all tasks as described in the vignette, with a custom test dataset downloaded from Web of Science (first 500 articles from PeerJ in 2018)

1 Introduction

In describing the package, it is mentioned that the processed data-sets can be exported in tidy formats for more in-depth analyses with other packages. It is not mentioned which export format is useful for other packages. Perhaps this is self-explanatory (the csv-outputs provided), but it would be good to specify.

2.0 Using Refnet

2.1 Importing Search Results

2.2 Author address parsing and name disambiguation

2.3 Georeferencing author institutions

-> address_column has value -> retry limit not discussed

2.4. Data Visualization: Productivity and Collaboration

I could not get these to work, see error messages (and some analysis on them) below.

plot_net_coauthor_2 <- plot_net_coauthor(PeerJ_2018_2_georef)
Error in data[!is.na(data$country), ] : incorrect number of dimensions

plot_net_country_2 <- plot_net_country(PeerJ_2018_2_georef)
Error in data[!is.na(data$country), ] : incorrect number of dimensions

bmkramer commented 5 years ago

With apologies for the late review!

maelle commented 5 years ago

Thanks a lot for your review @bmkramer! :smile_cat:

Reg "The package conforms to the rOpenSci packaging guidelines" the question is whether you see any discrepancy between https://ropensci.github.io/dev_guide/building.html and the package, if you have time you are qualified to assess, and you can ask me any question.

Was time or another problem the reason for not running tests? Happy to help if needed (well I can't help with time :smile: ).

Thanks a lot for your feedback in any case!

maelle commented 5 years ago

@aurielfournier @birderboone now both reviews are in! :tada:

aurielfournier commented 5 years ago

Just a note to all involved that @birderboone and I are working on the edits (huge thanks to the reviewers!), we're just a bit slowed down by other things at the moment, but we should have everything addressed by February 5th. Thanks for your patience!

aurielfournier commented 5 years ago

Thanks to the reviewers (@bmkramer, @njahn82) for providing several useful links to resources that made addressing their comments much easier! Your comments were very helpful and constructive and the package is much better off for them; we really appreciate your time!

First, if you look at the repo, you will see that the build is failing. This is because all of this happened, and basically if we were able to get travis to use the github version of ggmap, everything would be fine, but until those changes are on CRAN, the travis build will fail. Since our response was due today, and other than this issue we're ready for you all to look at it again, I'm tossing the ball over to your side of the court. If you would like to wait until ggmap on CRAN is updated and the build passes, that is fine by us.

Below is our response broadly to the reviewer comments, if you would prefer a comment by comment response, let me know and I'm happy to do that. Thanks!

Auriel, Matt and Emilio

~~

We are choosing at this time not to split up our functions any more than they already are. We have split up the original functions into smaller pieces two times already.

While we appreciate the suggestion for using tidyverse functions, and we do use them in many other contexts, due to the changes in the tidyverse packages, that are not always backwards compatible, we have chosen to avoid them in many cases to avoid this package breaking because of that in the future.

We have changed the name of the package to refsplitr to avoid the conflict on CRAN

We have changed references_read to have dir default to FALSE, to help alleviate the issues the reviewer had

We removed all the csv writing outputs for functions where that was not needed as a part of the author refining process.

All typos and other small changes have been made, thank you to both reviewers for catching them

We have removed any need for the google API from the package, since between when we submitted and now it can no longer be used for free.

We have fixed the plotting functions to the best of our ability, some we were unable to replicate. If the reviewers find them again, can they share their input file so we can better diagnose the issue?

The reviewer is correct in that ciw formats can also be processed, and we have revised the text of the vignette to reflect that. (We were initially reluctant to mention this because we wanted to avoid having users download files in proprietary formats, but ciw files can be opened with a text editor.) We have also edited the Appendix showing how to download search results to include direct download of ciw files from the search results without going to the Marked List.

Reviewer comment : In my tests, export does not need to be via Marked list, but also works via download menu in search results, either as Endnote export (.ciw) or as 'Other file formats', with 'full record' and 'plain text' selected. This works much faster than via Marked lists.

Response: This is indeed faster because it eliminates several steps. However, this will download all records resulting from a search, including any that were incorrectly returned (e.g., those by an author with an identical name). If users wish to filter results prior to download to avoid including unwanted publications, then the best approach is to save only the desired records to the Marked List and download from there as either a .ciw or .txt. Appendix 1 has been amended to include this option.

maelle commented 5 years ago

:wave: @aurielfournier @birderboone! Thanks for your answer.

if we were able to get travis to use the github version of ggmap

You can do that! See https://docs.travis-ci.com/user/languages/r/#remote-package :-)

@njahn82 @bmkramer thanks again for your reviews. Are you happy with the authors' response above?

aurielfournier commented 5 years ago

Thanks for the link @maelle

I've added in the needed argument to the travis yml file, and the build still isn't working, though all the tests pass on my machine when I use the github version of ggmap, though it did take restarting everything to make that happen

So I'm not sure what is going on. :/

aurielfournier commented 5 years ago

Ok! I worked with some of the awesome ladies over in R-Ladies today, and we figured out the issue.

It's actually an issue with ggmap. Jenny Bryan opened up an issue in their repo about it.

The solution that Jenny suggested was adding options(ggmap = list(display_api_key = FALSE)) at the top of authors_georef.R and now the build is passing. 🎉

maelle commented 5 years ago

Awesome, well done you and Jenny!

jennybc commented 5 years ago

If you're going to set the option in this way, which seems reasonable for a semi-temporary workaround, you should technically be a bit more careful to put things back the way you found them. You only want your value of FALSE to hold for the duration of this function's execution.

At the place where you set the option, you could capture the existing value and immediately use on.exit() to schedule its restoration. Or you could use withr::local_options() to accomplish both at once, with the downside that you'd need to Import withr.
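
A minimal sketch of the on.exit() approach, using only base R. The function `georef_quietly()` is a hypothetical stand-in for the real function; the option name and value are the ones quoted above.

```r
# Hypothetical stand-in for a function that needs the option set to FALSE
# only while it runs.
georef_quietly <- function() {
  # options() returns the previous values, so they can be restored later.
  old <- options(ggmap = list(display_api_key = FALSE))
  on.exit(options(old), add = TRUE)

  # Work that relies on the option goes here.
  getOption("ggmap")$display_api_key
}

before <- getOption("ggmap")
inside <- georef_quietly()
after <- getOption("ggmap")
```

Because the restoration is registered right after the option is set, it also runs if the function stops with an error.
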

dkahle commented 5 years ago

That problem should be fixed from ggmap's side now (with https://github.com/dkahle/ggmap/commit/0c68d5c); let me know if that doesn't do it. Sorry for the problem!

maelle commented 5 years ago

@njahn82 @bmkramer thanks again for your reviews. Are you happy with the authors' response above?

njahn82 commented 5 years ago

Sorry for my late reply. First of all thank you for your kind words and the changes you made. I am particularly impressed about your engagement with the R community to improve your work.

Before addressing the changes made, I wonder if I missed that runnable R code chunks were added to the README or vignette. As far as I can see, the vignette does not execute functions from the package, and there is no README.Rmd file. I am afraid it is a formal requirement from rOpenSci that the vignette demonstrates that major functionality from the package runs successfully. As a user, I often look for such runnable examples before getting started with a package.

Can you point me to such document?

aurielfournier commented 5 years ago

Hi @njahn82 .

Perhaps I am misunderstanding the question, but refsplitr/vignettes/refsplitr-vignette.Rmd contains chunks of code that can be run by the user which execute each function from the package. Which is a change we made in this last revision.

for example: line 66

example_refs <- references_read(data = system.file("extdata", "example_data.txt", package = "refsplitr"),
                                    dir = FALSE)

Is this not what you meant?

I also just updated the ReadMe file to have the same example shown in the vignette.

njahn82 commented 5 years ago

Sorry for the confusion. I thought of R code chunks indicated by curly brackets (```{r}) that are evaluated when a R Markdown document is rendered. The resulting output file shows the R output. In the vignette, it seems that package functions are highlighted (```r). When rendered, no R output is presented, but screenshots from spreadsheet software. Example: https://github.com/embruna/refsplitr/blob/16e7308fe75044e53848ab3bbecb80abb3cb7264/vignettes/refsplitr-vignette.Rmd#L99-L110

It would be great to have some reproducible examples for the package's main functionalities.

aurielfournier commented 5 years ago

Agreed! We'll get right on it. Ggmap is giving us some issues again, but once we get those resolved we'll make those edits to the vignette and report back.

Thanks for clarifying!

aurielfournier commented 5 years ago

Alright. We resolved the issue with ggmap. A vignette with rendered R output is now in the repo! Let me know if you have any other comments!

maelle commented 5 years ago

:wave: @bmkramer @njahn82, are you both happy with the authors' response?

njahn82 commented 5 years ago

Unfortunately, I feel that some improvement is still needed.

It's great to have reproducible examples now in the vignettes. Sadly, I did not succeed in building the vignette while installing the package.

So I used the rendered refsplitr-vignette.html file instead: When describing plot_net_country() and plot_net_address(), it would be better to call the $plot element directly to avoid printing the other list elements. It would also be great to have an example of how users can generate and customize their own plots using the other outputs provided by these functions.

README.Rmd needs to be added to the .Rbuildignore file to make the package more CRAN compatible. If you intend to submit the package to CRAN, dependencies listed in Remotes must be available via CRAN. Otherwise, there will be a warning when running R CMD check.

As noted, the change in ownership of the Web of Science needs to be addressed; since 2016 the Web of Science has been provided by Clarivate Analytics, and not Thomson Reuters.
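A minimal sketch of the suggestion about `$plot`, assuming ggplot2 is installed. The list below is a stand-in built from `mtcars`, not real output of plot_net_country(); only the `plot` element is mentioned in this review, and the `data` element name is hypothetical.

```r
library(ggplot2)

# Stand-in for the list returned by a plotting function such as
# plot_net_country(). The `data` element is a hypothetical name.
net <- list(
  plot = ggplot(mtcars, aes(wt, mpg)) + geom_point(),
  data = head(mtcars)
)

# Printing `net` would dump every list element; accessing `$plot`
# shows only the figure.
net$plot

# Because `$plot` is a ggplot object, users can customize it with `+`.
custom <- net$plot +
  theme_minimal() +
  labs(title = "Customized network plot")
```

Showing `net$plot` in the vignette keeps the rendered output focused on the figure, while the remaining list elements stay available for users who want to build their own plots.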
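As a rough sketch of the .Rbuildignore change: each line of that file is a regular expression matched against file paths in the package. The example below works in a temporary directory rather than a real package; in an actual package, `usethis::use_build_ignore("README.Rmd")` should add the same entry.

```r
# Hypothetical package directory, created only for this illustration.
pkg_dir <- tempfile("pkg")
dir.create(pkg_dir)
ignore_file <- file.path(pkg_dir, ".Rbuildignore")

# Append an anchored pattern so that only README.Rmd at the top level matches.
cat("^README\\.Rmd$\n", file = ignore_file, append = TRUE)

# Check that the pattern matches the intended file and nothing similar.
patterns <- readLines(ignore_file)
matches_readme <- any(vapply(
  patterns,
  function(p) grepl(p, "README.Rmd", perl = TRUE),
  logical(1)
))
matches_other <- any(vapply(
  patterns,
  function(p) grepl(p, "README.md", perl = TRUE),
  logical(1)
))
```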

I also noted that functions could be more thoroughly documented. All functions lack @examples tags followed by example R code on how to use the function. See also https://ropensci.github.io/dev_guide/building.html#examples
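For illustration, a documented function with an @examples tag could look like the sketch below. `count_records()` is a hypothetical helper invented for this example and is not part of refsplitr.

```r
#' Count records in a reference data frame
#'
#' @param refs A data frame with one row per reference.
#' @return An integer giving the number of rows in `refs`.
#' @examples
#' refs <- data.frame(author = c("A", "B"), year = c(2018, 2019))
#' count_records(refs)
#' @export
count_records <- function(refs) {
  # Fail early if the input is not a data frame.
  stopifnot(is.data.frame(refs))
  nrow(refs)
}
```

The code under @examples ends up in the Examples section of the help page and is run by R CMD check, so it doubles as a lightweight test.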

Source code should adhere to a code style, especially regarding spacing, to improve its readability: https://ropensci.github.io/dev_guide/building.html#code-style. Tools such as goodpractice::gp() and lintr::lint() help check for good coding style.
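A small sketch of what such a style check reports, assuming lintr (version 3 or later) is installed. The one-line snippet and the choice of two linters are illustrative only; on a real package one would typically run `lintr::lint_package()` or `goodpractice::gp()` from the package root.

```r
library(lintr)

# A deliberately badly spaced line, passed in as text instead of a file.
bad_code <- "x=1+2\n"

lints <- lint(
  text = bad_code,
  linters = list(
    assignment_linter(),   # flags `=` used for assignment
    infix_spaces_linter()  # flags missing spaces around operators
  )
)

# Each lint records the line, column, and a message describing the issue.
print(lints)
```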

Regarding the use of tidyverse, I am fine with not using it. However, as this package already makes heavy use of external packages including those from the tidyverse, I thought that it would make the programming of the package more coherent.

Lastly, while playing around with the plotting function plot_net_address(), I wondered if you want to support transparent edges by default. Then, overlapping edges would become more visible. Here's an example:

Default:

[Image: screenshot of the default plot]

With alpha transparency set to 0.1

[Image: the same plot with transparent edges]
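The effect of edge transparency can be sketched with plain ggplot2. The edges below are random and only stand in for a real co-authorship network; this is not how plot_net_address() is implemented.

```r
library(ggplot2)

# Random line segments as a stand-in for network edges.
set.seed(1)
edges <- data.frame(
  x = runif(200), y = runif(200),
  xend = runif(200), yend = runif(200)
)

# Setting alpha outside aes() applies the same transparency to every edge,
# so regions where many edges overlap appear darker.
p <- ggplot(edges) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend), alpha = 0.1)
```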

I also realized that ggplot2::aes_string is used, which is soft-deprecated. It is recommended to use tidy evaluation idioms instead. Would it be possible to update the ggplot2 functions accordingly?
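As a hedged sketch of the tidy evaluation idiom: column names held in strings can be passed to `aes()` through the `.data` pronoun instead of `aes_string()`. The helper `scatter_by_name()` and the use of `mtcars` are assumptions made for this example only.

```r
library(ggplot2)

# Hypothetical helper: takes column names as strings, as aes_string() did,
# but looks them up with the .data pronoun inside aes().
scatter_by_name <- function(data, x_var, y_var) {
  ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]])) +
    geom_point()
}

p <- scatter_by_name(mtcars, "wt", "mpg")
```

This keeps the function signature string-based, so callers do not need to change, while avoiding the deprecated helpers.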

maelle commented 5 years ago

Thanks @njahn82 for these useful review comments! @aurielfournier @birderboone could you please address those?

aurielfournier commented 5 years ago

Hi, thanks @njahn82 for the comments. We'll get them addressed. It may be a bit delayed, though, as I'm on day 1 of two straight weeks of all-day courses, but hopefully we'll be done by the end of the month.

aurielfournier commented 5 years ago

Hi all, Matt and I are working on this, but it's likely going to be mid-May before we have everything pulled together. We apologize for the delay, and thank you for your patience; we're both doing this outside of our day jobs.

maelle commented 5 years ago

:wave: @aurielfournier & @birderboone! Thanks for the update, I understand.

maelle commented 5 years ago

:wave: @aurielfournier & @birderboone! Mid-May is now, any update? :wink:

aurielfournier commented 5 years ago

Hi @maelle :D we (and by we I mean mostly @birderboone) are working on it! It's close to being done; we should have stuff for you all by the end of the month. Thanks for your patience!

birderboone commented 5 years ago

Hello, the package should now be ready.

Thank you for your patience.

maelle commented 5 years ago

Thanks @birderboone!

Regarding naming, the most important thing is to be consistent within the package, to make it easier for e.g. new contributors to pick things up.

@njahn82 does the response by the authors above address your concerns? Thanks in advance!

njahn82 commented 5 years ago

Thank you! I lack the time to look into the changes to the internal structure of the package. However, there are still some issues with the vignette. To speed things up, I sent a pull request, which addresses the following:

In the vignette, there is a non-runnable code chunk, which fails when executed: https://github.com/embruna/refsplitr/blob/master/vignettes/refsplitr-vignette.Rmd#L192

Is it possible to support transparency in plot_net_address() as well?

I saw that aes_string() was changed to aes_(). Unfortunately, this function will be soft-deprecated in the near future as well. It is recommended to use tidy evaluation idioms instead.

There are three warnings and two notes when the package is built on Travis CI.

I do not want to be picky, but feel that fixing these issues will help to build trust in the functionalities of this useful package.

maelle commented 5 years ago

thanks a lot @njahn82! :smile_cat: