Submission: refnet - Githubissues

aurielfournier commented 5 years ago

Summary

What does this package do? (explain in 50 words or less): refnet is a package to read, organize, geocode, analyze, and visualize Clarivate Web of Knowledge/Web of Science, format reference data files for scientometric, social network, and Science of Science analyses.
Paste the full DESCRIPTION file inside a code block below:

Package: refnet
Type: Package
Title: Thomson Reuters Web of Knowledge/Science and ISI Reference Data Tools
Version: 0.6
Date: 2018-08-26
Authors@R: c(person("Auriel M.V. Fournier", "Developer", role = c("aut"),
                     email = "aurielfournier@gmail.com"),
              person("Forrest R. Stevens", "Developer", role = "aut"),
              person("Matthew E. Boone", "Developer", role = "aut"),
              person("Emilio Bruna", "Developer", role=c("aut","cre"), 
              email="embruna@ufl.edu"))
Maintainer: Emilio Bruna <embruna@ufl.edu>
Description: This function reads Thomson Reuters Web of Knowledge/Science and ISI format reference data files into an R friendly data format and can optionally write the converted data to a friendly CSV format.
License: GPL-3
Imports: maptools, maps, rworldmap, RecordLinkage, Matrix, igraph, network, sna, Hmisc, ggplot2, stringi, stringr, ggmap, Rdpack, tidyr, dplyr, tibble
RoxygenNote: 6.1.0
RdMacros: Rdpack
Suggests: testthat, utils
VignetteBuilder: utils
Encoding: UTF-8

URL for the package (the development repository, not a stylized html page): https://github.com/embruna/refnet
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

Data extraction and munging, since it takes data from one format, and transforms it into something that is useful, and also matches up records among authors.

[Note, the link for the package fit, does not lead to that page anymore, and I couldn't find anything about package fit in the linked policies]

Who is the target audience and what are scientific applications of this package?

Scientists interested in studying the networks of a particular author, subject area or journal.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

https://github.com/ropensci/onboarding/issues/247

Requirements

Confirm each of the following by checking the box. This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[X] includes documentation with examples for all functions.
[X] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
[X] I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

[yes] Do you intend for this package to go on CRAN?
[yes] Do you wish to automatically submit to the Journal of Open Source Software? If so:
- [X] The package has an obvious research application according to JOSS's definition.
- [X] The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- [not yet] The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
[no] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

[yes] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
[yes] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Heather Piwowar @hpiwowar

aurielfournier commented 5 years ago

Thanks @njahn82. We're getting a plan together soon to address these comments. I just moved across the country and am starting a new job, so it may take us a few weeks. Thanks again for everyone's patience.

birderboone commented 5 years ago

Hello! I thank y'all for the constructive reviews throughout this process. I think the package is 100% better than day 1 of submission . Technical stuff: -I did fix the tidy evaluation to enquo and !!, though I also just read the rlang update so I should also test out and then update to the double curly method. But i didn't want you to wait on me while I do that. -I did add a linealpha argument to all network analysis. And we've been brain storming the best way to handle future ggplot tweaks a user might want to make. For tweaks that aren't as obvious as the alpha for the lines. Either coming up with a set of obvious plot tweak argument's, staying hands off, or something more elegant. -All Travis warnings have been dealt with.

-The pull request for the vignette were very helpful. I merged most and only modified slightly some. So thank you so much for the editing suggestions.

I think that was it.

-Matt

njahn82 commented 4 years ago

Thank you for these changes, they look good to me.

The vignette workflow still confuses me. I have the feeling that the vignette is not generated during the package installation process, but was compiled locally and was then moved from the vignette folder to inst/docs. Although this works, it may "shadow" package installation issues. Take for instances the dplyr package. In the vignette, dplyr is used, but not declared in the DESCRIPTION file. Also knitr is not declared.

Before we are entering another round, I would like to ask the editor @maelle if such pre-compiled vignettes comply with the rOpenSci guidelines.

birderboone commented 4 years ago

Honestly this may come down to my own confusion and lack of understanding. I am able to build the vignette locally with devtools, but when I push to github Travis gives me a warning that there is nothing in inst/doc. Which I just assumed meant they wanted a hard copy since devtools doesnt auto install vignettes anymore. I have no basis for that assumption just a guess I made, so I put a hardcopy in inst/doc

I can easily delete the inst/doc folder, but then I'll still get the Travis warning, perhaps there's something wrong in the yaml or something thats making the vignette misinstall on travis?

njahn82 commented 4 years ago

Hi @birderboone I realized that the vignette builder was not properly declared in the DESCRIPTION. After deleting some files, I got it running. When installing via GitHub, please note that the vignette is not automatically built. I used the following command

devtools::install_github("njahn82/refsplitr", build_vignettes = TRUE)

I'll send you a pull request with the changes made. Please have a careful look into it. It also seems that the package contains a few files and I am unsure whether they are needed or not.

njahn82 commented 4 years ago

I tried installing refsplitr with pak, which warns that the remote lionel-/vdiffr is not accessible. Would it be possible to use the version on CRAN instead? Maybe the R version referred to in the DESCRIPTION needs an upgrade, too?

noamross commented 4 years ago

👋 Hello all, just to let you know that I'll be the handling editor for the rest of this review process as Maëlle is on leave.

@bmkramer, have the recent changes addressed your comments? (Aside from the vignette-building issue above)

aurielfournier commented 4 years ago

Glad to have you aboard @noamross ! Welcome 👋 !

Thanks @njahn82 for the pull request. @birderboone is super busy with his day job right now, so I'm working on the stuff you suggested. Hopefully I'll have it all addressed tonight, if not tomorrow. Thanks

birderboone commented 4 years ago

Everything should be working now. Had to wait for field work to be done to find the pesky problem with vignette loading. Thanks @njahn82 on your tweeks. vdiff-r is now on cran version, I forget why we had to switch to the remote at one point.

njahn82 commented 4 years ago

Congrats @birderboone , glad that vignette loading works now. There's just a minor issue: To get rid of the last remaining NOTE message while checking the package build via travis, please add the README.Rmd file to .Rbuildignore.

Otherwise, I am happy with the author's responses, as well as with the current status of the package.

aurielfournier commented 4 years ago

Thanks @njahn82 . I got that changed made just now and the build passes.

aurielfournier commented 4 years ago

@noamross is there anything else you're waiting on from @birderboone or myself? Thanks!

maelle commented 4 years ago

I'm waiting for an answer from the other reviewer and if I don't get it I will do the checks myself this week or next. Thanks for your patience (and yes I'm back as volunteer editor, thanks @noamross for the help!)

bmkramer commented 4 years ago

Other reviewer here :-) Apologies for not responding to your email enquiry sooner @maelle - I just returned from holiday. I will try to go through the updated package this weekend or next if that is still helpful. If that is taking things too long, I also understand of course.

maelle commented 4 years ago

Thanks a lot @bmkramer! Yes it would still be most helpful. :pray: And I hope you had a nice vacation!

aurielfournier commented 4 years ago

Hi @bmkramer

Is this your review - https://github.com/embruna/refsplitr/pull/70#issuecomment-527953368

or is there more coming still? @birderboone and I just don't want to get ahead of ourselves if you are still working on your review.

Thanks!

maelle commented 4 years ago

@bmkramer is the useful comment mentioned by @aurielfournier above your response to the changes in the package? If so could you please copy-paste it here? Thanks in advance and already many thanks for that comment!

bmkramer commented 4 years ago

Ah, yes, apologies for dropping off the face of the earth for a few days - and for somehow misplacing that comment. Copy-pasting below!

As indicated, I looked at the package from a user perspective. Hope this complement the more structural/technical comments from @njahn82!

Hi all,

Wow, following this process has shown me a lot about the large effort that goes into creating a full package, including making sure it builds right. Really impressed by all the work and commitment!

Going through the package's workflow from a user perspective, I noticed that there a still a few inconsistencies/omissions in the description of the functions in the vignette, mostly to do with the arguments listed and the code examples given. I've listed these below.

I was able to run all functionality with the test dataset (example_data.txt), including all visualizations. When I tested with a user-generated dataset from Web of Science (.txt file, 490 records), I encountered two instances that gave error messages:

In the step 'splitting author records' in authors_clean, the following error message was encountered at 48% progress and execution was halted: Error in data.frame(names = unique(unlist(strsplit(C1_names, "; "))), : row names contain missing values

When using only a subset of the file (1st 100 records), this error did not occur.

Running plot_net_country() gave the following error: Error in Ops.data.frame(fromC, toC) : ‘+’ only defined for equally-sized data frames Let me know if you want to have those files for testing (can't link them here b/c of WoS restrictions)

Below my comments on my observations on the vignette for the different functions in the package:

2.1 references_read()

differences in arguments listed: -- arguments in vignette: data, dir, filename_root [NB filename_root gives error: unused argument] -- arguments in vignette example: data, package, dir -- arguments in help: data, dir, include_all -- arguments in help example: data, package example given in vignette does not correspond to recommended workflow for working with example data (creating directory 'data' and save the sample data ‘example_data.txt’ file in the ‘data’ folder) Appendix 1

export to Endnote (ciw) is positioned as the preferred/primary export format, but example data are given as .txt) export to .txt (Other file formats) is mentioned in step 4b only (not step 4) export to other file formats requires some additional parameters to be set (record content, file format) 2.2. authors_clean()

vignette states 2 arguments, while function seems to have only one (references) argument for filename_root (or similar) is implied in the vignette text make clear how to include output as csv 2.2.1-2.2.3 authors_refine()

In 2.2.1 vignette says function has 4 arguments, but only mentions review and prelim, In 2.2.3, vignette does list all 4 arguments for authors_refine() 2.4.1 plot_addresses_country()

In vignette, argument mapRegion not mentioned In vignette, no example code line provided 2.4.2 plot_addresses_points()

In vignette, argument mapCountry not mentioned In vignette, not specified that argument data is addresses element of the authors_georef() object, nor that that is NOT default in this function. 2.4.3 plot_net_coauthor()

vignette lacks example code line 2.4.5 plot_net_address()

only one of 3 additional arguments (lineAlpha) explained in vignette Hope this is still helpful!

maelle commented 4 years ago

Thanks a lot @bmkramer!

@aurielfournier @birderboone any ETA on your answer? Thanks!

aurielfournier commented 4 years ago

Hi @maelle,

Another member of our package building team, @embruna is working on them right now. He hopes to have all the edits completed by the end of the month. Thanks!

embruna commented 4 years ago

Hi all, I’m on it and hope to be done by wed or thursday.

Emilio —————— Emilio M Bruna University of Florida www.BrunaLab.orghttp://www.BrunaLab.org

On Sep 19, 2019, at 2:39 AM, Maëlle Salmon notifications@github.com<mailto:notifications@github.com> wrote:

EXTERNAL EMAIL: Exercise caution with links and attachments.

Thanks a lot @bmkramerhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_bmkramer&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=hPRQO-QHqXhD8p9F43Mezw&m=uhLX5KQslxqukOU46MadFoBVcS9ynNCCmO-6UxtN0q4&s=jTTIwcCJOii-XiOcO5KokVJhYy9G6cFh_vZQ9AwMWek&e=!

@aurielfournierhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_aurielfournier&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=hPRQO-QHqXhD8p9F43Mezw&m=uhLX5KQslxqukOU46MadFoBVcS9ynNCCmO-6UxtN0q4&s=PryY6Merx3tSfIIUU6FDw0IUQFWGzz3aFN_XH1q5w2A&e= @birderboonehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_birderboone&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=hPRQO-QHqXhD8p9F43Mezw&m=uhLX5KQslxqukOU46MadFoBVcS9ynNCCmO-6UxtN0q4&s=rjM6Hwdg4Lu4QtmUR3iO76b1iVfMwYEBmarL62x3P5g&e= any ETA on your answer? Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ropensci_software-2Dreview_issues_256-3Femail-5Fsource-3Dnotifications-26email-5Ftoken-3DABDQS76KNA454AYVAQGNZYDQKMNBXA5CNFSM4FYX2NV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7CMPUY-23issuecomment-2D532989907&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=hPRQO-QHqXhD8p9F43Mezw&m=uhLX5KQslxqukOU46MadFoBVcS9ynNCCmO-6UxtN0q4&s=BgmxbLWQb4U1GP8KBijCFTquqVBy6M5keDQXeVxoU7w&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_ABDQS72TCVPD4FPHJ7ANITLQKMNBXANCNFSM4FYX2NVQ&d=DwMCaQ&c=sJ6xIWYx-zLMB3EPkvcnVg&r=hPRQO-QHqXhD8p9F43Mezw&m=uhLX5KQslxqukOU46MadFoBVcS9ynNCCmO-6UxtN0q4&s=R61uZMZOYrwpjm1VjGxqIvChVT5AYFlCEb2-pBZVEZI&e=.

embruna commented 4 years ago

Hi @bmkramer, would you mind sending me a link to the WOS files that threw those errors (or the files themselves) so I can try and figure out what is going on? My email address is in my github profile. Thanks!

EB

bmkramer commented 4 years ago

Hi @embruna, you've got mail :-)

embruna commented 4 years ago

Hello all, a description of our changes in response to @bmkramer's excellent and thorough feedback.

I was able to run all functionality with the test dataset (example_data.txt), including all visualizations. When I tested with a user-generated dataset from Web of Science (.txt file, 490 records), I encountered two instances that gave error messages:

a) In the step ‘splitting author records’ in authors_clean, the following error message was encountered at 48% progress and execution was halted: Error in data.frame(names = unique(unlist(strsplit(C1_names, “; “))), : row names contain missing values

Clarivate added a new field - Consortium Author (CA) - which is used when a paper is written by (or has as a coauthor) a research consortium or group. In cases where there are no individuals named as authors - only the consortium author - there will be NAs in the AU and AF fields; this was the reason for the error. The authors_clean() function now checks for such cases first, and if any are discovered it stops the process and tells the user which records have a problem so they can be reviewed and corrected prior to running authors_clean(). The error message also provides guidance on how to do so.

b) Running plot_net_country() gave the following error: Error in Ops.data.frame(fromC, toC) : ‘+’ only defined for equally-sized data frames

Rare country name variants can cause this errors. In this case it was because the country listing was “Serbia” instead of “Republic of Serbia”. We added this to the list of name variants the functions checks for prior to running plot_net_country().

Below my comments on my observations on the vignette for the different functions in the package:

2.1 references_read()

a) differences in arguments listed: a.1) arguments in vignette: data, dir, filename_root [NB filename_root gives error: unused argument] a.2) arguments in vignette example: data, package, dir a.3) arguments in help: data, dir, include_all a.4) arguments in help example: data, package b) example given in vignette does not correspond to recommended workflow for working with example data (creating directory ‘data’ and save the sample data ‘example_data.txt’ file in the ‘data’ folder)

We have included all arguments and removed the text for the (deprecated) argument for the location in which to save .csv. (in an earlier version csv saving was the default, but we opted instead to make saving as a .csv optional. The revised vignette reflects this edit, and both the vignette and function show users how to save the output as a .csv if they wish).

Appendix 1 a) export to Endnote (ciw) is positioned as the preferred/primary export format, but example data are given as .txt) b) export to .txt (Other file formats) is mentioned in step 4b only (not step 4) c) export to other file formats requires some additional parameters to be set (record content, file format)

We decided to streamline the guide by eliminating the steps for downloading directly from the search results and instead recommending that users add records to the Marked List. This was in part to simplify the figure, but also because Clarivate changed the location and menu structure for direct downloads from the search page. The new figure explains records can be downloaded as either .txt or .ciw, without emphasizing one over the other.

2.2. authors_clean() a) vignette states 2 arguments, while function seems to have only one (references) b) argument for filename_root (or similar) is implied in the vignette text c) make clear how to include output as csv

We have edited the vignette and examples. We originally had a second argument - the location to save a csv file - but then opted instead to make saving as a .csv optional. The revised vignette reflects this edit, and both the vignette and function show users how to save the output as a .csv if they wish.

2.2.1-2.2.3 authors_refine() a) In 2.2.1: vignette says function has 4 arguments, but only mentions review and prelim, b) In 2.2.3: vignette does list all 4 arguments for authors_refine()

We have added descriptions of all arguments to the vignette and function help, and included all arguments in the examples for 2.2.1 and 2.2.3.

2.4.1 plot_addresses_country() a) In vignette, argument mapRegion not mentioned b) In vignette, no example code line provided

We have included all arguments in the vignette, in the examples and function help, and now provide an example with all arguments.

2.4.2 plot_addresses_points() a) In vignette, argument mapCountry not mentioned b) In vignette, not specified that argument data is addresses element of the authors_georef() object, nor that that is NOT default in this function.

We have included all arguments in the vignette and in the examples and function help. The vignette and help now specifies the assignment of the ‘data’ argument. For all functions we now indicate which arguments are optional and what the default argument values used in cases where they are not included.

2.4.3 plot_net_coauthor(): a) vignette lacks example code line

We have added an example code line.

2.4.5 plot_net_address() a) only one of 3 additional arguments (lineAlpha) explained in vignette

We have included all arguments in the vignette, in the examples and function help, and now provide an example with all arguments.

maelle commented 4 years ago

Thanks @embruna! @bmkramer, are you happy with the response above?

bmkramer commented 4 years ago

Yes, I am, thanks @embruna!

maelle commented 4 years ago

Thank you @bmkramer!

@embruna @aurielfournier @birderboone We're now closer to approval. :wink:

Some comments from me:

I made a PR to remove the useless Maintainer field from the DESCRIPTION (the field is useless, not the maintainer :slightly_smiling_face: )
The README could do with a bit more work to make it more attractive to potential users, which had been mentioned a bit earlier. In particular,
- There's no plot, although the workflow has a plot command. And the workflow section could use a bit text.
- I understand the need to inform people of the transfer from R-Forge but in my opinion it should come at the end of the README, because that's not what a potential user needs to see to be convinced (and they might stop reading sooner)
- I really like @njahn82's summary at the beginning of his review "This is a specific package used for manipulating and analyzing authorship data from the Web of Science (WoS), a large toll-access literature and citation database indexing articles from around 12.000 academic journals. The packages imports local files that needs to be manually downloaded from the database. This is a quite common workflow when re-using WoS data, because API access is very costly and limited.", could (part of) it find a place in the README?
Via goodpractice I get a test coverage of 65%,

> covr::package_coverage()
refsplitr Coverage: 65.91%
R/plot_net_address.R: 0.00%
R/plot_net_coauthor.R: 0.00%
R/plot_net_country.R: 0.00%
R/authors_clean.R: 73.81%
R/authors_georef.R: 75.00%
R/authors_refine.R: 81.82%
R/authors_parse.R: 86.76%
R/authors_address.R: 87.34%
R/plot_addresses_points.R: 89.74%
R/authors_match.R: 91.08%
R/split_names.R: 96.00%
R/references_read.R: 96.34%
R/plot_addresses_country.R: 100.00%

Why does the badge show a different percentage? Three functions have no tests, correct? The coveralls report seems to not be for the latest commit.

Other goodpractice output

✖ write short and simple functions. These
    functions have high cyclomatic complexity:authors_match
    (57).

Could it be simplified a bit?

  ✖ avoid long code lines, it is bad for
    readability. Also, many people prefer editor windows
    that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R/authors_address.R:87:1
    R/authors_address.R:92:1
    R/authors_clean.R:3:1
    R/authors_clean.R:5:1
    R/authors_clean.R:7:1
    ... and 92 more lines

  ✖ avoid 'T' and 'F', as they are just variables
    which are set to the logicals 'TRUE' and 'FALSE' by
    default, but are not reserved words and hence can be
    overwritten by the user.  Hence, one should always use
    'TRUE' and 'FALSE' for the logicals.

    R/authors_parse.R:NA:NA
    R/authors_parse.R:NA:NA
    R/authors_parse.R:NA:NA
    R/authors_parse.R:NA:NA

These are lines like https://github.com/embruna/refsplitr/blob/8e9e42904b3bda93c830664d162e01f49a8cb454/R/authors_parse.R#L58

Output from rcmdcheck::rcmdcheck(args = "--as-cran")


  Unknown, possibly mis-spelled, fields in DESCRIPTION:
    ‘Remotes’

I think you can use CRAN's ggmap now?

  Found the following (possibly) invalid URLs:
    URL: https://CRAN.R-project.org/package=XXXXX
      From: inst/doc/refsplitr-vignette.html
      Status: 404
      Message: Not Found
    URL: https://cran.r-project.org/web/packages/bibliometrix/index.html
      From: inst/doc/refsplitr-vignette.html
      Status: 200
      Message: OK
      CRAN URL not in canonical form
    URL: https://cran.r-project.org/web/packages/revtools/index.html
      From: inst/doc/refsplitr-vignette.html
      Status: 200
      Message: OK
      CRAN URL not in canonical form
    The canonical URL of the CRAN page for a package is 
      https://CRAN.R-project.org/package=pkgname

I didn't make this rule but we need to obey it.


❯ checking examples ... NOTE
  Examples with CPU or elapsed time > 5s
                     user system elapsed
  plot_net_address 10.001  0.032  10.611
  plot_net_country  6.235  0.028   6.276

When CRAN submission approaches keep this in mind.

maelle commented 4 years ago

@embruna @aurielfournier @birderboone any update? Thank you 🙂

maelle commented 4 years ago

@embruna @aurielfournier @birderboone could you please respond soon? Thanks. :smile_cat:

aurielfournier commented 4 years ago

Sorry! @embruna is taking the lead on this next round of edits and he is working on it, but it will be a bit still. @embruna is a professor at U Florida and its getting to be his busy time of the semester. Thanks for your patience.

maelle commented 4 years ago

Thanks for the update!

embruna commented 4 years ago

Yes, I'm on it! Between my new class this semester, travel, and...well, the same as the rest of us. I'm on it.

embruna commented 4 years ago

Hello, a quick update. We are almost done - here is a summary of the changes made in response to feedback from @maelle.

I made a PR to remove the useless Maintainer field from the DESCRIPTION (the field is useless, not the maintainer 🙂 ) Accepted.

The README could do with a bit more work to make it more attractive to potential users, which had been mentioned a bit earlier. In particular, There's no plot, although the workflow has a plot command.. We added one, and I defintiely think it makes it look more attractive. :)

And the workflow section could use a bit text. I added a little, though I was concerned I would just be duplicating what was in the vignette.

I understand the need to inform people of the transfer from R-Forge but in my opinion it should come at the end of the README, because that's not what a potential user needs to see to be convinced (and they might stop reading sooner). Good suggestion, now at the bottom. ACCEPTED

I really like @njahn82's summary at the beginning of his review "This is a specific package used for ...API access is very costly and limited.", could (part of) it find a place in the README? We included this, and it does nicely emphasize one reason why it's needed (cost).

All the lines of code identified as being > 80 chrs are now < 80 chrs.
All the instances of = F corrected to = FALSE
URLs in the vignettes now in CRAN canonical form.

The main thing I think we have left to deal with is the "Badge shows different % coverage", though we think that's a minor thing. We'll have an update on that ASAP. Thanks so much for the patience!

maelle commented 4 years ago

Thank you! 😀

I think you forgot to upload the README image to the repo. 😉

The coverage difference isn't that minor because 65% is below the required 75% https://devguide.ropensci.org/building.html#testing

maelle commented 4 years ago

Reg overlap between the README and vignette you can re-use content without copy pasting if you want see https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/

Moreover with roxygen2 latest version one can even include Rmd in man pages so you could re-use content even more (e.g. in the package level man page) https://roxygen2.r-lib.org/articles/rd.html#including-external--rmd-md-files

Repetition between docs is fine since one never knows what an user finds first.

embruna commented 4 years ago

Thank you! 😀

I think you forgot to upload the README image to the repo. 😉

The coverage difference isn't that minor because 65% is below the required 75% https://devguide.ropensci.org/building.html#testing

ARGH! I did indeed. It should be there now.

Still looking into the coverage issue.

birderboone commented 4 years ago

Ok everything should be ready to go:

Coverall issue has been fixed. When we did the name change from refnet -> refsplittr coveralls was not properly configured to pull from the new names travis account.
Coverage statistics are running now at...i think 91.6%. For some reason the tests for the plots had skip_on_cran() which the covr package skips for test coverage as well. Thats fixed, and I added a couple extra tests.
I did some testing and ggmaps::geocode on CRAN is still not fixed. I have that the latest cran update was in February, and that version has the issue still. In ggmap::geocode this line needs to be commented out

if (source == "dsk") stop("datasciencetoolkit.org terminated its map service, sorry!")

Which it is in the github version after our issue submission, but has not been pushed to cran. Theres a lot of things that bother me about this whole situation, and hopefully datasciencetoolkit does not in the future shut down, otherwise we'll have to go a different route, or allow people to enter there google api keys (NOT IDEAL).

We have not at this time decided to split up authors_match(). This is the core of what this package is, and its a complicated piece of code to start tweaking. We are in agreement with you that it does need to be paired down and split into smaller functions, but at this moment doing so is a large time investment. So we're currently tabling it for 2.0.

Thanks for everything

Matt

maelle commented 4 years ago

Thanks! We're getting very close to approval! I have a few remaining comments

from myself
- I'd recommend changing the GitHub repo description to something more directly informative (as opposed to "update to")
- In the Travis config file, it's not a good idea to ignore warnings now that the package is in such good shape.

https://github.com/embruna/refsplitr/blob/e1492dac22ac7b553b84f715c31e01086137be83/.travis.yml#L6

from goodpractice::gp()


  ✖ avoid long code lines, it is bad for
    readability. Also, many people prefer editor
    windows that are about 80 characters wide. Try
    make your lines shorter than 80 characters

    R/authors_address.R:89:1
    R/authors_refine.R:3:1
    R/authors_refine.R:4:1
    R/authors_refine.R:6:1
    R/authors_refine.R:9:1
    ... and 71 more lines

I see that e.g. the first one is long but ok as is, in this case you could add # nolint at the end. See https://github.com/jimhester/lintr#project-configuration

  ✖ fix this R CMD check NOTE: Namespaces
    in Imports field not imported from: ‘Rdpack’
    ‘gdtools’ ‘mapproj’ ‘vdiffr’ All declared
    Imports should be used.

Can you fix DESCRIPTION?

embruna commented 4 years ago

Hi @maelle, I think we got all of these taken care of.

A few of the lines of code are at 81 characters, but I couldn't reduce them further because of the required formatting. I hope that's ok.
The description on the repo has been changed to eliminate the "update to refnet text"
Coverage 91% ! :)

maelle commented 4 years ago

Hi @embruna, thanks! Np reg the remaining long lines.

Reg DESCRIPTION could you have a look at dependencies ‘Rdpack’ ‘gdtools’ ‘mapproj’ ‘vdiffr’ as noted in my previous comment? Dependencies only used in tests should be under Suggests. by the way the line for Imports is very long, I'd suggest running desc::desc_normalize() to have one line per dependency (like what you have for Suggests), which would be easier to browse (that function also orders dependencies alphabetically).

Reg Travis what I meant is that I recommend undoing this commit https://github.com/embruna/refsplitr/commit/92e9fe5eff948f54ae9892526ddfdf83e322c461 since your package now is in very good shape, it'd be good for Travis to flag new WARNINGs as they appear.

After that we'll be done. Thanks for your patience and thanks again for your work!

embruna commented 4 years ago

OK, drum roll...

I moved the packages to the Suggests, and ran desc::desc_normalize() on the DESCRIPTION. The packages are also listed in alphabetical order.
travis.yml now has warnings_are_errors: true

We're the ones who are grateful, both for your patience and the attention to detail. The review of this package has been incredibly helpful and detailed....I wish my papers were reviewed this thoroughly!

maelle commented 4 years ago

Thank you! :grin: And thanks again to both reviewers!

I'm sorry but I now realize these packages are maybe not even used at all in your package, or am I missing something? I've searched for their name in the package code (e.g. https://github.com/embruna/refsplitr/search?q=mapproj&unscoped_q=mapproj&type=Code)

mapproj
Rdpack I suppose you used to use it for package docs, but now you no longer use it?
vdiffr I see https://github.com/embruna/refsplitr/blob/3ba804f62014aee01c370e5e14209f8c2817cf73/tests/figs/deps.txt but right now there doesn't seem to be any test using vdiffr, the tests in https://github.com/embruna/refsplitr/blob/master/tests/testthat/test_plots.R don't use it.

If the packages are really not used, could you please remove them from Suggests?

maelle commented 4 years ago

You don't need covr either since it's run on Travis. I've now found a way to find unneeded dependencies using the attachment package from inside the refsplitr folder.

desc_deps <- attachment::att_from_description()
scripts_deps <- attachment::att_from_rscripts(".")
namespace_deps <- attachment::att_from_namespace()
vignette_deps <- attachment::att_from_rmds()
desc_deps[!desc_deps %in% c(scripts_deps, namespace_deps, vignette_deps)]

maelle commented 4 years ago

Note that you could still have false positives since you could use Suggests packages for other things.

maelle commented 4 years ago

Happy New Year!

@embruna Once you've responded to my comment about Suggests unneeded dependencies the package will be approved. 🙂

embruna commented 4 years ago

Hi all, happy New Year.

Deleted the 4 packages from the Suggests....thanks @maelle!

embruna commented 4 years ago

Well, it was too good to be true...just got this email:

I'm going to try adding the four packages to suggests one at a time and seeing which of them actually do need to be there. I'll update shortly.

maelle commented 4 years ago

Ah indeed sorry mapproj seems needed https://travis-ci.org/embruna/refsplitr/jobs/633881379#L6325

embruna commented 4 years ago


6324
Error: processing vignette 'refsplitr-vignette.Rmd' failed with diagnostics:
6325
there is no package called 'mapproj' ```

embruna commented 4 years ago

Yes, I just reinserted to the Suggests. Fingers crossed!

ropensci / software-review

Submission: refnet #256

Summary

Requirements

Publication options

Detail