Closed aurielfournier closed 4 years ago
Thanks very much for your submission @aurielfournier - we're discussing now and will get back to you soon
Thanks for your submission @aurielfournier! I see the package doesn't have any tests and doesn't have continuous integration yet. I suggest we put the submission on hold while you sort that out, unless you can and want to add this within a week or so? There is some guidance in this guide, and I am happy to answer any questions here or via Slack!
I was also looking at the dependencies, there are many of them in DESCRIPTION and `stringi`) `@importFrom` (or even just `pkg::fun` when calling the function) and `@export` tags in R files. I wouldn't recommend importing whole packages unless needed. See http://r-pkgs.had.co.nz/namespace.html#imports
@aurielfournier for info I've just added the holding label, please update this thread once you have had time to work on the package, and ask me any question.
Thanks @maelle, my co-authors and I are working on it, but it's taken a bit longer than we expected. Appreciate your patience!
No problem, and happy to help if I/we can!
@maelle package now has tests and continuous integration.
I removed stringi from the DESCRIPTION file.
I have fixed the issues from the NAMESPACE file.
Huge thanks to my coauthor @birderboone for doing the heavy lifting to get this over the finish line!
I think we are ready for review now. If you have any other things that need to be addressed let me know.
Thanks!
:wave: @aurielfournier @birderboone! Awesome, thanks to both of you! A few comments before I do the last editor checks:
`utils` functions e.g. `utils::flush.console()`
`dplyr` function in which case you can write e.g. `mutate(df, y = .data$a + .data$x)` to make the NOTE disappear see this vignette
rtimicropem I put the paper in a paper/ folder https://github.com/ropensci/rtimicropem/tree/master/paper and buildignored it `usethis::use_build_ignore("paper/")` or something like that.
`usethis::use_coverage("codecov")` which will give you stuff to add to the Travis config file, browse codecov maybe (I can't remember) and give you the code to paste in the README to get a badge.
Hi @maelle
moved the badge
Addressed the Travis warnings/notes
I closed the two open issues. thanks for pointing out the milestones, I had forgotten about that.
I added the CoC and contribution guides.
Thank you so much for all the links and tips on how to address these issues, it is greatly appreciated.
Build is now passing!!
Yay, green badge! Can you also add a coverage badge? Run `usethis::use_coverage("codecov")` which will give you stuff to add to the Travis config file, browse codecov maybe (I can't remember) and give you the code to paste in the README to get a badge.
Done! Sorry I missed that.
Thank you! A few more things before I search for reviewers (then they and you have less work :wink:).
`goodpractice` output
✖ write short and simple functions. These
functions have high cyclomatic complexity: authors_clean (68).
Maybe you can split it in several helper functions?
✖ omit "Date" in DESCRIPTION. It is not required
and it gets invalid quite often. A build date will be
added to the package when you perform `R CMD build` on it.
✖ add a "URL" field to DESCRIPTION. It helps users
find information about your package online. If your
package does not have a homepage, add an URL to GitHub, or
the CRAN package package page.
✖ add a "BugReports" field to DESCRIPTION, and
point it to a bug tracker. Many online code hosting
services provide bug trackers for free,
https://github.com, https://gitlab.com, etc.
Run `usethis::use_github_links()`.
✖ avoid long code lines, it is bad for
readability. Also, many people prefer editor windows that
are about 80 characters wide. Try make your lines shorter
than 80 characters
R\authors_clean.R:24:1
R\authors_clean.R:26:1
R\authors_clean.R:29:1
R\authors_clean.R:35:1
R\authors_clean.R:36:1
... and 162 more lines
It's the complicated function, one more reason to try and simplify it?
✖ omit trailing semicolons from code lines. They
are not needed and most R coding standards forbid them
R\authors_refine.R:20:198
✖ avoid sapply(), it is not type safe. It might
return a vector, or a list, depending on the input data.
Consider using vapply() instead.
R\authors_clean.R:97:17
R\authors_clean.R:132:19
R\authors_clean.R:152:19
R\authors_clean.R:186:17
R\authors_clean.R:223:22
... and 14 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
1:NROW(...) and 1:NCOL(...) expressions. They are error
prone and result 1:0 if the expression on the right hand
side is zero. Use seq_len() or seq_along() instead.
R\authors_clean.R:56:15
R\authors_clean.R:71:79
R\authors_clean.R:188:15
R\authors_clean.R:209:21
R\authors_clean.R:422:12
... and 9 more lines
✖ avoid 'T' and 'F', as they are just variables
which are set to the logicals 'TRUE' and 'FALSE' by
default, but are not reserved words and hence can be
overwritten by the user. Hence, one should always use
'TRUE' and 'FALSE' for the logicals.
R/authors_clean.R:NA:NA
R/authors_clean.R:NA:NA
R/authors_clean.R:NA:NA
R/authors_clean.R:NA:NA
R/authors_clean.R:NA:NA
... and 38 more lines
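To make the `sapply()` and `1:length(...)` items in the goodpractice output above more concrete, here is a minimal base R sketch. The `authors` vector is made up for illustration and does not come from the package.

```r
# Hypothetical input: an empty character vector, as could happen when a
# reference file contains no author records.
authors <- character(0)

# 1:length(x) counts down from 1 to 0 when x is empty, so a loop over it
# would run twice with invalid indices.
bad_index <- 1:length(authors)

# seq_along(x) yields an empty index vector instead, so the loop is skipped.
good_index <- seq_along(authors)

# sapply() falls back to a list for empty input, whereas vapply()
# always returns the declared type (here: integer).
n_chars_sapply <- sapply(authors, nchar)
n_chars_vapply <- vapply(authors, nchar, integer(1))
```

The same pattern applies to `1:nrow(df)`, where `seq_len(nrow(df))` is the safer form.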
Can you please re-trigger a Travis build so that the coverage badge indicates the coverage? We're aiming at a minimal coverage of 75%; see https://github.com/ropensci/dev_guide/pull/94/files (brand-new official guidance).
I'd recommend putting all badges on a single line.
Could you also use Appveyor CI for Windows? `usethis::use_appveyor()`. This will add another badge.
You can add the in-review badge
[![](https://badges.ropensci.org/256_status.svg)](https://github.com/ropensci/onboarding/issues/256)
It'll turn green when your package is approved.
Could you please add examples in the documentation of the functions?
Running `devtools::spell_check()` shows a few typos among the false positives: querrying, nunmbers etc.
Reviewers: @njahn82 @bmkramer Due date: 2018-12-12
Hi @maelle
We are going to pause, and redo `authors_clean()` to be simpler/broken down into several functions. This will probably take ~ 1 week.
Thanks!
Ok, thank you!
Alright! After some fighting with Travis the past 24 hours, we are good to go.
@birderboone split up authors_clean into three smaller internal functions, that should make review of the code easier. We've also addressed the other comments from @maelle
If I missed something, let me know.
Thanks!
Thanks @aurielfournier and @birderboone!
A few more things from `goodpractice` to tackle before I look for reviewers
It is good practice to
✖ add a "BugReports"
field to DESCRIPTION, and point
it to a bug tracker. Many
online code hosting services
provide bug trackers for free,
https://github.com,
https://gitlab.com, etc.
Simply run usethis::use_github_links()
✖ use '<-' for
assignment instead of '='. '<-'
is the standard, and R users
and developers are used it and
it is easier to read your code
for them if you use '<-'.
tests\testthat\test_authors_match.R:4:5
The `styler` package might help.
✖ avoid long code
lines, it is bad for
readability. Also, many people
prefer editor windows that are
about 80 characters wide. Try
make your lines shorter than 80
characters
R\authors_address.R:12:1
R\authors_address.R:14:1
R\authors_address.R:17:1
R\authors_address.R:20:1
R\authors_address.R:41:1
... and 174 more lines
✖ avoid sapply(), it is
not type safe. It might return
a vector, or a list, depending
on the input data. Consider
using vapply() instead.
R\plot_net_address.R:32:26
R\plot_net_address.R:33:26
tests\testthat\test_references_read.R:10:17
✖ avoid 1:length(...),
1:nrow(...), 1:ncol(...),
1:NROW(...) and 1:NCOL(...)
expressions. They are error
prone and result 1:0 if the
expression on the right hand
side is zero. Use seq_len() or
seq_along() instead.
R\authors_georef.R:55:25
R\authors_georef.R:71:15
R\authors_georef.R:113:17
R\plot_net_address.R:34:35
R\plot_net_address.R:123:22
... and 1 more lines
✖ fix this R CMD check
NOTE: Namespaces in Imports
field not imported from:
'Rdpack' 'maps' 'stringr' All
declared Imports should be
used.
✖ avoid 'T' and 'F', as
they are just variables which
are set to the logicals 'TRUE'
and 'FALSE' by default, but are
not reserved words and hence
can be overwritten by the user.
Hence, one should always use
'TRUE' and 'FALSE' for the
logicals.
R/authors_address.R:NA:NA
R/authors_address.R:NA:NA
R/authors_address.R:NA:NA
R/authors_georef.R:NA:NA
R/authors_georef.R:NA:NA
... and 15 more lines
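As a small illustration of why the `T`/`F` item above matters, the following base R sketch shows `T` being shadowed inside a function. The function name `check_flag` is invented for this example.

```r
# T and F are ordinary variables, so they can be reassigned.
check_flag <- function() {
  T <- 0        # shadows the built-in shortcut inside this function
  isTRUE(T)     # the value is now 0, so this is no longer TRUE
}

# TRUE is a reserved word and cannot be reassigned; the next line would
# raise an error if uncommented:
# TRUE <- 0
shadowed <- check_flag()
```

Writing `TRUE` and `FALSE` in full avoids depending on whatever `T` or `F` happens to be bound to.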
And from me: could you please add a coverage badge? `usethis::use_coverage()` should help you with that.
Thanks in advance and thanks for all your work until now! 😸
Hi @maelle
Thanks as always for your kind patience. It is greatly appreciated.
I've addressed all of the above, and I finally downloaded `goodpractice` for myself to check things.
The one issue that I don't totally understand, but that isn't throwing an issue in `goodpractice`, is this one:
fix this R CMD check NOTE: Namespaces in Imports field not imported from: 'Rdpack' 'maps' 'stringr' All declared Imports should be used.
I thought that meant that I needed to remove `Rdpack`, `maps` and `stringr` from the DESCRIPTION file. So I did, but then the build failed, and it did not pass unless I put `Rdpack` back in.
But otherwise I think we're ok. :D
:wave: @njahn82 @bmkramer! Friendly reminder that your reviews are due in two days, on 2018-12-12. 😺
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 6 hours
This is a specific package used for manipulating and analyzing authorship data from the Web of Science (WoS), a large toll-access literature and citation database indexing articles from around 12,000 academic journals. The package imports local files that need to be manually downloaded from the database. This is quite a common workflow when re-using WoS data, because API access is very costly and limited.
I was very excited to see that `refnet` addresses the problem of author disambiguation and affiliation extraction using WoS data. As a data analyst for scholarly communication at a research library, I sometimes create co-authorship networks. For this task, I often use Web of Science data. It is very laborious to parse the different text strings representing authors and institutions and to disambiguate them. I especially like that `refnet` supports a workflow that combines automatic and manual cleaning steps.
Unfortunately, I had a hard time getting started with the package, because it took me a while to find information about what WoS data export format was needed, and how to load the data into R using the package.
After downloading the data, my first attempts loading the file into R failed:
library(refnet)
my_data <- references_read(data = "wos_ropensci.txt")
## Error in references_read(data = "wos_ropensci.txt"): ERROR: The specified file or directory does not contain any
## Web of Knowledge or ISI Export Format records!
It took me a while (and many manual downloads from the WoS) to realize that the param `dir` needs to be set to `FALSE` when I want to load just one file.
my_data <- references_read(data = "wos_ropensci.txt", dir = FALSE)
I feel that the average R user is not as patient when appropriate starting instructions are missing. My main request as reviewer would be therefore to improve high-level documentation, as well as to provide a sample dataset to play with.
I suggest expanding the README and presenting an overview and some details in a `refnet-package.Rd` file, which is currently missing, so that users can type `?refnet-package` for help.
Here are some other observations and suggestions that might be helpful for improving the package.
Although the long-form documentation nicely explains the motivation and the workflow, it seems that the vignette does not process code chunks with functions from the package. I would suggest adding executable examples to demonstrate to users what can be done with the package. It would also be helpful to include an R Markdown file used to generate `README.md` with at least one runnable example.
Installed easily, but it did not pass the full R CMD check with `--as-cran`. There were two Errors and two Notes:
Two Errors
Conflicting package names (submitted: refnet, existing: RefNet [https://bioconductor.org/packages/3.7/bioc])
https://www.bioconductor.org/packages/release/bioc/html/RefNet.html
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
| | 0%
|
|======================================================================| 100%[1] "Now processing all references files"
[1] "Now processing all references files"
|
| | 0%
|
|======================================================================| 100%══ testthat results ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
OK: 66 SKIPPED: 0 FAILED: 2
1. Failure: Net plots work (@test_plots.R#95)
2. Failure: Net plots work (@test_plots.R#96)
Two Notes
* checking DESCRIPTION meta-information ... NOTE
Maintainer field differs from that derived from Authors@R
Maintainer: ‘Emilio Bruna <embruna@ufl.edu>’
Authors@R: ‘Emilio Bruna Developer <embruna@ufl.edu>’
* checking top-level files ... NOTE
Non-standard file/directory found at top level:
‘missing_addresses.csv’
As described in the Check logs, there is a conflicting Bioconductor package with the same name: RefNet. To comply with rOpenSci and CRAN, a new package name is needed. A name where also the database name is included could help users that want to work with Web of Science data files to discover the package.
The package has a test suite for main functions, which succeeds in RStudio, but not when checking the package bundle (see R CMD check output).
It seems that the `Authors@R` documentation in Writing R Extensions is misleading, because in this package "Developer" is used as the family name in the author names field as well, which probably explains why the author and maintainer fields differ. Family names instead of "Developer" need to be added to the `Authors@R` vector.
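A minimal sketch of what a corrected `Authors@R` entry could look like, using `utils::person()`. The name and e-mail address are taken from the check output above; the roles `c("aut", "cre")` are an assumption made for illustration only.

```r
# Sketch of an Authors@R entry built with utils::person().
# The roles are assumed for this example and may differ in the package.
maintainer <- utils::person(
  given  = "Emilio",
  family = "Bruna",
  email  = "embruna@ufl.edu",
  role   = c("aut", "cre")
)

# The family name is now "Bruna" rather than "Developer".
maintainer$family
```

With a `cre` role present, the Maintainer field can be derived from `Authors@R` instead of being written by hand.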
`missing_addresses.csv` needs to be added to `.Rbuildignore`, or removed if it is not needed.
In the `Documentation`, the brief "About" refers to Thomson Reuters as company behind the Web of Science. Ownership changed recently to Clarivate Analytics.
The package uses automatic testing, which is great. Tests could be expanded to cover more functionalities. For instance, `authors_georef()` does not check geo-coding using Google Maps.
While testing data export functionalities, files are written into the `testthat` folder. I would suggest avoiding this behaviour by using `unlink()` after the tests. Here's an example of how to use `unlink` from the rio package.
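The cleanup idea can be sketched in base R as follows. The function `write_example_output()` is a made-up stand-in for any package function that writes a file; it is not part of the package under review.

```r
# Hypothetical export step standing in for a package function that
# writes a CSV file; the function name is invented for this sketch.
write_example_output <- function(path) {
  write.csv(data.frame(id = 1:3), path, row.names = FALSE)
}

# Write into a temporary location rather than the testthat folder.
out_file <- tempfile(fileext = ".csv")
write_example_output(out_file)
existed_during_test <- file.exists(out_file)

# Remove the file once the checks are done.
unlink(out_file)
exists_after_cleanup <- file.exists(out_file)
```

Using `tempfile()` keeps test artefacts out of the package directory even if a test stops before the `unlink()` call.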
Main functions have many lines, which makes it very hard to follow what is going on. It would be great, if these functions could be split into smaller units.
`references_read()` seems to contain a lot of repeated code to import data as data.frame. I wonder if the WoS `csv` export file format could be used instead of the `Plain Text` format? When the data is rectangular, the `readr` package has great functionalities to strip out whitespace, which takes much room in the function, and to define `colClasses` while loading files into R.
When importing data with `references_read()`, values in many columns end with a line break `\n`.
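One possible way to strip such trailing line breaks, sketched in base R with made-up field values (the package may handle this differently):

```r
# Made-up field values ending in a line break, mimicking imported columns.
titles <- c("A study of birds\n", "Another title\n")

# Strip trailing whitespace, including "\n", from the right-hand side only.
clean_titles <- trimws(titles, which = "right")
```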
Some console messages are invoked by using the `print()` method (see https://github.com/embruna/refnet/search?l=R&q=print+%2A.R). To enable user-friendly suppression, `message()` and `warning()` can be used instead.
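A short base R sketch of the difference: output emitted with `message()` can be silenced by the caller, which is not possible in the same way with `print()`. The helper `process_files()` is hypothetical; only the message text is taken from the check log above.

```r
# Hypothetical helper that reports progress with message() instead of print().
process_files <- function(n_files) {
  message("Now processing all references files")
  n_files
}

# Callers who do not want console output can silence it.
result <- suppressMessages(process_files(2))
```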
There are various issues when checking the code syntax with `lintr::lint()` that need to be addressed.
Documentation of functions can be improved by making more use of `roxygen2` tags. Not all functions have examples. Internal functions should be tagged with `@noRd` so that they are not added to the manual.
The use of functions from other packages could be made more explicit to the users. In many cases, it is not possible to interact with them.
`authors_georef()`, for example, uses `ggmap::geocode` to retrieve geo-coding information. For a couple of weeks now, however, keyless access to Google Maps Platform has been deprecated. Information about how to pass API keys to the function to make geocode work would be very helpful.
Functions used to visualize the networks make use of ggplot2. It would be great to interact with its functionalities when calling the `refnet` functions.
To improve documentation of external functions, helpful `roxygen2` tags like `@importFrom` and `@inheritParams` should be considered.
Overall, it seems that package has quite a history, and I welcome updating it. However, because of the ambiguity of author names and addresses in general, and the complicated WoS data format in particular, I wonder if more focus would improve the maintainability of the package.
One strategy could be the usage of tidyverse packages and functions. At least, they would help to dry out code for loading the data and string manipulation in a tidy way. Of course, the package would have to start with importing rectangular data and not the field tags format, which is now used.
Another would be the focus on parsing and transforming the authorship data including affiliations stored in the `C1` field. Developing functions used to visualize the networks, however, could be discontinued in favor of long-form documentations, and in favor of data formats supported by Social Network Analysis packages and software.
I think that's it from me! Happy to help further with the process!
Thanks a lot for your review @njahn82! 😺
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
For packages co-submitting to JOSS
- [x] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [x] A short summary describing the high-level functionality of the software
- [x] Authors: A list of authors with their affiliations
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Comments: DOIs missing from a number of references; 'Larivière' spelled as 'Lariviare' on two occasions; figure not displaying correctly
Estimated hours spent reviewing: 6 hours
General:
I really like the ability of this package to extract author and address information from WoS records. The main functionalities of the package (importing results, author name parsing and disambiguation, and georeferencing) worked reasonably well for me. The visualization functions I could not get to work.
Information on functionality provided in vignette is detailed and complete; it would be good though to duplicate some of this (e.g. the examples) in Readme.md to get people started on how to use the package.
I have focused my review on functionality of the package. Comments below are based on performing all tasks as described in the vignette, with a custom test dataset downloaded from Web of Science (first 500 articles from PeerJ in 2018)
1 Introduction In describing the package, it is mentioned that the processed data-sets can be exported in tidy formats for more in-depth analyses with other packages. It is not mentioned in what format export is useful for other packages. Perhaps this is self-explanatory (the csv-outputs provided), but it would be good to specify.
2.0 Using Refnet
2.1 Importing Search Results
In references_read(), what is the default for dir=T/F? args(references_read) says it's TRUE
Comments on vignette text: -- typo in example a): .txr -- c) is not a separate example -- remark in Appendix 2 on fields only included when all.fields=T is included in references_read() should be included in main text describing references_read()
testing all.fields=T results in error: unused argument (all.fields=T) args(references_read) reveals it should be include_all=FALSE
2.2 Author address parsing and name disambiguation
Function authors_clean(): no csv file saved. Function contains argument write_out_data = FALSE. Tried TRUE => 2 files saved (authors_review, authors_prelim)
Function also contains argument sim_score (value 0.88) - this is not explained in the documentation (it is mentioned for authors_refine where it has a NULL value)
In documentation under 2.2.2, -- reference is made to Appendix 2, this should be Appendix 3. -- it is stated: 'Users that prefer to manually review the results of the disambiguation can do so with the “authors” object and .csv files' -> unclear which of the 2 csv files (prelim or review) should be taken (I assume review from the documentation of the next step. Also: 'authors' object is unclear)
"Corrections made to the “review” file are merged into the “preview” file"-> should be "prelim"
In Appendix 3: -- explanation on author name disambiguation is informative and useful. It does have some spelling and style issues, not critical, but could do with a careful edit -- kudos for encouraging authors to sign up for (and use!) ORCID! -- not covered: sim_score -- layout of table 2 (2 columns) is mangled -- In table 2, similarity is listed as NA, but it has a value in my test data
2.3 Georeferencing author institutions
Example for authors_georef is incomplete example_georef <-authors_georef(------,------,-----)
Explanations of arguments is incomplete: function (data, address_column = "address", filename_root = "", write_out_missing = TRUE, retry_limit = 10)
-> address_column has value -> retry limit not discussed
executing authors_georef() resulted in lots of error messages, with a loop at the end and a message that a number of geocoding queries are remaining -> if this is expected behaviour, would be good to address in documentation
In documentation: not clear which geocoding application is used when (sequentially?). http://www.datasciencetoolkit.org/ and/or https://developers.google.com/maps/documentation/.
In documentation, it is stated 'an output/file of references that refnet was unable to georeference, which the user can review, manually correct, and import back into the file of georeferenced author locations' -> file seems to contain all lines (with and without lat/long resolved) -> unclear how 'import back into file' should be performed
2.4. Data Visualization: Productivity and Collaboration
I could not get these to work, see error messages (and some analysis on them) below.
Error in plot_addresses_country(PeerJ_2018_2_georef, filename_root = "./output/PeerJ_2018_2") : unused argument (filename_root = "./output/PeerJ_2018_2")
args(plot_addresses_country) function (data, mapRegion = "world") -> so no argument for filename_root as in documentation
Error in rworldmap::joinCountryData2Map(country_name_table, joinCode = "NAME", : your chosen nameJoinColumn :'country_name' seems not to exist in your data, columns = Freq
plot_addresses_points: also no argument for filename_root as in documentation
Example from plot_net_coauthor() is incorrect: plot_addresses_points <- plot_addresses_points(data, filename_root="./output/example")
plot_net_coauthor_2 <- plot_net_coauthor(PeerJ_2018_2_georef) Error in data[!is.na(data$country), ] : incorrect number of dimensions
plot_net_country_2 <- plot_net_country(PeerJ_2018_2_georef) Error in data[!is.na(data$country), ] : incorrect number of dimensions
args(plot_net_country) function (data, line_resolution = 10, mapRegion = "world") -> so no argument for filename_root as in documentation
Error in plot_net_addresses(PeerJ_2018_2_georef) : could not find function "plot_net_addresses"
With apologies for the late review!
Thanks a lot for your review @bmkramer! :smile_cat:
Reg "The package conforms to the rOpenSci packaging guidelines" the question is whether you see any discrepancy between https://ropensci.github.io/dev_guide/building.html and the package, if you have time you are qualified to assess, and you can ask me any question.
Was time or another problem the reason for not running tests? Happy to help if needed (well I can't help with time :smile: ).
Thanks a lot for your feedback in any case!
@aurielfournier @birderboone now both reviews are in! :tada:
Just a note to all involved that @birderboone and I are working on the edits (huge thanks to the reviewers!), we're just a bit slowed down by other things at the moment, but we should have everything addressed by February 5th. Thanks for your patience!
Thanks to the reviewers (@bmkramer , @njahn82) for providing several useful links to resources that made addressing their comments much easier! Your comments were very helpful and constructive and the package is much better off for them; we really appreciate your time!
First, if you look at the repo, you will see that the build is failing. This is because all of this happened, and basically if we were able to get travis to use the github version of ggmap, everything would be fine, but until those changes are on CRAN, the travis build will fail. Since our response was due today, and other than this issue we're ready for you all to look at it again, I'm tossing the ball over to your side of the court. If you would like to wait till ggmap on CRAN is updated, and the build passes, that is fine by us.
Below is our response broadly to the reviewer comments, if you would prefer a comment by comment response, let me know and I'm happy to do that. Thanks!
Auriel, Matt and Emilio
~~
We are choosing at this time not to split up our functions any more than they already are. We have split up the original functions into smaller pieces two times already.
While we appreciate the suggestion for using tidyverse functions, and we do use them in many other contexts, due to the changes in the tidyverse packages, that are not always backwards compatible, we have chosen to avoid them in many cases to avoid this package breaking because of that in the future.
We have changed the name of the package to refsplitr to avoid the conflict on CRAN
We have changed references_read to have dir default to FALSE, to help alleviate the issues the reviewer had
We removed all the csv writing outputs for functions where that was not needed as part of the author refining process.
All typos and other small changes have been made, thank you to both reviewers for catching them
We have removed any need for the google API from the package, since between when we submitted and now it can no longer be used for free.
We have fixed the plotting functions to the best of our ability, some we were unable to replicate. If the reviewers find them again, can they share their input file so we can better diagnose the issue?
The reviewer is correct in that ciw formats can also be processed, and we have revised the text of the vignette to reflect that. We were initially reluctant to mention this because we wanted to avoid users downloading files in proprietary formats (but ciw files can be opened with a text editor). We have also edited the Appendix showing how to download search results to include direct download of ciw files from the search results without going to the marked list.
Reviewer comment : In my tests, export does not need to be via Marked list, but also works via download menu in search results, either as Endnote export (.ciw) or as 'Other file formats', with 'full record' and 'plain text' selected. This works much faster than via Marked lists.
Response: This is indeed faster because it eliminates several steps. However, this will download all records resulting from a search, including any that were incorrectly returned (e.g., those by an author with an identical name). If users wish to filter results prior to download to avoid including unwanted publications, then the best approach is to save only the desired records to the Marked List and download from there as either a .ciw or .txt. Appendix 1 has been amended to include this option.
:wave: @aurielfournier @birderboone! Thanks for your answer.
if we were able to get travis to use the github version of ggmap
You can do that! See https://docs.travis-ci.com/user/languages/r/#remote-package :-)
@njahn82 @bmkramer thanks again for your reviews. Are you happy with the authors' response above?
Thanks for the link @maelle
I've added in the needed argument to the travis yml file, and the build still isn't working, though all the tests pass on my machine when I use the github version of ggmap, though it did take restarting everything to make that happen
So I'm not sure what is going on. :/
Ok! I worked with some of the awesome ladies over in R-Ladies today, and we figured out the issue.
It's actually an issue with `ggmap`. Jenny Bryan opened up an issue in their repo about it.
The solution that Jenny suggested was adding `options(ggmap = list(display_api_key = FALSE))` at the top of `authors_georef.R` and now the build is passing. 🎉
Awesome, well done you and Jenny!
If you're going to set the option in this way, which seems reasonable for a semi-temporary workaround, you should technically be a bit more careful to put things back the way you found them. You only want your value of `FALSE` to hold for the duration of this function's execution.
At the place where you set the option, you could capture the existing value and immediately use `on.exit()` to schedule its restoration. Or you could use `withr::local_options()` to accomplish both at once, with the downside that you'd need to Import withr.
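The capture-and-restore pattern described here could look roughly like the base R sketch below. The function name `georef_sketch()` and its body are placeholders, not the package's actual code; only the option value comes from the thread.

```r
# Sketch of a function that sets the option only while it runs.
# The function name and body are placeholders, not the package's code.
georef_sketch <- function() {
  # options() returns the previous values of the options it changes.
  old <- options(ggmap = list(display_api_key = FALSE))
  # Restore the previous value when the function exits, even on error.
  on.exit(options(old), add = TRUE)
  getOption("ggmap")$display_api_key
}

before <- getOption("ggmap")
inside <- georef_sketch()
after  <- getOption("ggmap")
```

After the call, `getOption("ggmap")` is back to whatever it was before, so the user's session is left untouched.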
That problem should be fixed from ggmap's side now (with https://github.com/dkahle/ggmap/commit/0c68d5c); let me know if that doesn't do it. Sorry for the problem!
@njahn82 @bmkramer thanks again for your reviews. Are you happy with the authors' response above?
Sorry for my late reply. First of all thank you for your kind words and the changes you made. I am particularly impressed about your engagement with the R community to improve your work.
Before addressing the changes made, I wonder if I missed that runnable R code chunks were added to the README or vignette. As far as I can see, the vignette does not execute functions from the package, and there is no `README.Rmd` file. I am afraid it is a formal requirement from rOpenSci that the vignette demonstrates that major functionality from the package runs successfully. As a user, I often look for such runnable examples before getting started with a package.
Can you point me to such a document?
Hi @njahn82 .
Perhaps I am misunderstanding the question, but `refsplitr/vignettes/refsplitr-vignette.Rmd` contains chunks of code that can be run by the user, which execute each function from the package. This is a change we made in this last revision.
for example: line 66
example_refs <- references_read(data = system.file("extdata", "example_data.txt", package = "refsplitr"),
dir = FALSE)
Is this not what you meant?
I also just updated the ReadMe file to have the same example shown in the vignette.
Sorry for the confusion. I thought of R code chunks indicated by curly brackets (```{r}) that are evaluated when an R Markdown document is rendered. The resulting output file shows the R output. In the vignette, it seems that package functions are highlighted (```r). When rendered, no R output is presented, but screenshots from spreadsheet software.
Example: https://github.com/embruna/refsplitr/blob/16e7308fe75044e53848ab3bbecb80abb3cb7264/vignettes/refsplitr-vignette.Rmd#L99-L110
It would be great to have some reproducible examples for the package's main functionalities.
Agreed! We'll get right on it. ggmap is giving us some issues again, but once we get those resolved we'll make those edits to the vignette and report back.
Thanks for clarifying!
Alright. We resolved the issue with ggmap. A vignette with rendered R output is now in the repo! Let me know if you have any other comments!
:wave: @bmkramer @njahn82, are you both happy with the authors' response?
Unfortunately, I feel that some improvement is still needed.
It's great to have reproducible examples in the vignettes now. Sadly, I did not succeed in building the vignette while installing the package.
So I used the rendered refsplitr-vignette.html
file instead: When describing plot_net_country()
and plot_net_address()
, it would be better to call the $plot
element directly so that the other list elements are not printed. It would be great to have an example of how users can generate and customize their own plots using the other outputs provided by these functions.
README.Rmd
needs to be added to the .Rbuildignore
file to make the package more CRAN compatible. If you intend to submit the package to CRAN, dependencies listed in Remotes
must be available via CRAN. Otherwise, there will be a warning when running R CMD check
.
As noted, ownership change of the Web of Science needs to be addressed; since 2016 the Web of Science has been provided by Clarivate Analytics, and not Thomson Reuters.
I also noted that functions could be more thoroughly documented. All functions lack @examples
tags followed by example R code on how to use the function. See also https://ropensci.github.io/dev_guide/building.html#examples
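As a rough illustration of what an `@examples` tag looks like in roxygen2 documentation, here is a minimal sketch. The function `count_authors()` and its input format are hypothetical and are not part of the package under review.

```r
#' Count the authors in a semicolon-separated author string
#'
#' Hypothetical helper, used here only to show the roxygen2 layout.
#'
#' @param x A character vector; each element lists authors separated by ";".
#' @return An integer vector with the number of authors per element.
#' @examples
#' count_authors("Smith, J; Lee, K")
#' count_authors(c("Smith, J", "Lee, K; Park, S; Diaz, R"))
#' @export
count_authors <- function(x) {
  # strsplit() returns one character vector per element of x;
  # lengths() counts the pieces in each
  lengths(strsplit(x, ";", fixed = TRUE))
}
```

The code under `@examples` is run by `R CMD check`, so it doubles as a light smoke test for the function.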
Source code should adhere to a code style, especially regarding spacing, to improve its readability: https://ropensci.github.io/dev_guide/building.html#code-style . Tools such as
goodpractice::gp()
and lintr::lint()
help check for good coding style.
Regarding the use of tidyverse, I am fine with not using it. However, as this package already makes heavy use of external packages including those from the tidyverse, I thought that it would make the programming of the package more coherent.
Lastly, while playing around with the plotting function plot_net_address()
, I wondered if you want to support transparent edges by default. Then, overlapping edges would become more visible. Here's an example:
Default:
With alpha transparency set to 0.1
I also realized that ggplot2::aes_string
is used, which is soft-deprecated. It is recommended to use tidy evaluation idioms instead. Would it be possible to update the ggplot2 functions accordingly?
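For illustration, here is a minimal sketch of mapping columns given as strings with the `.data` pronoun instead of `aes_string()`, combined with a transparency setting for the edges. The data frame, its column names, and the function `plot_edges()` are assumptions made for this example and do not reflect the internals of `plot_net_address()`.

```r
library(ggplot2)

# Hypothetical edge list; column names are illustrative only.
edges <- data.frame(
  x = c(0, 1, 2), y = c(0, 1, 0),
  xend = c(1, 2, 0), yend = c(1, 0, 0)
)

# Column names arrive as strings and are looked up with .data[[...]]
plot_edges <- function(data, x, y, xend, yend, alpha = 0.1) {
  ggplot(data) +
    geom_segment(
      aes(
        x = .data[[x]], y = .data[[y]],
        xend = .data[[xend]], yend = .data[[yend]]
      ),
      alpha = alpha  # constant transparency so overlapping edges stay visible
    )
}

p <- plot_edges(edges, "x", "y", "xend", "yend")
```

Inside a package, `.data` would typically be imported from rlang (for example with `@importFrom rlang .data`) to avoid an `R CMD check` NOTE about an undefined global variable.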
Thanks @njahn82 for these useful review comments! @aurielfournier @birderboone could you please address those?
Hi, Thanks @njahn82 for the comments. We'll get them addressed, it may be a bit delayed though as I'm on day 1 of two straight weeks of all day courses, but hopefully by the end of the month.
Hi All, Matt and I are working on this, but it's likely going to be mid-May before we have everything pulled together. We apologize for the delay, and thank you for your patience; we're both doing this outside of our day jobs.
:wave: @aurielfournier & @birderboone! Thanks for the update, I understand.
:wave: @aurielfournier & @birderboone! Mid-May is now, any update? :wink:
Hi @maelle :D we (and by we I mean mostly @birderboone ) are working on it! It's close to being done; we should have stuff for you all by the end of the month. Thanks for your patience!
Hello, So the package should be ready.
Thank you for your patience
Thanks @birderboone!
Regarding naming, the most important thing is to be consistent within the package, to make it easier for e.g. new contributors to pick things up.
@njahn82 does the response by the authors above address your concerns? Thanks in advance!
Thank you! I lack the time to look into the changes to the internal structure of the package. However, there are still some issues with the vignette. To speed things up, I sent a pull request, which addresses the following:
("/home/matt/r_programs/refsplitr")
In the vignette, there is a non-runnable code chunk, which fails when executed: https://github.com/embruna/refsplitr/blob/master/vignettes/refsplitr-vignette.Rmd#L192
Is it possible to support transparency in plot_net_address()
as well?
I saw that aes_string()
was changed to aes_
. Unfortunately, this function will be soft-deprecated in the near future as well. It is recommended to use tidy evaluation idioms instead.
There are three warnings and two notes when the package is built on Travis.
I do not want to be picky, but I feel that fixing these issues will help to build trust in the functionality of this useful package.
thanks a lot @njahn82! :smile_cat:
Summary
What does this package do? (explain in 50 words or less):
refnet
is a package to read, organize, geocode, analyze, and visualize Clarivate Web of Knowledge/Web of Science format reference data files for scientometric, social network, and Science of Science analyses.
Paste the full DESCRIPTION file inside a code block below:
URL for the package (the development repository, not a stylized html page): https://github.com/embruna/refnet
Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry.:
Data extraction and munging, since it takes data from one format, and transforms it into something that is useful, and also matches up records among authors.
[Note: the link for the package fit does not lead to that page anymore, and I couldn't find anything about package fit in the linked policies.]
Scientists interested in studying the networks of a particular author, subject area or journal.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
https://github.com/ropensci/onboarding/issues/247
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or in inst/.
Detail
[yes] Does
R CMD check
(or devtools::check()
) succeed? Paste and describe any errors or warnings:
[yes] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Heather Piwowar @hpiwowar