Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @briatte, it looks like you're currently assigned as the reviewer for this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
Attempting PDF compilation. Reticulating splines etc...
Comment # 1 [replicating vignette UTDEventData.html]
Quick note re: API registration at http://eventdata.utdallas.edu/signup
The UTD server does not check for email address validity and will return an error (500) if provided an invalid email address.
That's not something that can be blamed on the package, of course, but the UTD server admins might benefit from improving their API registration form.
Comment # 2 [replicating vignette UTDEventData.html]
The error message shown when the API key is wrong could be improved:
DataTables(api_key="foobar")
[1] "{\"STATUS\": \"ERROR\", \"DATA\":\"(<TYPE 'EXCEPTIONS.VALUEERROR'>, VALUEERROR('INVALID API KEY',), <TRACEBACK OBJECT AT 0X7F5ACD310488>)\"}"
Comment # 3 [replicating vignette UTDEventData.html]
Is it deliberate that variable names all start with a space? Example below. This will be confusing for most users.
r$> tableVar(k, "Phoenix_rt")
[1] " code" " src_actor" " month"
[4] " tgt_actor" " country_code" " year"
[7] " date8_val" " id" " source"
[10] " date8" " src_agent" " latitude"
[13] " src_other_agent" " geoname" " quad_class"
[16] " source_text" " root_code" " tgt_other_agent"
[19] " day" " target" " goldstein"
[22] " tgt_agent" " longitude" " url"
[25] " _id"
Typo in vignette: but "cline_phenix" will return noting.
Comment # 4 [replicating vignette UTDEventData.html]
The orList constructor asks for a list:
# a boolean logic, or, with the two query blocks
or_query <- orList(list(ctr, time))
It will throw an error if the objects are passed directly as separate arguments:
r$> orList(ctr, time)
Error in orList(ctr, time) : unused argument (time)
Perhaps it would help the user if the function would accept objects directly:
function (...)
{
return(list(`$or` = list(...)))
}
Same goes for similar query constructors.
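A minimal sketch of how one constructor could accept both calling styles, assuming query blocks are plain named lists. The name `or_list_sketch` and the sample blocks are hypothetical and are not taken from the package:

```r
# Hypothetical sketch, not the package's implementation: accept query
# blocks either directly or wrapped in a single unnamed list.
or_list_sketch <- function(...) {
  blocks <- list(...)
  # A single unnamed list argument is treated as the list of blocks itself,
  # so or_list_sketch(list(a, b)) and or_list_sketch(a, b) agree.
  if (length(blocks) == 1 && is.list(blocks[[1]]) &&
      is.null(names(blocks[[1]]))) {
    blocks <- blocks[[1]]
  }
  list(`$or` = blocks)
}

# Made-up stand-ins for query blocks
ctr  <- list(country_code = "USA")
time <- list(year = 2017)

# Both calling styles build the same structure
identical(or_list_sketch(ctr, time), or_list_sketch(list(ctr, time)))
```

Keeping the single-list form working would avoid breaking code already written against the vignette.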
Comment # 5 [replicating vignette UTDEventData.html]
Thinking about the API key, it is customary to allow its storage as an environment variable (see ?options). Is that possible with this package? The vignette does not mention it.
Comment # 6 [replicating vignette UTDEventData.html]
The vignette is very helpful (and the authors do a great job at documenting possible errors/traps, e.g. Windows memory issues), yet I believe it would benefit from being broken down in more digestible chunks, such as:
Comment # 7 [replicating vignette UTDEventData.html]
I'm done replicating the vignette.
Thinking about the package and data more globally, I think I'd appreciate a mention, somewhere in the docs, of how the data relate to (and whether it can be articulated with) similar event data and related event nomenclatures, e.g. those that Phil Schrodt has worked (or is working) on.
Apologies for not being more specific here, as I have limited experience with event data -- if this comment is too vague to be addressed, I'll inquire a bit and reformulate.
Comment # 8 [reviewing paper.bib]
Re: this review point,
References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
It seems to me that IEEE papers always come with a DOI. Those are not in the bibliography. Same goes for the Schrodt paper.
As for the Althaus et al. dataset (BibTeX reference cline), is that the kind of data that could get distributed via e.g. Zenodo (and get a DOI from there)?
Comment # 9 [checking my unchecked review points]
Functionality: Have the functional claims of the software been confirmed?
Yup. Checking off.
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
I do not think this really applies here. Performance in this package boils down (mostly) to API query speed. While replicating the vignette, I found some queries to be slow-ish, but given the size of the data returned by some examples (250,000+ obs.), I'm fine with it.
Checking off.
Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
Also checking off this one, given that the package cannot really be tested against the API without providing an API key.
Some very basic unit tests could be imagined for small parts of the code: perhaps the authors will want to introduce a few control flow checks (e.g. making sure the argument to andList is a list).
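Such a check could look roughly like the following sketch, which needs no API key to run. The function name `and_list_checked` and its argument name are hypothetical and do not reflect the package's actual andList code:

```r
# Hypothetical sketch of an argument check; `and_list_checked` is an
# illustrative name, not a function exported by the package.
and_list_checked <- function(query_prep) {
  if (!is.list(query_prep)) {
    stop("`query_prep` must be a list of query blocks", call. = FALSE)
  }
  list(`$and` = query_prep)
}

# A base-R test that runs offline: a non-list input must raise an error.
res <- tryCatch(and_list_checked("not a list"), error = function(e) "error")
stopifnot(identical(res, "error"))

# A list input passes through and is wrapped under `$and`.
ok <- and_list_checked(list(list(year = 2017)))
stopifnot(identical(names(ok), "$and"))
```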
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Also checking off this one, as I'm not sure the authors really expect community contributions. The README includes everything else needed to email e.g. bugs or questions.
That's pretty much it for me: over to @andrewheiss.
Last, for the editor (@alexhanna), we need:
@whedon check references (see comment # 8 above for why)
@whedon check references
Attempting to check references...
OK DOIs
- None
MISSING DOIs
- https://doi.org/10.1109/isi.2016.7745457 may be missing for title: Near real-time atrocity event coding
- https://doi.org/10.1109/iri.2018.00065 may be missing for title: TwoRavens for Event Data
- https://doi.org/10.1080/03050629.2012.697430 may be missing for title: Precedents, progress, and prospects in political event data
- https://doi.org/10.1109/bigdata.2017.8258256 may be missing for title: Adaptive scalable pipelines for political event data generation
INVALID DOIs
- None
Dear @KateHyoung (I hope it's fine to address submitting authors in JOSS review threads?)
I posted a bunch of numbered comments in the thread above. I hope that you will find them useful in improving your package, which I found helpful and carefully coded. Thanks for your work, looking forward to discussing things further.
All the best~
Thank you @briatte for taking the time to review our R package and documents. Your comments are very helpful for improving our package. Regarding comment #1, I will contact the server manager at UTD to improve security. For the others, I will try my best to reflect your comments in the code and vignette, and will follow up if I have questions about them. Thank you again!
Dear @briatte and other reviewers
I have made some changes corresponding to the comments from @briatte.
DataTables("fooboo")
[1] "This API key is invalid. Please check that you have entered your key correctly."
Ref. https://github.com/KateHyoung/UTDEventData/commit/a469ff91d028897a2e7dd6804e434fb64ab90cc5
Response for the comment #3: I have fixed tableVar(), which now prints the variable list without leading spaces. Along with this change, returnRexExp() was also fixed to provide appropriate query elements for the API syntax. https://github.com/KateHyoung/UTDEventData/commit/7f76c78f9c618a225acc5e8dbc12117732dd5a1d#diff-6134badaa82d09a6bcf0dffd6b2f97b4
tableVar(k, "Phoenix_rt")
[1] "code" "src_actor" "month" "tgt_actor"
Response for the comment #4: I am thinking about which way is better for users. If I make it accept country objects directly, users should provide them as c("ctr", "time") in order to generate a proper query syntax. I feel that listing query elements with list() is not very different from accepting the query elements with c(). But if I have a better idea for it, I will update it.
Response for the comment #8: I have updated the DOIs of the references in the paper. Ref. https://github.com/KateHyoung/UTDEventData/commit/a64c03691cc8dd50e5b36cd420e85c161cba4328#diff-68dc399b92868c40cec2beeb0a59b48e https://github.com/KateHyoung/UTDEventData/commit/9f0eea2b4fd6f6a6717670349e010fadfa1274d3#diff-68dc399b92868c40cec2beeb0a59b48e
I am working on editing the vignette according to the comments on it, and will post it once this work is done. Thank you again, Kate
Thanks so much for the comprehensive comments, @briatte, and your responses @KateHyoung.
@andrewheiss have you had a chance to look through the submission?
Hi! Yep, I've been traveling way too much and have had unreliable internet access, but I should be able to do the formal review tomorrow. Sorry for the delay!
UTDEventData is a fantastic new package that provides easy access to UT Dallas's event data server. This is an important package, given the complexities involved in pulling event data from half a dozen different sources, and I've already started to use it in my own research. The package is easy to install and use, and I only came across a few issues, noted below. I've opened issues for a few of these at the package repository.
The README is fairly sparse and doesn't include any details on how to use the package (beyond providing a list of possible functions). The vignette currently contains complete examples of how to use the package, but reading those examples outside of R (i.e. through GitHub) is trickier since GitHub doesn't display HTML.
The README should include instructions about how to set up and use the API key, either as a local variable (e.g. k <- "blah") or as a system variable (e.g. adding Sys.setenv(APIKEY = "blah") to .Rprofile).
I have not really ever come across instructions to build vignettes when building packages with devtools, even when a package has vignettes (see gganimate, for example), and most packages I've seen on GitHub don't include two different options for package installation (except when one version is on CRAN and one is on GitHub). Additionally, nothing in the vignette is actually run when the package is built, since eval=FALSE is enabled on all the vignette chunks. This makes sense to prevent overloading the event data server, and it makes it so users don't need an API key to install the package, but if nothing in the vignette actually runs, it might be easier to include lots of the documentation and examples in the README so users (1) don't have to explicitly build vignettes to see how the package works, and (2) can follow rendered HTML online through GitHub.
But there's also utility in actually using vignettes so that users can run vignette("UTDEventData") within R to see documentation.
There's no right answer here. I don't have any official suggestion for README vs. vignettes, other than the fact that the README should at least include some quick "getting started right away" examples.
In a similar vein, the practical examples at the end of the vignette are extremely helpful, but they're buried at the bottom of the long vignette. Perhaps consider splitting the vignette into two parts—one highlighting the different ways of using the different functions, and one with the examples so that users can jump to actual usage faster.
Allowing users to store the API key in an environment variable is convenient. It might be worth changing the name of the system variable. Right now it is APIKEY, but it's hard to distinguish that from other system-level variables users might be using. Adding a prefix like UTD_APIKEY or UTD_API would help with recognizability.
Additionally, it might be useful (someday later, perhaps; not necessarily for this release) to have the system-level API key as a default option in functions like DataTables() or tableVar() so that users can run tableVar("Phoenix_rt") without needing to pass the api_key argument every time. Other functions like devtools::install_github() do something similar: if there's a GITHUB_PAT system variable, it uses it, otherwise it doesn't.
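The default-from-environment pattern described above could be sketched as follows. The helper name `get_api_key` and the variable name `UTD_API_KEY` are assumptions made for illustration, not names used by the package:

```r
# Hypothetical helper: look up the key in an environment variable unless
# one is passed explicitly. The variable name UTD_API_KEY is an assumption.
get_api_key <- function(api_key = Sys.getenv("UTD_API_KEY")) {
  # Sys.getenv() returns "" when the variable is unset
  if (!nzchar(api_key)) {
    stop("No API key found: pass `api_key` or set UTD_API_KEY", call. = FALSE)
  }
  api_key
}

# Typically set once per session, or in .Renviron / .Rprofile
Sys.setenv(UTD_API_KEY = "blah")
get_api_key()          # falls back to the environment variable
get_api_key("other")   # an explicit argument takes precedence
```

A function such as tableVar() could then call a helper like this for its `api_key` default, so the key only has to be supplied once.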
The biggest issue is that the S4 class doesn't seem to be working correctly right now. When I run obj <- Table$new(), I get the following error:
# > obj <- Table$new()
# Error: object 'Table' not found
As such, I'm unable to test the rest of the obj$* commands. (see https://github.com/KateHyoung/UTDEventData/issues/10)
I see the utility of building the API key into the Table objects, but the documentation doesn't explain why this method is preferable to the other more documented approaches to extracting data. The vignette mentions that using Table objects simplifies API calls, but that it is limited to just DataTables(), tableVar(), and pullData(). What should users do if they want to use other functions like sendQuery()? Is there value in maintaining two different ways of accessing data (especially when one method is incomplete)? (I'm assuming there is value, so it might be helpful to explain the difference in approaches.)
The "Data request function" section doesn't reproduce and there's minimal explanation of what query_block should look like. It says to pass "a list of queries created by andList(), orList(), or a single query block in the function as shown in the following example code," but the documentation hasn't covered andList() and orList() yet, and there's no working single query block in the example code. For now, running the example sendQuery() as it stands in the vignette produces an error:
myData <- sendQuery(k,"icews", query_block, citation = TRUE)
# Error in sendQuery(k, "icews", query_block, citation = TRUE) :
# object 'query_block' not found
It might be helpful to include code for creating an example query_block object that users can use before delving into all the query examples that follow.
There's a comment in the first example that "# the data set d1 and d2 are identical", but there is no d1 or d2. I'm guessing that these are now dt1 and dt, but maybe they should be renamed?
The call to xtable() in example 2 assumes that the user has loaded the xtable library already, but that never happens explicitly in the vignette. It might be helpful to prefix the function with xtable::xtable(). Alternatively, since you're using rmarkdown to generate the vignette, consider using pander::pandoc.table() or knitr::kable() to generate a Markdown table instead of raw TeX.
The vignette is generally easy to follow and is a helpful guide to using the package.
There are several typos and grammatical errors in the vignette. I've corrected some of the typos in https://github.com/KateHyoung/UTDEventData/pull/9 - a quick round of editing should clean up the rest.
There are no formal tests, likely because the package is heavily reliant on remote access to an API and it would be computationally costly to hit the server for each test. Creating a cache of data locally might be helpful for testing, or perhaps adding tests for non-server-based functions like returnLatLon() or returnDyad() could be helpful (this isn't necessary, though).
It might be helpful to have some community guidelines in the package too, such as a CONTRIBUTING.md file (perhaps modeled after something like this or this) and a CONDUCT.md file (like this or this)
Phew. That's all I've got. I'm excited for this package!
Dear reviewers,
Thank you for your helpful comments for our package. I thoughtfully went over every comment and tried to reflect all of them because those make sense and enhance the package’s operations.
For @briatte's comment #4, we have changed the function code as you suggested. Users can now pass objects directly rather than wrapping them in list().
Ref. https://github.com/KateHyoung/UTDEventData/commit/87f42817ba1e17734cffea4559130c734738ec5f#diff-079497ecdda4a05bbcdba43d0d5cacac
For @briatte's comment #7, I have added paragraphs to the introduction of the vignette describing how this package is linked to previous political event studies. https://github.com/KateHyoung/UTDEventData/commit/e41ee52f4d24d79435d57091ea64d34495135a17#diff-a28523db019dca3cb3eaab6cda57fe4d
@andrewheiss, thank you for the idea of handling API access in the package. I have learned a lot from the information you provided. I have incorporated a way to set the API key as an environment variable in some functions. See the updates in DataTables(), tableVar(), and previewData() (please see the README and vignette files). I tried to apply the same method to the other functions that require an API key and noticed that it created minor issues in some of them, because the argument setting broke the function. I need more time to make them work stably without errors, so I will leave it for later work and complete it during the summer break. I hope this work is good enough for this version.
Also, I have added a prefix to the api_key variable, so that it is now utd_api_key.
Ref. https://github.com/KateHyoung/UTDEventData/commit/4db04487d88a66e43fabe2ec67c478413bc5f04f
For the comments on functionality issues, I have fixed the code so that Table is working well. Ref. https://github.com/KateHyoung/UTDEventData/commit/159b807815bc36bf34a74baeeef4b39e189c6c5a#diff-ada2ba640dbb79b8d3a115369830f718
> obj<-Table$new()
> obj$setAPIKey("utd_api_key")
> obj$DataTables()
[1] "'CLINE_PHOENIX_SWB', 'CLINE_PHOENIX_NYT', 'ICEWS', 'PHOENIX_RT', 'CLINE_PHOENIX_FBIS', 'TERRIER'"
I have changed or deleted some code in the vignette's R code chunks that did not fully work. The xtable() code has also been changed to use the knitr::kable() function.
You can find a comment about community guidelines in README.md and CONDUCT.md in the file list.
Ref: https://github.com/KateHyoung/UTDEventData/commit/432c713fdd13a6938c978496de54b157f245f5f0#diff-46b8dc461fb4104140b72e4043aaeefe
README was restructured and now includes more information for users. Please see here. If you think it needs more information, please let me know.
Package’s vignette is edited and restructured according to the comments of @briatte and @andrewheiss. Ref. https://github.com/KateHyoung/UTDEventData/blob/master/vignettes/UTDEventData.Rmd
I have tried to list all the changes I made in response to the reviewers' comments and hope I have not missed any important feedback. If I have, please feel free to let me know.
This entire review process was very useful to make the package stable and functional. I sincerely appreciate your time and comments for this review.
Best regards, Kate
@KateHyoung: thanks so much for incorporating these edits. @andrewheiss and @briatte, thanks for the comprehensive comments. Can you two review the changes on the updated package and let me know if you're willing to sign off on the package for meeting the JOSS criteria for acceptance?
These changes look great! I approve and sign off. ✅
The changes look very thorough indeed, and the updated package loads fine, as does the vignette: I'm signing off too.
@whedon accept
No archive DOI set. Exiting...
You'll need to get a DOI for your repository, @KateHyoung. You'll need to create an archive (on Zenodo, figshare, or somewhere else, if you haven't already) and post the archive DOI here.
Hello @alexhanna,
Here is the archive DOI. https://zenodo.org/badge/latestdoi/113074713
Thank you for kindly letting me know the information.
@whedon set 10.5281/zenodo.2648643 as archive
OK. 10.5281/zenodo.2648643 is the archive.
@whedon accept
Attempting dry run of processing paper acceptance...
Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/635
If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/635, then you can now move forward with accepting the submission by compiling again with the flag deposit=true, e.g.
@whedon accept deposit=true
OK DOIs
- 10.1109/isi.2016.7745457 is OK
- 10.7910/DVN/28075 is OK
- 10.1109/iri.2018.00065 is OK
- 10.1080/03050629.2012.697430 is OK
- 10.1109/bigdata.2017.8258256 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@whedon accept deposit=true
I'm sorry @alexhanna, I'm afraid I can't do that. That's something only editor-in-chiefs are allowed to do.
@kyleniemeyer and @arfon, this is ready for deposit.
Hi - in the future, please notify @openjournals/joss-eics to get the AEiC on duty (me this week)
@KateHyoung - before we accept this, please fix the references in the paper.
Once you've done this, please check the resulting pdf by entering @whedon generate pdf in a new comment here.
Then we'll be ready to accept.
👋 @briatte & @andrewheiss - thanks for your reviews! And @alexhanna, thanks for your editing!
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1322 with the following error:
Error reading bibliography ./paper.bib (line 18, column 3): unexpected "u" expecting space, ",", white space or "}"
Error running filter pandoc-citeproc: Filter returned error status 1
Looks like we failed to compile the PDF
Attempting PDF compilation. Reticulating splines etc...
Submitting author: @KateHyoung (Hyoungah Kim)
Repository: https://github.com/KateHyoung/UTDEventData
Version: v1.0.0
Editor: @alexhanna
Reviewer: @briatte, @andrewheiss
Archive: 10.5281/zenodo.2648643
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@briatte & @andrewheiss, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @alexhanna know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @briatte
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Authors: Does the paper.md file include a list of authors with their affiliations?
Review checklist for @andrewheiss
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Authors: Does the paper.md file include a list of authors with their affiliations?