openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: UTDEventData: An R package to access political event data #1322

Closed whedon closed 5 years ago

whedon commented 5 years ago

Submitting author: @KateHyoung (Hyoungah Kim)
Repository:
Version: v1.0.0
Editor: @alexhanna
Reviewer: @briatte, @andrewheiss
Archive: 10.5281/zenodo.2648643




Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@briatte & @andrewheiss, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL:

The reviewer guidelines are available here: Any questions/concerns please let @alexhanna know.

Please try and complete your review in the next two weeks

Review checklist for @briatte

Conflict of interest

Code of Conduct

General checks



Software paper

Review checklist for @andrewheiss

Conflict of interest

Code of Conduct

General checks



Software paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @briatte, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this repository. As a reviewer, you're probably currently watching this repository, which means that under GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching'

  2. You may also like to change your default settings for watching repositories in your GitHub profile here:


For a list of things I can do to help you, just type:

@whedon commands
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

briatte commented 5 years ago

Comment # 1 [replicating vignette UTDEventData.html]

Quick note re: API registration at

The UTD server does not check for email address validity and will return an error (500) if provided an invalid email address.

That's not something that can be blamed on the package, of course, but the UTD server admins might benefit from improving their API registration form.

briatte commented 5 years ago

Comment # 2 [replicating vignette UTDEventData.html]

The error message shown when the API key is wrong could be improved:

briatte commented 5 years ago

Comment # 3 [replicating vignette UTDEventData.html]

Is it deliberate that variable names all start with a space? Example below. This will be confusing for most users.

r$> tableVar(k, "Phoenix_rt")
 [1] " code"            " src_actor"       " month"
 [4] " tgt_actor"       " country_code"    " year"
 [7] " date8_val"       " id"              " source"
[10] " date8"           " src_agent"       " latitude"
[13] " src_other_agent" " geoname"         " quad_class"
[16] " source_text"     " root_code"       " tgt_other_agent"
[19] " day"             " target"          " goldstein"
[22] " tgt_agent"       " longitude"       " url"
[25] " _id"
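
Until this is addressed in the package, users could strip the padding themselves. A minimal sketch, assuming the names come back as a plain character vector; the `vars` vector below is a hand-copied subset of the output above, not a live server response:

```r
# Hand-copied subset of the names printed above
# (assumption: tableVar() returns a character vector)
vars <- c(" code", " src_actor", " month", " _id")

# trimws() is base R and removes leading and trailing whitespace
clean_vars <- trimws(vars)
print(clean_vars)
```
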
briatte commented 5 years ago

Typo in vignette:

but "cline_phenix" will return noting.

briatte commented 5 years ago

Comment # 4 [replicating vignette UTDEventData.html]

The orList constructor asks for a list:

# a boolean logic, or, with the two query blocks
or_query <- orList(list(ctr, time))

It will throw an error if the query blocks are passed as separate arguments rather than wrapped in a list:

r$> orList(ctr, time)
Error in orList(ctr, time) : unused argument (time)

Perhaps it would help the user if the function would accept objects directly:

function (...)
    return(list(`$or` = list(...)))

Same goes for similar query constructors.
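
To keep existing scripts working, a constructor could accept both calling styles. The sketch below is hypothetical and is not the package's implementation: `or_list_sketch()` is an invented name, and the query blocks are assumed to be named lists.

```r
# Hypothetical variant of orList(); not the package's actual code.
or_list_sketch <- function(...) {
  blocks <- list(...)
  # If the caller passed a single unnamed list of blocks, unwrap it so that
  # or_list_sketch(list(a, b)) and or_list_sketch(a, b) are equivalent.
  if (length(blocks) == 1 && is.list(blocks[[1]]) &&
      is.null(names(blocks[[1]]))) {
    blocks <- blocks[[1]]
  }
  list(`$or` = blocks)
}

# Stand-in query blocks (assumed shape: named lists)
ctr  <- list(country_code = "USA")
time <- list(year = 2017)

same <- identical(or_list_sketch(ctr, time), or_list_sketch(list(ctr, time)))
print(same)
```

The unwrapping step relies on query blocks being named and the wrapping list being unnamed; if the real blocks have a different shape, the check would need to change.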

briatte commented 5 years ago

Comment # 5 [replicating vignette UTDEventData.html]

Thinking about the API key, it is customary to allow its storage as an environment variable (see ?Sys.getenv) or as a session option (see ?options). Is that possible with this package? The vignette does not mention it.

briatte commented 5 years ago

Comment # 6 [replicating vignette UTDEventData.html]

The vignette is very helpful (and the authors do a great job at documenting possible errors/traps, e.g. Windows memory issues), yet I believe it would benefit from being broken down in more digestible chunks, such as:

briatte commented 5 years ago

Comment # 7 [replicating vignette UTDEventData.html]

I'm done replicating the vignette.

Thinking about the package and data more globally, I think I'd appreciate a mention, somewhere in the docs, of how the data relate to (and whether it can be articulated with) similar event data and related event nomenclatures, e.g. those that Phil Schrodt has worked (or is working) on.

Apologies for not being more specific here, as I have limited experience with event data -- if this comment is too vague to be addressed, I'll inquire a bit and reformulate.

briatte commented 5 years ago

Comment # 8 [reviewing paper.bib]

Re: this review point,

References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

It seems to me that IEEE papers always come with a DOI. Those are not in the bibliography. Same goes for the Schrodt paper.

As for the Althaus et al. dataset (BibTeX reference cline), is that kind of data that could get distributed via e.g. Zenodo (and get a DOI from there)?

briatte commented 5 years ago

Comment # 9 [checking my unchecked review points]

Functionality: Have the functional claims of the software been confirmed?

Yup. Checking off.

Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

I do not think this really applies here. Performance in this package boils down (mostly) to API query speed. While replicating the vignette, I found some queries to be slow-ish, but given the size of the data returned by some examples (250,000+ obs.), I'm fine with it.

Checking off.

Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?

Also checking off this one, given that the package cannot really be tested against the API without providing an API key.

Some very basic unit tests could be imagined for small parts of the code: perhaps the authors will want to introduce a few control flow checks (e.g. making sure the argument to andList is a list).
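
Such a check can run without an API key or network access. A rough sketch using base R only; `and_list_checked()` is a hypothetical wrapper, and the `$and` key is an assumption modelled on the `$or` key shown in comment # 4:

```r
# Hypothetical wrapper that validates its input before building the query.
and_list_checked <- function(query_blocks) {
  if (!is.list(query_blocks)) {
    stop("`query_blocks` must be a list of query blocks", call. = FALSE)
  }
  list(`$and` = query_blocks)
}

# A bare string must be rejected with an error.
result <- tryCatch(and_list_checked("not a list"),
                   error = function(e) "rejected")
stopifnot(identical(result, "rejected"))

# A proper list must be wrapped under the `$and` key.
ok <- and_list_checked(list(list(year = 2017)))
stopifnot(identical(names(ok), "$and"))
```
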

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Also checking off this one, as I'm not sure the authors really expect community contributions. The README includes everything else needed to email e.g. bugs or questions.

That's pretty much it for me: over to @andrewheiss.

Last, for the editor (@alexhanna), we need:

@whedon check references (see comment # 8 above for why)

arfon commented 5 years ago

@whedon check references

whedon commented 5 years ago
Attempting to check references...
whedon commented 5 years ago


OK DOIs

- None

MISSING DOIs

- may be missing for title: Near real-time atrocity event coding
- may be missing for title: TwoRavens for Event Data
- may be missing for title: Precedents, progress, and prospects in political event data
- may be missing for title: Adaptive scalable pipelines for political event data generation

INVALID DOIs

- None
briatte commented 5 years ago

Dear @KateHyoung (I hope it's fine to address submitting authors in JOSS review threads?)

I posted a bunch of numbered comments in the thread above. I hope that you will find them useful in improving your package, which I found helpful and carefully coded. Thanks for your work, looking forward to discussing things further.

All the best~

KateHyoung commented 5 years ago

Thank you, @briatte, for taking the time to review our R package and documents. Your comments are very helpful for improving our package. Regarding comment #1, I will contact the server manager at UTD to improve security. For the others, I will do my best to reflect your comments in the code and vignette, and will follow up if I have questions about them. Thank you again!

KateHyoung commented 5 years ago

Dear @briatte and other reviewers

I have made some changes corresponding to the comments from @briatte.

  1. Response to comment #2: I have fixed the code of DataTables(). With an invalid API key, the function now returns a message.

    DataTables("fooboo")
    [1] "This API key is invalid. Please check that you have entered your key correctly."

  2. Response to comment #3: I have fixed tableVar(), which now prints the variable list without leading spaces. Along with this change, returnRexExp() was also fixed to provide appropriate query elements for the API syntax.

    tableVar(k, "Phoenix_rt")
    [1] "code" "src_actor" "month" "tgt_actor"

  3. Response to comment #4: I am still considering which way is better for users. If I make it accept country objects directly, users would have to provide them as c("ctr", "time") in order to generate a proper query syntax. I feel that listing query elements with list() is not much different from passing them with c(). If I come up with a better idea, I will update it.

  4. Response to comment #8: I have updated the DOIs of the references in the paper. Ref.

I am working on editing the vignette according to the comments on it. Once I have finished, I will post the update here. Thank you again, Kate

alexhanna commented 5 years ago

Thanks so much for the comprehensive comments, @briatte, and your responses @KateHyoung.

@andrewheiss have you had a chance to look through the submission?

andrewheiss commented 5 years ago

Hi! Yep, I've been traveling way too much and have had unreliable internet access, but I should be able to do the formal review tomorrow. Sorry for the delay!

andrewheiss commented 5 years ago

UTDEventData is a fantastic new package that provides easy access to UT Dallas's event data server. This is an important package, given the complexities involved in pulling event data from half a dozen different sources, and I've already started to use it in my own research. The package is easy to install and use, and I only came across a few issues, noted below. I've opened issues for a few of these at the package repository.


The README is fairly sparse and doesn't include any details on how to use the package (beyond providing a list of possible functions). The vignette currently contains complete examples of how to use the package, but reading those examples outside of R (i.e. through GitHub) is trickier since GitHub doesn't display HTML.

The README should include instructions about how to set up and use the API key, either as a local variable (e.g. k <- "blah") or as a system variable (e.g. adding Sys.setenv(APIKEY = "blah") to .Rprofile).

I have not really ever come across instructions to build vignettes when building packages with devtools, even when a package has vignettes (see gganimate, for example), and most packages I've seen on GitHub don't include two different options for package installation (except when one version is on CRAN and one is on GitHub). Additionally, nothing in the vignette is actually run when the package is built, since eval=FALSE is enabled on all the vignette chunks. This makes sense to prevent overloading the event data server, and it makes it so users don't need an API key to install the package, but if nothing in the vignette actually runs, it might be easier to include lots of the documentation and examples in the README so users (1) don't have to explicitly build vignettes to see how the package works, and (2) users can follow rendered HTML online through GitHub.

But there's also utility in actually using vignettes so that users can run vignette("UTDEventData") within R to see documentation.

There's no right answer here. I don't have any official suggestion for README vs. vignettes, other than the fact that the README should at least include some quick "getting started right away" examples.

In a similar vein, the practical examples at the end of the vignette are extremely helpful, but they're buried at the bottom of the long vignette. Perhaps consider splitting the vignette into two parts—one highlighting the different ways of using the different functions, and one with the examples so that users can jump to actual usage faster.

API access

Allowing users to store the API key in an environment variable is convenient. It might be worth changing the name of the system variable. Right now it is APIKEY, but it's hard to distinguish that from other system-level variables users might be using. Adding a prefix like UTD_APIKEY or UTD_API would help with recognizability.

Additionally, it might be useful (someday later, perhaps—not necessarily for this release) to have the system-level API key a default option in functions like DataTables() or tableVar() so that users can run tableVar("Phoenix_rt") without needing to pass the api_key argument every time. Other functions like devtools::install_github() do something similar—if there's a GITHUB_PAT system variable, it uses it, otherwise it doesn't.
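
The pattern could look roughly like the sketch below. It is an illustration under assumptions, not the package's code: `utd_key()` is a hypothetical helper, and `UTD_APIKEY` is the prefixed variable name suggested above.

```r
# Hypothetical helper: use the explicit argument if given,
# otherwise fall back to the UTD_APIKEY environment variable.
utd_key <- function(api_key = Sys.getenv("UTD_APIKEY")) {
  if (!nzchar(api_key)) {
    stop("No API key found: pass `api_key` or set UTD_APIKEY", call. = FALSE)
  }
  api_key
}

# Normally set once in .Renviron or .Rprofile rather than in a script
Sys.setenv(UTD_APIKEY = "blah")

from_env <- utd_key()             # picks up the environment variable
explicit <- utd_key("other-key")  # an explicit argument still wins
```

A function like tableVar() could then declare its key argument with this helper as the default, so that the key only has to be passed when it differs from the stored one.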

Functionality issues

The biggest issue is that the S4 class doesn't seem to be working correctly right now. When I run obj <- Table$new(), I get the following error:

# > obj <- Table$new()
# Error: object 'Table' not found

As such, I'm unable to test the rest of the obj$* commands. (see

I see the utility of building the API key into the Table objects, but the documentation doesn't explain why this method is preferable to the other more documented approaches to extracting data. The vignette mentions that using Table objects simplifies API calls, but that it is limited to just DataTables(), tableVar(), and pullData(). What should users do if they want to use other functions like sendQuery()? Is there value in maintaining two different ways of accessing data (especially when one method is incomplete)? (I'm assuming there is value—it might be helpful to explain the difference in approaches).

The "Data request function" section doesn't reproduce and there's minimal explanation of what query_block should look like. It says to pass "a list of queries created by andList(), orList(), or a single query block in the function as shown in the following example code," but the documentation hasn't covered andList() and orList() yet, and there's no working single query block in the example code. For now, running the example sendQuery() as it stands in the vignette produces an error:

myData <- sendQuery(k,"icews", query_block, citation = TRUE)

# Error in sendQuery(k, "icews", query_block, citation = TRUE) : 
#   object 'query_block' not found

It might be helpful to include code for creating an example query_block object that users can use before delving into all the query examples that follow.

There's a comment in the first example that "# the data set d1 and d2 are identical", but there is no d1 or d2. I'm guessing that these are now dt1 and dt, but maybe they should be renamed?

The call to xtable() in example 2 assumes that the user has loaded the xtable library already, but that never happens explicitly in the vignette. It might be helpful to prefix the function with xtable::xtable(). Alternatively, since you're using rmarkdown to generate the vignette, consider using pander::pandoc.table() or knitr::kable() to generate a Markdown table instead of raw TeX.

Clarity and quality of writing

The vignette is generally easy to follow and is a helpful guide to using the package.

There are several typos and grammatical errors in the vignette. I've corrected some of the typos in - a quick round of editing should clean up the rest.


There are no formal tests, likely because the package is heavily reliant on remote access to an API and it would be computationally costly to hit the server for each test. Creating a cache of data locally might be helpful for testing, or perhaps adding tests for non-server-based functions like returnLatLon() or returnDyad() could be helpful (this isn't necessary, though)
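
One way to picture the local cache idea, as a sketch only: `cached_fetch()` and `fake_fetch()` are hypothetical names, and `fake_fetch()` stands in for a real server call such as pullData().

```r
# Hypothetical caching wrapper: `fetch` is any zero-argument function
# that would normally hit the server.
cached_fetch <- function(cache_file, fetch) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # reuse the stored result, no server hit
  }
  result <- fetch()
  saveRDS(result, cache_file)
  result
}

# Stand-in for an API call; counts how often it is actually invoked.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(country_code = "USA", year = 2017)
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_fetch(cache_file, fake_fetch)  # calls fake_fetch()
second <- cached_fetch(cache_file, fake_fetch)  # served from the cache file
```

In a test suite, the cached file could be committed as a fixture so that tests of downstream functions never need an API key.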


It might be helpful to have some community guidelines in the package too, such as a file (perhaps modeled after something like this or this) and a file (like this or this)

Phew. That's all I've got. I'm excited for this package!

KateHyoung commented 5 years ago

Dear reviewers,

Thank you for your helpful comments on our package. I went carefully over every comment and tried to reflect all of them, because they make sense and improve how the package works.

For @briatte's comment #4, we have changed the function code as you suggested. Users can now pass objects directly rather than wrapping them with list(). Ref.

For @briatte's comment #7, I have added paragraphs to the introduction of the vignette describing how this package relates to previous political event data studies.

@andrewheiss, thank you for the idea of handling API access in the package. I learned a lot from the information you provided. I have added a way to set the API key as an environment variable in some functions. See the updates in DataTables(), tableVar(), and previewData() (please see the README and vignette files). I tried to apply the same method to the other functions that require an API key, but noticed that it caused minor issues in some of them because of how their arguments are set up. I need more time to make them work reliably, so I will leave this for later and complete it during the summer break. I hope this is good enough for this version. Also, I have added a prefix to the api_key variable, so it is now utd_api_key. Ref.

For the comments in Functionality issues, I have fixed the code so that Table is working well. Ref.

> obj<-Table$new()
> obj$setAPIKey("utd_api_key")
> obj$DataTables()

I have changed or deleted some code in the vignette's R chunks that did not work completely. The xtable() call has also been replaced with knitr::kable(). You can find a comment about community guidelines in and in the file list. Ref:

README was restructured and now includes more information for users. Please see here. If you think it needs more information, please let me know.

Package’s vignette is edited and restructured according to the comments of @briatte and @andrewheiss. Ref.

I have tried to list all the changes I made in response to the reviewers' comments and hope I have not missed any important feedback. If I have, please feel free to let me know.

This entire review process was very useful for making the package stable and functional. I sincerely appreciate your time and comments for this review.

Best regards, Kate

alexhanna commented 5 years ago

@KateHyoung: thanks so much for incorporating these edits. @andrewheiss and @briatte, thanks for the comprehensive comments. Can you two review the changes on the updated package and let me know if you're willing to sign off on the package for meeting the JOSS criteria for acceptance?

andrewheiss commented 5 years ago

These changes look great! I approve and sign off. ✅

briatte commented 5 years ago

The changes look very thorough indeed, and the updated package loads fine, as does the vignette: I'm signing off too.

alexhanna commented 5 years ago

@whedon accept

whedon commented 5 years ago

No archive DOI set. Exiting...

alexhanna commented 5 years ago

You'll need to get a DOI for your repository, @KateHyoung. You'll need to create an archive (on Zenodo, figshare, or somewhere else, if you haven't already) and post the archive DOI here.

KateHyoung commented 5 years ago

Hello @alexhanna,

Here is the archive DOI.

Thank you for kindly letting me know the information.

alexhanna commented 5 years ago

@whedon set 10.5281/zenodo.2648643 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2648643 is the archive.

alexhanna commented 5 years ago

@whedon accept

whedon commented 5 years ago
Attempting dry run of processing paper acceptance...
whedon commented 5 years ago

Check final proof :point_right:

If the paper PDF and Crossref deposit XML look good in, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
whedon commented 5 years ago


OK DOIs

- 10.1109/isi.2016.7745457 is OK
- 10.7910/DVN/28075 is OK
- 10.1109/iri.2018.00065 is OK
- 10.1080/03050629.2012.697430 is OK
- 10.1109/bigdata.2017.8258256 is OK

MISSING DOIs

- None

INVALID DOIs

- None
alexhanna commented 5 years ago

@whedon accept deposit=true

whedon commented 5 years ago

I'm sorry @alexhanna, I'm afraid I can't do that. That's something only editor-in-chiefs are allowed to do.

alexhanna commented 5 years ago

@kyleniemeyer and @arfon, this is ready for deposit.

danielskatz commented 5 years ago

Hi - in the future, please notify @openjournals/joss-eics to get the AEiC on duty (me this week)

danielskatz commented 5 years ago

@KateHyoung - before we accept this, please fix the references in the paper.

  1. Items that should be in upper case may need additional {}s around them to preserve the case.
  2. For the Salam reference, some additional detail about the paper is needed. "In. EECS." is not sufficient for someone else to find this.

Once you've done this, please check the resulting pdf by entering @whedon generate pdf in a new comment here.

Then we'll be ready to accept.

danielskatz commented 5 years ago

👋 @briatte & @andrewheiss - thanks for your reviews! And @alexhanna, thanks for your editing!

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

PDF failed to compile for issue #1322 with the following error:

Error reading bibliography ./paper.bib (line 18, column 3): unexpected "u" expecting space, ",", white space or "}"
Error running filter pandoc-citeproc: Filter returned error status 1
Looks like we failed to compile the PDF

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left: