ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

Submission: emld #269

Closed cboettig closed 5 years ago

cboettig commented 5 years ago

Summary

Transform Ecological Metadata Language (EML) files into JSON-LD and back into EML. Doing so creates a natural list-based representation of EML in R, so that EML data can easily be manipulated using standard R tools. This makes this package an effective backend for other tools for working with EML. Additioally, the JSON-LD representation enables the use of developer-friendly JSON parsing and informatics-friendly serializations such as RDF and SPARQL queries. This package is targeted primarily at developers. By abstracting away the complexity of EML's XML Schema, developers can build against intuitive list or JSON objects and not have to worry about satisfying many of the additional constraints of set by the schema (such as element ordering, which is handled automatically).

In practice, emld will basically be the backend machinery for a new release of the EML package, which will retain largely the same user-facing API, with several improvements. But both in terms of ease of reviewing and ease of package maintenance, I think it's important to have these as separate packages.

Package: emld
Title: Ecological Metadata as Linked Data
Version: 0.0.2
Authors@R: person("Carl", "Boettiger",
                  email = "cboettig@gmail.com",
                  role = c("aut", "cre", "cph"),
                  comment=c(ORCID = "http://orcid.org/0000-0002-1642-628X"))
Description: Transform Ecological Metadata Language (EML) files into JSON-LD
             and back into EML.  Doing so creates a natural list-based representation
             of EML in R, so that EML data can easily be manipulated using standard
             R tools. This makes this package an effective backend for other tools
             for working with EML.  Additioally, the JSON-LD representation enables
             the use of developer-friendly JSON parsing and informatics-friendly
             serializations such as RDF and SPARQL queries. This package is targeted
             primarily at developers. By abstracting away the complexity of EML's
             XML Schema, developers can build against intuitive list or JSON objects
             and not have to worry about satisfying many of the additional constraints
             of set by the schema (such as element ordering, which is handled automatically).
URL: https://github.com/cboettig/emld
BugReports: https://github.com/cboettig/emld/issues
Depends: R (>= 3.1.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Suggests:
    spelling,
    testthat,
    magrittr,
    rmarkdown,
    covr,
    knitr,
    rdflib,
    jqr
Imports: xml2,
    jsonlite,
    jsonld,
    methods,
    yaml
VignetteBuilder: knitr
Language: en-US

data publication, data retrieval, and reproducibility.

Researchers and ecological informatics teams (at NEON, LTER sites, NCEAS, etc) who are consuming or managing EML and data described by EML.

This overlaps with the EML package, and will soon be the backend to the newer version of said package. This package works very differently than the current CRAN version of EML: instead of a massive S4 class system, this uses a single S3 class by exploiting the magic of JSON-LD.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

lmullen commented 5 years ago

Thanks for your submission, @cboettig. I have run the editor's checks below. Please address the requested changes below. I have begun looking for reviewers and I will update this thread with the reviewers and due date once all reviewers have agreed to take this on.

Editor checks:


Editor comments

Changes for package authors:

[![](https://badges.ropensci.org/269_status.svg)](https://github.com/ropensci/software-review/issues/269)
test-template.R:11: failure: template knows about internal classes too
`print\(template\("ResponsibleParty"\)\)` does not match "individualName: \\{\\}".
Actual value: "\{\}"
E  checking examples (619ms)
   Running examples in 'emld-Ex.R' failed
   The error most likely occurred in:

   > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
   > ### Name: eml_locate_schema
   > ### Title: eml_locate_schema
   > ### Aliases: eml_locate_schema
   > 
   > ### ** Examples
   > 
   > ## No test: 
   > f <- system.file("examples", "example-eml-2.1.1.xml", package = "EML")
   > eml <- xml2::read_xml(f)
   Error: '' does not exist in current working directory ('/tmp/Rtmp2NUD8n/emld.Rcheck').

For reviewers:

  ✖ write unit tests for all functions, and all package code in general. 86%
    of code lines are covered by test cases.
lmullen commented 5 years ago

Two reviewers have agreed. Thanks to Kelly and Peter for being willing to take on this review. :tada: Typically we ask reviewers to complete their review in 3 weeks, but in this case that includes several holidays and most people's winter break. So I have set the due date with that in mind. Kelly and Peter, in addition to the typical checks listed in the reviewers' guide, I've noted a couple of things to pay attention to in my editorial comments above.

Reviewers: @khondula and @gothub

Due date: 2019-01-04

cboettig commented 5 years ago

Thanks @lmullen !

khondula commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 7

Review Comments

Summary

The emld package provides a handy set of functions that will facilitate manipulating and analyzing ecological metadata in R. The functions work as described, and the readme provides both a clear explanation of how emld differs from packages like EML, and ways to use the output of its functions in with SPARQL or JQ queries in rdflib or jqr respectively. Users of this package will likely be using it in combination within this ecosystem of related packages, however the translation functionality of this package should lower the barriers to entry for any user that is not familiar with any of EML, json, XML, or R lists.

The main issues I noticed were in the documentation, especially to correct the SPARQL query in the README/vignette so that a blank dataframe is not returned. Otherwise, I found the package easy to use and understand, and I look forward to seeing it published.

Documentation

From review checklist:

  1. Example needed for as_emld().
  2. The example for eml_locate_schema() requires having the EML package installed. Consider adding this package to Suggests in package documentation.
  3. I was unable to locate a CONTRIBUTING file.
  4. I was unable to locate the maintainer in the DESCRIPTION file.

Suggestions for improving package documentation:

  1. Typo in package description: "faciliate" should be "facilitate" on line 18 of DESCRIPTION.

Suggestions for improving function documentation:

  1. For as_emld, consider providing information about the return value of the function.
  2. The Description could be improved for as_json, as_xml, and template documentation.
  3. For template, consider providing more options for the object argument may be helpful.
  4. For template, the Value text refers to usage in vignettes however I was unable to locate that.

Suggestions for improving README:

  1. Typo in SPARQL queries section - I believe "names are act as variables" should be "names act as variables"
  2. Consider indenting as_json("hf205.json") line after pipe in SPARQL queries section for readability
  3. The SPARQL query in the example returns nothing as written. I believe the eml PREFIX should be updated to <eml://ecoinformatics.org/eml-2.2.0/>.
  4. I was able to run everything in the vignette without separately loading the library maggritr. Is this required?

Functionality

  1. The naming scheme is slightly different than the object_verb format suggested in rOpenSci packaging guidelines, however it is consistent with other conversion-based functions. My only concern is that while the function names are not in conflict, they are similar to non-eml specific conversion functions jsonlite::toJSON and xml2::as_xml_document. The emld::as_xml details clarify that the funciton is specific to EML and emld objects -- perhaps it would be useful to clarify this in the as_json documentation as well?
  2. Although it is rather intuitive, please add the type of return object to the documentation for the functions as_emld, as_json, and as_xml to keep with rOpenSci packaging guidelines (1.6).
  3. Consider adding top-level documentation to return from ?emld in keeping with rOpenSci packaging guidelines (1.6).
  4. Are functions in as_eml_document.R, validate_units.R, etc. internal functions that should have #' @noRd as per rOpenSci packaging guidelines (1.6)?

Additional comments:

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘spelling’

❯ checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    ‘LICENSE.md’
 ✖ fix this R CMD check ERROR: Package suggested but not available: ‘spelling’ The suggested
    packages are required for a complete check. Checking can be attempted without them by setting the
    environment variable _R_CHECK_FORCE_SUGGESTS_ to a false value. See section ‘The DESCRIPTION file’ in
    the ‘Writing R Extensions’ manual.
gothub commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5-6 hours


Reviewer Comments

The case made for the necessity of the emld package in converting EML to JSON-LD is clear and compelling. Providing the ability to inspect EML derived data as S3 lists, JSON and via SPARQL queries will facilitate creation of new tools and workflows to automate access and interpretation.

Documentation

In the Motivations section of README.md the purpose and method of "extending EML with other semantic vocabularies" isn't clear (to me). Consider adding a section such as "Extending EML" with an example, if this is within the scope of the package.

The help text for the as_xml Description contains only the text as_xml and does not describe what the function does. Also, the return type is not specified. The same is true for both as_emld and as_json, template.

There is no documentation index is available for the package.

There is no documentation for ?emld' or ?emld-package.

This may be outside the scope of the package, but no functions, strategies or examples are presented in the documentation for simple editing of EML data, i.e. (reading, simple edit, write out). This would be very useful, but if this functionality presents to much overlap with the EML package, pleas disregard this comment.

Functionality

The as_json, as_emld, as_xml functions have a clear purpose and work as expected.

In addition to testing using the provided code samples, the following checks for a complex EML document were performed for a couple of EML documents such as https://goa.nceas.ucsb.edu/#view/urn:uuid:3249ada0-afe3-4dd6-875e-0f7928a4c171:

x <- as_emld('metadata.xml') # using 
as_xml(x, "new.xml") 

R Source

The following potentially unresolved items exist, as show from a quick scan of the source code:

$ grep FIXME *
as_emld.R:    ## FIXME technically this assumes only our context
as_xml.R:## FIXME drop NAs too?
eml_validate.R:    ##  FIXME shouldn't have to write to tempfile,
eml_validate.R:  # FIXME technically must be unique inside parent system only...
emld-methods.R:## FIXME: Print method should drop context

Misc

devtools::check() reports non-standard file LICENSE.md

lmullen commented 5 years ago

Thank you, @khondula and @gothub for your thorough reviews.

@cboettig: Please let us know if you have any questions about the reviews. Will you be able to complete the necessary changes in two weeks time, i.e., by January 25?

cboettig commented 5 years ago

@khondula @gothub @lmullen Thanks for these very helpful and detailed reviews! I believe I have addressed all of these issues now. I've made the edits in a separate branch and created a PR, https://github.com/cboettig/emld/pull/30 which I will leave un-merged for now, hopefully that provides an easy way to see the differences. I've also summarized my replies to each of your point-by-point reviews (with a few clarifying questions and comments) in the two issues linked in that PR. Anchors aweigh :ship: ⚓️!

lmullen commented 5 years ago

Thanks for making these changes, @cboettig.

@khondula and @gothub: Could you please review these changes and make any further comments within one week, by Feb 2? If the package is now satisfactory, you can check of any remaining relevant boxes in your main review.

gothub commented 5 years ago

@lmullen @cboettig I had a look at https://github.com/cboettig/emld/issues/29 and it looks great!

This package is a really useful contribution, thx!

khondula commented 5 years ago

@cboettig re: the template function - I think if I were more familiar with the EML package I would agree that it might fit there better, but it seems like a nice function to have since it might help a user get more familiar with eml if they are starting from scratch. The fxn documentation seems to still refer to it being used in the readme though ("see vignettes"), which might be confusing( unless I am missing a different vignette).

otherwise everything looks great to me!

cboettig commented 5 years ago

@khondula @gothub Great! Thanks to you both for the constructive reviews. I've gone ahead and merged my own PR now.

@khondula I rebuilt the vignette, should have no mention of the template() function there now and I didn't see anything in searching for template in any of the main user facing functions. It's still in the package as a bit of a vestigial organ for the time being, may either be flushed our or removed in favor of functions from EML package.

I think that's everything on my end for the moment; if you're happy with the little paper.md for the JOSS side? Over to you, @lmullen

lmullen commented 5 years ago

Thank you @gothub and @khondula.

@cboettig We can consider this accepted now. Here are some post-review things for you to do.

lmullen commented 5 years ago

@cboettig Some other things pertaining to the JOSS submission.

Once you've submitted the paper, please let me know the URL to the submission and I will confirm that it has been peer reviewed by rOpenSci.

cboettig commented 5 years ago

Thanks @lmullen ! I've now completed the transfer to rOpenSci, and created the submission to JOSS at http://joss.theoj.org/papers/7e9064114c9a8d6756ef500cb5bc5d2b

lmullen commented 5 years ago

@cboettig Thanks, Carl. Looks like everything is wrapped up. Glad to see everything went through JOSS just fine.

@stefaniebutland This new package/paper might be a good candidate for a blog post at rOpenSci.

stefaniebutland commented 5 years ago

Yes indeed @lmullen. Carl knows he has a wide open invitation to blog about his packages :-)