ropensci / software-review

rOpenSci Software Peer Review.

rdflib #169

Closed cboettig closed 6 years ago

cboettig commented 6 years ago

Summary

rdflib is a wrapper around two existing rOpenSci packages, redland and jsonld. It is intended as a user-friendly complement to the low-level interface that redland already provides for working with RDF (semantic/linked data).

Package: rdflib
Title: Tools to Manipulate and Query Semantic Data
Version: 0.0.2
Authors@R: person("Carl", "Boettiger", 
                  email = "cboettig@gmail.com", 
                  role = c("aut", "cre", "cph"),
                  comment=c(ORCID = "http://orcid.org/0000-0002-1642-628X"))
Description: The Resource Description Framework, or 'RDF' is a widely used
             data representation model that forms the cornerstone of the 
             Semantic Web. 'RDF' represents data as a graph rather than 
             the familiar data table or rectangle of relational databases.
             The 'rdflib' package provides a friendly and concise user interface
             for performing common tasks on 'RDF' data, such as reading, writing
             and converting between the various serializations of 'RDF' data,
             including 'rdfxml', 'turtle', 'nquads', 'ntriples', 'trig', and 'json-ld';
             creating new 'RDF' graphs, and performing graph queries using 'SPARQL'.
             This package wraps the low level 'redland' R package which
             provides direct bindings to the 'redland' C library.  Additionally,
             the package supports the newer and more developer friendly
             'JSON-LD' format through the 'jsonld' package. The package
             interface takes inspiration from the Python 'rdflib' library.
Depends: R (>= 3.4.1)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
URL: https://github.com/cboettig/rdflib
BugReports: https://github.com/cboettig/rdflib/issues
Imports: redland,
    jsonld,
    methods,
    utils
RoxygenNote: 6.0.1
Suggests: magrittr,
    covr,
    testthat,
    knitr,
    rmarkdown,
    jsonlite,
    httr,
    xml2,
    jqr,
    readr,
    dplyr,
    lubridate,
    DT
VignetteBuilder: knitr

https://github.com/cboettig/rdflib

data extraction, because this package parses scientific data file formats (specifically, formats already parsed by existing rOpenSci packages). This package also enables graph queries using the SPARQL language, somewhat analogous to the rOpenSci jqr package, but for JSON-LD and other linked data formats.

Anyone working with semantic data, including the wide array of scientific ontologies and knowledge-bases. These include reproducibility-focused ontologies like PROV, and a large number of biological ontologies ranging from genes to traits to environmental features.

As described above, this package overlaps significantly with the redland package, but should be easier to use.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

karthik commented 6 years ago

👋 @cboettig Thanks for the submission. Since it would be a COI for any of the four editors to handle your submission, I have asked @lmullen to serve as the ad hoc editor on this one and he has graciously agreed. I will let Lincoln take it from here. 🚀

lmullen commented 6 years ago

Editor checks:


Editor comments

Passes devtools::check() without issue.

R CMD check results
0 errors | 0 warnings | 0 notes

Here is the result of goodpractice::gp().

It is good practice to

  ✖ write unit tests for all functions, and all
    package code in general. 97% of code lines are covered by
    test cases.

    R/rdf.R:274:NA
    R/rdf.R:275:NA

  ✖ not import packages as a whole, as this can cause
    name clashes between the imported packages. Instead, import
    only the specific functions you need.

97% coverage is excellent.

The advice not to import entire packages is up to the discretion of the reviewers.

Note one misspelling (occured) from devtools::spell_check().

@karthik I am now approaching reviewers.


Reviewers: Due date:

lmullen commented 6 years ago

Two reviewers have agreed to review this package. Reviewers, thanks for being willing, and I'll ask you to have your reviews in within three weeks. Here is the reviewer's guide. Feel free to let me know if you have any questions.

Reviewer: Anna Krystalli, @annakrystalli
Reviewer: Bryce Mecum, @amoeba
Due date: 2018-01-25

@cboettig Could you please add the rOpenSci under review badge to the README for this package? Here is the snippet.

[![](https://badges.ropensci.org/169_status.svg)](https://github.com/ropensci/onboarding/issues/169)

amoeba commented 6 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 2


Review Comments

This looks like an excellent package for inclusion in the rOpenSci ecosystem. I have personal experience with the problem this package is trying to solve, namely working with RDF in R from the analyst's perspective. The RDF package this package wraps, redland, is intended more for developers to write packages on top of than for end users to do things like run SPARQL queries or manipulate RDF graphs. I would use this package.

The package is laid out in a non-surprising manner, most functions are short and well-scoped, and, overall, the code is very readable. The accompanying test suite is reasonable and provides 100% coverage, and the single vignette is well-written and useful.

I did find the documentation could use some polish in some places (see comments below). I suspect a pass or two by the author would make some good improvements without much work.

I have left two checkboxes unchecked due to the following issues:

but I otherwise found everything else to be in order.

Higher level

Lower level

These were written out as I went through the checklist:

lmullen commented 6 years ago

Thanks for getting your review in, @amoeba.

annakrystalli commented 6 years ago

Hello all and apologies for the delay! Here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 7


Review Comments

This package is a great and lightweight addition to working with RDF and linked data in R. Coming after my review of the codemetar package, which introduced me to linked data, I found this a great learning experience in a topic I've become really interested in but am still quite a novice at, so I hope my feedback helps represent that particular POV.

Overall I feel package functionality is complete and self-contained (apart from one error identified below). My main feedback is regarding documentation, specifically how it could be improved to help novice users to grasp the value of semantic data and better understand how the package works.

installation

The only install comment I'll add is that when I first ran install(pkg_dir, dependencies = T, build_vignettes = T), the building of the vignettes threw an error because the suggested package ‘jqr’ had not been installed yet. It worked without build_vignettes = T.

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD build  \
#>   '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
#>   --no-resave-data --no-manual
#> 
#> Error: Command failed (1)

with the console output:

* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd) 
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted

Installing without building the vignette results in successful installation of jqr.

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T)
#> Installing rdflib
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
#>   --library='/Users/Anna/Library/R/3.4/library' --install-tests
#> 
#> Installing jqr
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/RtmpbYeNu9/devtools6d3b4a7582a1/jqr'  \
#>   --library='/Users/Anna/Library/R/3.4/library' --install-tests
#> 

If jqr is already installed, installation and vignette building proceed successfully.
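
One possible way for a vignette to avoid this kind of failure is to evaluate its chunks only when the suggested package is available. The following is a minimal sketch of that general knitr pattern, not a description of what the rdflib vignette actually does; the variable name has_jqr is made up for illustration.

```r
# Check whether the suggested package is available without attaching it.
# requireNamespace() returns TRUE or FALSE instead of throwing an error.
has_jqr <- requireNamespace("jqr", quietly = TRUE)

# In a vignette setup chunk: only evaluate later chunks if jqr is present,
# so the build does not stop with "there is no package called 'jqr'".
knitr::opts_chunk$set(eval = has_jqr)
```

With a guard like this the vignette would still build when jqr is missing, at the cost of showing unevaluated code for the chunks that depend on it.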

tests and checks

All OK

documentation

My main suggestion is to try to define some terms and improve the concept map for the tools by adding some detail and broader context to the documentation. The following suggestions could also be addressed with links to further details if you think they are too superfluous for explicit documentation with the package.

Spelling a few things out explicitly and in plain English could really help folks follow what's going on and understand what file types are inputs or outputs of different functions.

how do I find info on URIs?

Some signposting/guidance on how I can find information on the semantics dictating what information I can extract from an rdf object would be really useful. eg. with a df or list you could use str to get an idea of how you could start indexing these objects. If confronted with a local rdf file, how would one go about figuring out even what they can query? I appreciate this is really one of the difficulties of working with rdf and semantic data in general (the flipside to the ease of being able to make unstructured queries is that we need to know how data are labelled) but I feel some brief guidance or demo on how one would approach this would go a long way.
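
As a rough illustration of the kind of demo meant here, one way to get a str()-like overview of an unfamiliar graph might be to list the distinct predicates it contains. This is only a sketch: the small graph is built in memory to stand in for a local RDF file, and the example.org URIs are made up.

```r
library(rdflib)

# A tiny in-memory graph standing in for "a local rdf file".
# The example.org identifiers are hypothetical.
graph <- rdf()
rdf_add(graph,
        subject   = "http://example.org/paper1",
        predicate = "http://purl.org/dc/terms/title",
        object    = "An example title")
rdf_add(graph,
        subject   = "http://example.org/paper1",
        predicate = "http://purl.org/dc/terms/creator",
        object    = "http://example.org/person1")

# List every distinct predicate used in the graph. These are the
# "column names" one can then ask about in more specific queries.
predicates <- rdf_query(graph,
  "SELECT DISTINCT ?predicate WHERE { ?subject ?predicate ?object }")
print(predicates)
```

The resulting table of predicates gives a starting point for writing narrower queries, much as names() or str() would for a data frame or list.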

examples in general

For clarity to the reader who may not have looked at function documentation yet, I recommend using the full argument names when supplying arguments to functions in vignettes (if not always, at least the first time an argument is introduced).

SPARQL queries to JSON data section

At the end of the intro to the section, you write:

Here is a query that for all papers where I am an author, returns a table of given name, family name and year of publication:

Am I right in thinking though that you are co-author on all papers in the rdf but the query is in fact filtering the names of your co-authors? (through FILTER ( ?coi_family != "Boettiger" ))

Turning RDF-XML into more friendly JSON

It would be nice, if possible, to see sample printouts of the conversion of the different files, or at least of the effect of compaction.

rdf_add man page

Would be nice to see a demo of using one or more of the additional arguments.
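
For instance, a demo along these lines might show the optional objectType and datatype_uri arguments being used to add a typed literal. This is a sketch under assumptions: the subject URI and the value are made up, and the exact set of accepted type strings should be checked against the rdf_add man page.

```r
library(rdflib)

graph <- rdf()

# objectType and datatype_uri are the optional arguments demonstrated here.
# The example.org subject and the value "12" are hypothetical.
rdf_add(graph,
        subject      = "http://example.org/paper1",
        predicate    = "http://schema.org/numberOfPages",
        object       = "12",
        objectType   = "literal",
        datatype_uri = "http://www.w3.org/2001/XMLSchema#integer")

# Query the statement back to confirm it was added.
pages <- rdf_query(graph,
  "SELECT ?paper ?pages WHERE { ?paper <http://schema.org/numberOfPages> ?pages }")
print(pages)
```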

Motivating example

I think an additional, more detailed motivating example might illustrate a more direct use case in a researcher's workflow. In particular, it would be good to highlight the great potential of triplestore APIs (and celebrate the efforts of many cool linked data initiatives, e.g. governmental ones). So an example that incorporates a query to a triplestore and then enrichment of a researcher's data could be a cool example. This could be a longer-term project or even just an rOpenSci blog post, but see the comment re: rdf_query function below.

functionality

Tests

Add tests for serialising to trig and turtle, which at the moment throws an error? Perhaps a test for parsing/serialising each format would be good. Also, perhaps worth checking whether e.g. rdf_parse(format="turtle") is working.
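
A per-format round-trip test could look something like the sketch below. It assumes testthat is available, covers only three of the formats named in the DESCRIPTION, and uses a made-up example.org subject; it is not taken from the package's actual test suite.

```r
library(testthat)
library(rdflib)

# Formats to round-trip; extend this vector as more serializers are supported.
formats <- c("rdfxml", "ntriples", "turtle")

for (fmt in formats) {
  test_that(paste("graph survives a round trip through", fmt), {
    # Build a one-triple graph (hypothetical example.org subject).
    graph <- rdf()
    rdf_add(graph,
            subject   = "http://example.org/paper1",
            predicate = "http://purl.org/dc/terms/title",
            object    = "An example title")

    # Serialize to a temporary file, then parse it back in the same format.
    path <- tempfile()
    rdf_serialize(graph, path, format = fmt)
    roundtrip <- rdf_parse(path, format = fmt)

    # The parsed graph should still contain exactly one triple.
    result <- rdf_query(roundtrip, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
    expect_equal(nrow(result), 1)
  })
}
```

Looping over a vector of format names keeps the test body in one place, so a newly supported serialization only needs one extra entry.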

👍

lmullen commented 6 years ago

Thanks for getting your review in, @annakrystalli.

Now that both reviews are in, could you respond to the reviews and make changes as necessary, @cboettig? If possible, please do so within 2 weeks, which would be February 13.

cboettig commented 6 years ago

@lmullen @annakrystalli @amoeba Thanks for your reviews!

I've just about finished addressing the issues raised at this point, which I've summarized in:

A summary of the changes can be found in NEWS.md, which ended up being reasonably involved because the reviews got me thinking about a bunch of stuff, which was awesome.

However, most substantive is perhaps the development of a new vignette, which I've liberally titled A tidyverse lover’s intro to RDF. This tries to address the big-picture issues Anna in particular highlights regarding documenting and motivating the broader context of RDF. This is still a bit more of a draft than a polished document, but given that my two weeks are up I think it might be a good time to get feedback on this (and the other changes) from the reviewers. In particular, I would love to hear what the reviewers think of this as a broader introduction.

If the reviewers are interested and think it would be worthwhile, I believe it might be nice to overhaul this new vignette into a more general-purpose intro to RDF for R users (both the relevant packages and concepts) that might be suitable for a submission to something like the R Journal. I'd love to entice Anna and Bryce to be co-authors if they are interested...

lmullen commented 6 years ago

@cboettig Thanks for getting your review in on time, and for the thoroughness of the changes and how you reported them. I'm looking forward to reading the new vignette.

@amoeba and @annakrystalli: Could you please go over the changes to the package and report back within one week? That would be by Thursday, February 22. I'll do the same.

amoeba commented 6 years ago

Hey @lmullen and @cboettig: I've reviewed the responses and changes @cboettig has made in response to my review, and every issue I raised has been addressed. I have no remaining issues and recommend the submission be accepted as modified. @lmullen would you like us to review the new vignette before acceptance? That's fine with me and I can certainly do that within the week.

@cboettig I'm super excited with the direction you're taking. I'd certainly like to continue working on this package and a paper. In particular, this clicks for me:

you can just about always get things down to about three columns,

I'd never before seen the equivalence between tidy data principles and RDF. I'll follow up with you elsewhere.

annakrystalli commented 6 years ago

Hi all 😃

I am really happy with the changes made and the direction of the vignette! Triplestores are indeed the ultimate tidy data, a great way to sell it. It's already a great resource, and I am also happy to contribute to both the vignette and a paper on it. I'll respond to some of the discussions raised in @cboettig's response in the rdflib issue.

So ✅ and big 👍 from me also.

lmullen commented 6 years ago

@amoeba: Yes, if you could please offer whatever suggestions you think are necessary on the new vignette that would be great, but it seems like we are very close to being done.

amoeba commented 6 years ago

Okay, will do. I'll get those comments in this week.

lmullen commented 6 years ago

Hi @cboettig. Thanks for the thoroughness of your response to the reviews. At this point I don't see any reason to delay accepting the package into rOpenSci. Of course it looks like you are still figuring out the final form of a few things, especially in the new vignette, so it will be your call when to submit to CRAN.

@karthik I don't have access to rOpenSci admin accounts, so could you please begin the process of moving this package into the rOpenSci organization?

After that happens, @cboettig, could you please do the following?

Once the repository is moved I will close this issue.

karthik commented 6 years ago

Thank you @lmullen! Since Carl has ownership rights on the org (unlike most authors) he should be able to move this himself.

cboettig commented 6 years ago

Thanks @lmullen and @karthik! I've added the ropensci footer and migrated the repo to ropensci. 🚀 .

CI seems to be working (if I recall correctly I shouldn't migrate the appveyor since it only links to individual accounts?)

I assume there's nothing I need to do to update the onboarding badge, that happens automatically via tags on this issue, right?

I'll leave it to you editors to close out this thread when ready.

karthik commented 6 years ago

Thanks @cboettig & @lmullen!!

lmullen commented 6 years ago

@cboettig Great! Looking forward to using this myself the next time I need to deal with RDF.

Special thanks to @amoeba and @annakrystalli for doing the review.