ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

codemetar package #130

Closed cboettig closed 6 years ago

cboettig commented 7 years ago

Summary

Codemeta defines a 'JSON-LD' format for describing software metadata. This package provides utilities to generate, parse, and modify codemeta.jsonld files automatically for R packages.

Package: codemetar
Type: Package
Title: Generate CodeMeta Metadata for R Packages
Version: 0.1.0
Authors@R: person("Carl", "Boettiger", role=c("aut", "cre", "cph"), email="cboettig@gmail.com", comment="http://orcid.org/0000-0002-1642-628X")
Description: Codemeta defines a 'JSON-LD' format for describing software metadata.
    This package provides utilities to generate, parse, and modify codemeta.jsonld
    files automatically for R packages.
License: MIT + file LICENSE
URL: https://github.com/codemeta/codemetar
BugReports: https://github.com/codemeta/codemetar/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Depends: R (>= 3.0.0)
Imports: jsonlite (>= 1.3),
    jsonld,
    git2r,
    devtools,
    methods,
    stringi
Suggests: testthat,
    jsonvalidate,
    covr,
    knitr,
    rmarkdown,
    httr,
    magrittr,
    readr,
    xml2
VignetteBuilder: knitr
keywords: metadata, codemeta, ropensci, citation, credit
affiliation: https://ropensci.org
relatedLink: https://codemeta.github.io/codemetar

Package for creating and working with scientific software metadata

Academic researchers looking to create metadata for their software

Nope

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

A few lines of code handle exceptional cases that are difficult to cover in unit tests. Otherwise no there should be no outstanding goodpractice flags in the repo.

Perhaps anyone interested in metadata/archiving or just in messing around with json, maybe:

@amoeba @gothub @jennybc @sckott @stephlocke @jeroen @mfenner @o2r-project

maelle commented 7 years ago

Thanks for your submission @cboettig ! 😸 Looking for reviewers now. I also have two comments about one vignette and one test below.

Editor checks:


Editor comments

Reviewers: @toph-allen @annakrystalli Due date: 2017-08-07

cboettig commented 7 years ago

@maelle what version of dplyr are you using? as_tibble() is (once again) exported from dplyr as of the 0.7.0 release: https://github.com/tidyverse/dplyr/blame/master/NEWS.md#L48-L49

Re the cloned-repo test on Windows, the error I see on Appveyor comes from the devtools::use_build_ignore() function, and I couldn't find examples from any unit tests in the devtools package for that function so I'm a bit at a loss. Must have something to do with how Windows handles file paths, but not sure what. Does that devtools function work on windows with the default argument to pkg = "." for the path?

maelle commented 7 years ago

Ill look on Sunday or Monday but maybe add 0.7.0 as minimal version in DESCRIPTION?

For me the first error was bc of the file path, the beginning worked when omitting the / at the end but I'll look again in a few days and hopefully make a PR or suggestions. Windows user at your service! 😉

maelle commented 7 years ago
maelle commented 7 years ago

@annakrystalli @toph-allen Thanks a lot for accepting to review this package! Your reviews are due on the 2017-08-07.

As a reminder here is a link to the reviewing guide and here is one to the review template.

@cboettig you can be one of the first submitters to use test our new dynamic RO package status badge! You can add the following to your README.

[![](https://ropensci.org/badges/130_status.svg)](https://github.com/ropensci/onboarding/issues/130)

annakrystalli commented 7 years ago

No problem @maelle! Really interesting package @cboettig and excited to learn from the experience.

maelle commented 7 years ago

@annakrystalli @toph-allen friendly reminder that your reviews are due on Monday, August the 7th. 😉

annakrystalli commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 15 (too long I know!)


Review Comments

This package provides functionality to easily create JSON-LD metadata files describing r packages according to agreed on code medata controlled vocabularies. The functions are lightweight, clean and have a lot of automation and quality control baked in from their ability to extract information from package documentation and online standardised resources.

It is a great addition to rOpenSci and general movements towards both linked data and better curation, visibility and citability of software. Overall, the functions are smooth and easy to use. I think the most difficult part of the package is getting your head round the concepts. There is good codemeta and JSON-LD documentation to which codemetar documentation links to (I love Manu's videos!). I also realise that the purpose of package documentation is mainly to demonstrate use and not necessarily educate on the background. However I feel that a small amount of extra explanation and jargon busting could really help users get their head round what's going on and why it's such an important and awesome initiative!

build/installation

No problems on mac

documentation

As mentioned above, here are some suggestions on how I feel the readme could be more informative and self contained:

I would have loved a short background section in the intro that could include a few key definitions (which could then be used consistently throughout) and a touch of historical context: eg. (sorry this are probably rubbish definitions but hopefully you get the gist!)


Definitions & jargon busting

Briefly explain the difference between the data types (ie json, codemeta json-ld, codemeta r list) so that users can be cognisant of them when using package.

Describe how project is the convergence of a number of initiatives:

Once I got my head round the above, I felt I understood the purpose and the function of the package but without a short summary and information spread across various sources it took a little time.

None of this is of course necessary, I took special interest in the project because it really ties in with all my key interests so I'm probably a special case. But I almost feel that understanding these concepts could lead to better data management in general and adoption of JSON-LD more widely and for broader purposes. Let's just say this package could act as a gateway drug to JSON-LD, it certainly has for me!

function documentation

codemeta_validate
crosswalk
Overall function documentation comments:

I wonder if it could be useful to make a distinction in the function names functions that work with codemeta JSON and codemeta r list objects. Let's say we associated codemeta with the JSON-LD format and codemetar with r lists. Using these associations in function names could make it a bit more explicit to the user. E.g. Functions write_codemeta and validate_codemeta would remain the same because they either output or work on JSON-LD formats but create_codemeta could become create_codemetar to indicate that the output is an r list? I only mention it because it confused me a little at times but appreciate this is no biggie.

Vignettes

Good scope and demonstration of utility of package. A few suggestion that as a novice to the concepts would have helped me a little.

In the index of vignettes, I think the sequence would work better as:

  1. intro
  2. Translating between schema using JSON-LD
  3. Validation in JSON-LD
  4. Parsing CodeMeta Data

and include a really short description of what each vignette contains (like a sentence)

Intro

Really like ability to add through the DESCRIPTION file. Where you mention

See the DESCRIPTION file of the codemetar package for an example.

it could include an actual link to the DESCRIPTION file.

Translating between schema using JSON-LD

I found this paragraph confusing and have suggested how it might make the concept clearer. Also I think URI needs a short definition.

Unrecognized properties are dropped, since there is no consensus context into which we can expand them. Second, the expanded terms are then compacted down into the new context (Zenodo in this case.) This time, any terms that are not part of the codemeta Zenodo context are kept, but not compacted, since they still have meaningful contexts (that is, full URIs) that can be associated with them, but not compacted:

Validation in JSON-LD

Motivating example slightly confusing because while this sentence mentions an error being thrown up, all that is returned in r is a NULL.

However, there’s other data that is missing in our example that could potentially cause problems for our application. For instance, our first author lists no affiliation, so the following code throws an error:

Then when framing, the value to associate with the field is data missing is also NULL. I appreciate that the real value in the process is that the JSON-LD now contains and explicit field that contains a @null value but it might be worth spelling it out because the actual output in r pre & post framing are the same, ie NULL.

A few super minor typos which I've corrected and happy to send through a pull request?

functions

I found out (well should have known) on a plane when I planned to work on this review that the functions require an internet connection. It actually got me wondering whether internet-connection might be a good thing to list more generally as part of software requirements?

crosswalk

While it is relatively straight forward to get the crosswalk .csv, I feel it'd be good to be able to access information through the package. Here are some suggestions based on what I would find personally useful:

library(readr)
crosswalks <- function(){
    df <-
        readr::read_csv(
            "https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
            col_types = cols(.default = "c"))
    names(df)[!names(df) %in% c("Parent Type", "Property", "Type", "Description")]
}

crosswalks()
#>  [1] "codemeta-V1"                         
#>  [2] "DataCite"                            
#>  [3] "OntoSoft"                            
#>  [4] "Zenodo"                              
#>  [5] "GitHub"                              
#>  [6] "Figshare"                            
#>  [7] "Software Ontology"                   
#>  [8] "Software Discovery Index"            
#>  [9] "Dublin Core"                         
#> [10] "R Package Description"               
#> [11] "Debian Package"                      
#> [12] "Python Distutils (PyPI)"             
#> [13] "Trove Software Map"                  
#> [14] "Perl Module Description (CPAN::Meta)"
#> [15] "NodeJS"                              
#> [16] "Java (Maven)"                        
#> [17] "Octave"                              
#> [18] "Ruby Gem"                            
#> [19] "ASCL"                                
#> [20] "DOAP"                                
#> [21] "Wikidata"
library(readr)
crosswalk_map <- function(from, to, 
                            full_crosswalk =
  "https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
  trim = FALSE){
  df <-
    readr::read_csv(full_crosswalk,
             col_types = cols(.default = "c"))
  df <- df[c("Property", from, to)]
  if(trim) df <- df[!is.na(df[,from]),] # trim to `from` argument fields
  df
}

crosswalk_map(from = "GitHub", to = c("Zenodo", "Figshare"), trim = T)
#> # A tibble: 11 x 4
#>               Property        GitHub            Zenodo    Figshare
#>                  <chr>         <chr>             <chr>       <chr>
#>  1      codeRepository      html_url       relatedLink relatedLink
#>  2 programmingLanguage languages_url              <NA>        <NA>
#>  3         downloadUrl   archive_url              <NA>        <NA>
#>  4              author         login          creators        <NA>
#>  5         dateCreated    created_at              <NA>        <NA>
#>  6        dateModified    updated_at              <NA>        <NA>
#>  7             license       license           license     License
#>  8         description   description description/notes Description
#>  9          identifier            id                id        <NA>
#> 10                name     full_name             title       Title
#> 11        issueTracker    issues_url              <NA>        <NA>
write_codemeta

Compliance with rOpenSci Packaging Guide

Contributing

You can see more about my review here

annakrystalli commented 7 years ago

DOne! Have a great weekend all 🎉 And thanks for the feedback @maelle 👍

maelle commented 7 years ago

Thank you @annakrystalli for the awesome review, really well done! 😸 I hope these were enjoyable 15 hours! Have a nice week-end too!

toph-allen commented 7 years ago

@maelle I’ve had a busy few weeks, and won’t be able to submit a review by Monday. I’ll be taking some time off this week, though, so I’ll have time to get it done by the following Monday at the latest. Is that ok?

maelle commented 7 years ago

@toph-allen thanks for the update! No problem since @cboettig already has the other review 😊 new deadline August the 14th, have a nice less busy week until then!

toph-allen commented 7 years ago

@maelle Thank you for being flexible! ☺️

maelle commented 7 years ago

@toph-allen friendly reminder that your review is now due on Monday, August the 14th :smile_cat:

toph-allen commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 3h


Review Comments

Summary

This package aims to make it easier to add linked metadata to R software projects, which is a really cool endeavor — lowering the barrier of entry to linked metadata means its broader adoption within scientific computing. The package does so mainly by parsing existing metadata specified in the R DESCRIPTION file and writing a JSON-LD metadata file containing that and other info. The README.md succinctly points people towards existing documentation of the larger CodeMeta Project and JSON-LD. As someone familiar with JSON-LD and linked data paradigms, but not the CodeMeta project, it took me little time to figure out the package’s functionality. The crosswalk() function took a little longer to comprehend. With a few changes, I’d definitely recommend approving this package.

Specific Comments

I encountered an error when I first ran install() on the cloned package. The dependency package V8 failed to install. However, the error message was helpful in specifying the version I needed to install with homebrew.

Running build_vignettes() at first failed, with Error: processing vignette 'codemeta-parsing.Rmd' failed with diagnostics: could not find function "as_tibble". Adding library(tibble) to that vignette’s library statements should solve that problem, as building the vignettes after loading that package succeeded. This also caused devtools::check() to fail.

Going through the package’s documentation alongside the ROpenSci Packaging Guide’s Documentation section, I made the following observations:

The vignettes and README.md file are generally rich, and a great source for expanding a user’s understanding of the package’s purpose and potential use cases. However, for entirely novice users, a reference to some introductory material might be helpful (per the package’s previous review).

There is no `CONTRIBUTING` file, and the `README.md` doesn’t contain community guidelines. However, the `DESCRIPTION` file specifies the requisite fields.

The package’s core functionality successfully executed when I tested it on a few different packages (i.e. write_codemeta() wrote the appropriate JSON-LD file.

The paper.md “References” section is currently empty.

The file R/add_metadata.R seems to be a vestige of development, and contains a bit of commented-out code. If it’s not needed, it should probably be deleted.

I’m not sure what the purpose of R/sysdata.rda file is (and from a few brief searches of the code, could not find a reference to it). If it serves none, it should probably be removed.

The exported functions write_codemeta() and create_codemeta() both follow a nice verb-subject naming scheme, and also reference the package name. It seems like the function codemeta_validate() could be renamed to validate_codemeta() to be consistent with these, though that is just personal preference.

The crosswalk() function looks like it requires internet access. This should probably be noted somewhere.

toph-allen commented 7 years ago

@maelle Here’s my review! Let me know if there’s anything that is unclear or should be changed.

maelle commented 7 years ago

Thanks a lot for your great review @toph-allen! :grin:

@cboettig now the two reviews are in! :wink:

cboettig commented 7 years ago

@annakrystalli @toph-allen A huge thanks for these immensely helpful reviews! It is such a treat to get this kind of detailed attention, and I particularly appreciate the efforts of you both to wrestle not only with implementation details but also with the larger concepts and how they are communicated.

I love @annakrystalli's comment about being a gateway drug to json-ld, it certainly has been for me and several of the vignettes really began as (perhaps still are) my playground for learning json-ld and what we can do with it. I think software metadata is generally both familiar and simple/limited enough (but not trivially so) to make it a good intro into the larger ideas.

Still working through the individual issues back in the codemetar repo and hope to have them done soon, but just wanted to chime in here while it is not too stale and say thanks!

cboettig commented 7 years ago

Ok, with apologies for the delay, all edits have now been addressed. I've tried to summarize changes in issues under an onboarding milestone, there are summaries issue for all of the changes made in response to Toph and Anna. Thanks again for all the help!

noamross commented 7 years ago

Thanks, @cboettig! @toph-allen and @annakrystalli, please look over the changes and let us know if they address your concerns. If yes, just say so and check the appropriate boxes up on your review. If not, let us know that else @cboettig should follow up on.

annakrystalli commented 7 years ago

Hi all, super happy with @cboettig responses and have checked the appropriate boxes. An offical Hello from the world to codemetar! 🚀

maelle commented 6 years ago

Thanks @annakrystalli!

@toph-allen can you have a look at @cboettig's work and answers and give a 👍 / :-1:? Thanks a lot in advance.

toph-allen commented 6 years ago

@maelle I’ll have comments shortly. Sorry for the delay.

maelle commented 6 years ago

Awesome @toph-allen thanks a lot in advance 😸

toph-allen commented 6 years ago

I'm sorry for the delay on this. I've looked through @cboettig's response, plus relevant updates to code & documentation, and it all looks great. I give a hearty 👍.

@cboettig, I really appreciate the specific & detailed responses. I'm happy to have been a part of the review process. Please let me know if there is anything else needed from me to proceed!

noamross commented 6 years ago

[I'm wrapping this up as @maelle is on vacation]

Thank you for your follow-ups @toph-allen and @annakrystalli! @cboettig, a few last things I am finding in final checks. I'll accept pending these changes:


Fix these and we can move on to onboarding. Here's the checklist for after:

To-dos:

cboettig commented 6 years ago

@noamross Thanks! Looks like usethis is not yet on CRAN. I think you must be running GitHub versions of devtools and usethis to get that warning?

Shall I go ahead and do the transfer and those subsequent steps now?

noamross commented 6 years ago

Ah, I am using development devtools. Disregard. Yes, transfer, but please do fix the text in the vignette first.

cboettig commented 6 years ago

@noamross thanks! Okay, I htink I've fixed that first vignette now as well (https://github.com/codemeta/codemetar/commit/57caa466f25724be5b2b9bc04e53829831304d49). Transferring now.

noamross commented 6 years ago

Approved! Thanks @toph-allen and @annakrystalli for your reviews and @cboettig for your work.

sckott commented 6 years ago

travis and codecov seem to be good, appveyor is tied to users, not github accounts, so you can add to yours Carl, I also just added to mine if you want to work off that one https://ci.appveyor.com/project/sckott/codemetar

cboettig commented 6 years ago

Seems the badge at https://github.com/ropensci/codemetar still says "Under Review", but the badge at https://ropensci.github.io/codemetar/ says "Peer Reviewed" (at least for me). Looks like a caching issue -- can we change the expiration time on the badge for the browser cache?

noamross commented 6 years ago

Pinging @karthik. I think we finally fixed the issue of github caching the badges, but I guess this is another header variable to change for browser cache?

(I see "peer reviewed" everywhere, myself)

maelle commented 6 years ago

@cboettig you haven't submitted the package to JOSS yet, have you?

cboettig commented 6 years ago

@maelle no, thanks for the ping, I'll try that now. anything special I need to do re connecting to the rOpenSci review?

cboettig commented 6 years ago

okay, I believe I've initiated submission with http://joss.theoj.org/papers/b16b968425f513da31632b0edd8ff0c5.

maelle commented 6 years ago

Perfect, I'll watch the new submissions at JOSS to comment in the Github issue right away to be sure it's flagged as rOpenSci-reviewed.

maelle commented 6 years ago

Closing this since everything is going well over at JOSS!