Closed cboettig closed 6 years ago
Thanks for your submission @cboettig ! 😸 Looking for reviewers now. I also have two comments below, one about a vignette and one about a test.
Nothing interesting from goodpractice::gp()
😉
Reviewers: @toph-allen @annakrystalli Due date: 2017-08-07
@maelle what version of dplyr are you using? as_tibble() is (once again) exported from dplyr as of the 0.7.0 release: https://github.com/tidyverse/dplyr/blame/master/NEWS.md#L48-L49
Re the cloned-repo test on Windows, the error I see on AppVeyor comes from the devtools::use_build_ignore() function, and I couldn't find examples from any unit tests in the devtools package for that function, so I'm a bit at a loss. It must have something to do with how Windows handles file paths, but I'm not sure what. Does that devtools function work on Windows with the default argument pkg = "." for the path?
I'll look on Sunday or Monday, but maybe add 0.7.0 as the minimal version in DESCRIPTION?
For me the first error was because of the file path; the beginning worked when omitting the / at the end, but I'll look again in a few days and hopefully make a PR or suggestions. Windows user at your service! 😉
I think I updated dplyr by mistake since writing my comment, but I see you updated DESCRIPTION.
Regarding the test failing on Windows see https://github.com/codemeta/codemetar/pull/19
@annakrystalli @toph-allen Thanks a lot for accepting to review this package! Your reviews are due on the 2017-08-07.
As a reminder here is a link to the reviewing guide and here is one to the review template.
@cboettig you can be one of the first submitters to test our new dynamic RO package status badge! You can add the following to your README.
[![](https://ropensci.org/badges/130_status.svg)](https://github.com/ropensci/onboarding/issues/130)
No problem @maelle! Really interesting package @cboettig and excited to learn from the experience.
@annakrystalli @toph-allen friendly reminder that your reviews are due on Monday, August the 7th. 😉
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions in R help
[x] Examples for all exported functions in R Help that run successfully locally (no example provided for drop_context())
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
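For concreteness, the DESCRIPTION fields mentioned above might look like the minimal sketch below. The package name, person, and URLs here are hypothetical placeholders, not codemetar's actual file.

```
Package: examplepkg
Title: An Example Package
Authors@R: person("Jane", "Doe", email = "jane@example.org",
                  role = c("aut", "cre"))
URL: https://github.com/example/examplepkg
BugReports: https://github.com/example/examplepkg/issues
```

The Maintainer field is then generated automatically from the Authors@R person with the "cre" role.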
Paper (for packages co-submitting to JOSS)
The package contains a
paper.md
with:
- [x] A short summary describing the high-level functionality of the software
- [x] Authors: A list of authors with their affiliations
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 15 (too long I know!)
This package provides functionality to easily create JSON-LD metadata files describing R packages according to agreed-upon CodeMeta controlled vocabularies. The functions are lightweight and clean, and have a lot of automation and quality control baked in through their ability to extract information from package documentation and online standardised resources.
It is a great addition to rOpenSci and to the general movement towards both linked data and better curation, visibility and citability of software. Overall, the functions are smooth and easy to use. I think the most difficult part of the package is getting your head round the concepts. There is good codemeta and JSON-LD documentation to which the codemetar documentation links (I love Manu's videos!). I also realise that the purpose of package documentation is mainly to demonstrate use and not necessarily to educate on the background. However, I feel that a small amount of extra explanation and jargon busting could really help users get their head round what's going on and why it's such an important and awesome initiative!
No problems on Mac.
As mentioned above, here are some suggestions on how I feel the readme could be more informative and self contained:
I would have loved a short background section in the intro that could include a few key definitions (which could then be used consistently throughout) and a touch of historical context, e.g. (sorry, these are probably rubbish definitions but hopefully you get the gist!):
Linked data: data that has a context linking the fields used in the data to an agreed online standard.
context: a mapping between a data source's fields and a schema, usually schema.org but also domain-specific ones (e.g. codemeta).
Briefly explain the difference between the data types (i.e. JSON, codemeta JSON-LD, codemeta R list) so that users can be cognisant of them when using the package.
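To make these definitions concrete, a minimal hypothetical JSON-LD document might look like this; the choice of fields and their schema.org mappings is illustrative only:

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "codeRepository": "http://schema.org/codeRepository"
  },
  "name": "codemetar",
  "codeRepository": "https://github.com/codemeta/codemetar"
}
```

The `@context` block is the "context" in the definition above: it maps the document's local field names to agreed identifiers, which is what makes the data "linked".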
Once I got my head round the above, I felt I understood the purpose and function of the package, but without a short summary, and with the information spread across various sources, it took a little time.
None of this is of course necessary, I took special interest in the project because it really ties in with all my key interests so I'm probably a special case. But I almost feel that understanding these concepts could lead to better data management in general and adoption of JSON-LD more widely and for broader purposes. Let's just say this package could act as a gateway drug to JSON-LD, it certainly has for me!
codemeta_validate
The codemeta argument description only mentions a path/filename. The function, however, is also geared up to accept JSON-LD, functionality demonstrated in the crosswalking vignette, so this should probably be listed in the help file too.
crosswalk
The description in the help file could be a touch more informative, e.g.: Crosswalk between different metadata fields used by different repositories, registries and archives. For more details see here.
Also suggest that where the crosswalk table is mentioned, a link to it be supplied, e.g. "from -> the corresponding column name from the crosswalk table".
My understanding is that what is referred to here as a JSON list is the same as what is referred to in the create_codemeta documentation as a codemeta object (list)? I think standardisation of the terminology referring to the different data types throughout the documentation would be best.
I wonder if it could be useful to distinguish, in the function names, between functions that work with codemeta JSON and those that work with codemeta R list objects. Let's say we associated codemeta with the JSON-LD format and codemetar with R lists. Using these associations in function names could make it a bit more explicit to the user. E.g. the functions write_codemeta and codemeta_validate would remain the same because they either output or work on JSON-LD formats, but create_codemeta could become create_codemetar to indicate that the output is an R list? I only mention it because it confused me a little at times, but I appreciate this is no biggie.
Good scope and demonstration of the utility of the package. A few suggestions that, as a novice to the concepts, would have helped me a little:
In the index of vignettes, I think the sequence would work better as:
and include a really short description of what each vignette contains (like a sentence)
Really like the ability to add metadata through the DESCRIPTION file. Where you mention "See the DESCRIPTION file of the codemetar package for an example.", it could include an actual link to the DESCRIPTION file.
I found this paragraph confusing and have suggested how the concept might be made clearer. Also, I think URI needs a short definition.
Unrecognized properties are dropped, since there is no consensus context into which we can expand them. Second, the expanded terms are then compacted down into the new context (Zenodo in this case). This time, any terms that are not part of the Zenodo context are kept, but not compacted, since they still have meaningful contexts (that is, full URIs) that can be associated with them.
The motivating example is slightly confusing because, while this sentence mentions an error being thrown, all that is actually returned in R is NULL:
However, there’s other data that is missing in our example that could potentially cause problems for our application. For instance, our first author lists no affiliation, so the following code throws an error:
Then, when framing, the value associated with the missing-data field is also NULL. I appreciate that the real value of the process is that the JSON-LD now contains an explicit field holding a @null value, but it might be worth spelling this out, because the actual outputs in R pre- and post-framing are the same, i.e. NULL.
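A base-R sketch of this point, with plain lists standing in for codemetar's actual objects (the field names here are made up for illustration):

```r
# A plain R list standing in for parsed author metadata.
author <- list(givenName = "Carl", familyName = "Boettiger")

# Pre-framing: the affiliation field is simply absent, so $ returns NULL.
pre <- author$affiliation

# Post-framing (mimicked here): the field now exists explicitly but holds
# NULL, yet $ access looks identical to the absent-field case.
framed <- c(author, list(affiliation = NULL))
post <- framed$affiliation

identical(pre, post)  # TRUE: indistinguishable at the R level
```

The difference only shows up in the names: `"affiliation" %in% names(framed)` is TRUE while `"affiliation" %in% names(author)` is FALSE, which mirrors the explicit @null in the framed JSON-LD.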
There are a few super minor typos which I've corrected; happy to send through a pull request?
I found out (well, should have known) on a plane, when I had planned to work on this review, that the functions require an internet connection. It actually got me wondering whether an internet connection might be a good thing to list more generally as part of the software requirements?
crosswalk
While it is relatively straightforward to get the crosswalk .csv, I feel it'd be good to be able to access the information through the package. Here are some suggestions based on what I would find personally useful:
Valid options for the arguments to and from in crosswalk: it could be cool to have another function, e.g. crosswalks, that prints the available crosswalk column options, e.g.:
library(readr)
crosswalks <- function(){
df <-
readr::read_csv(
"https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
col_types = cols(.default = "c"))
names(df)[!names(df) %in% c("Parent Type", "Property", "Type", "Description")]
}
crosswalks()
#> [1] "codemeta-V1"
#> [2] "DataCite"
#> [3] "OntoSoft"
#> [4] "Zenodo"
#> [5] "GitHub"
#> [6] "Figshare"
#> [7] "Software Ontology"
#> [8] "Software Discovery Index"
#> [9] "Dublin Core"
#> [10] "R Package Description"
#> [11] "Debian Package"
#> [12] "Python Distutils (PyPI)"
#> [13] "Trove Software Map"
#> [14] "Perl Module Description (CPAN::Meta)"
#> [15] "NodeJS"
#> [16] "Java (Maven)"
#> [17] "Octave"
#> [18] "Ruby Gem"
#> [19] "ASCL"
#> [20] "DOAP"
#> [21] "Wikidata"
I also found the non-exported function crosswalk_table quite useful (there's some commented-out code in there). Others might too?
But I feel the most useful thing would be the ability to narrow down field mappings between particular repositories of interest. So, building on the crosswalk_table function, I would probably find the following function quite useful:
library(readr)
crosswalk_map <- function(from, to,
full_crosswalk =
"https://github.com/codemeta/codemeta/raw/master/crosswalk.csv",
trim = FALSE){
df <-
readr::read_csv(full_crosswalk,
col_types = cols(.default = "c"))
df <- df[c("Property", from, to)]
if(trim) df <- df[!is.na(df[,from]),] # trim to `from` argument fields
df
}
crosswalk_map(from = "GitHub", to = c("Zenodo", "Figshare"), trim = TRUE)
#> # A tibble: 11 x 4
#> Property GitHub Zenodo Figshare
#> <chr> <chr> <chr> <chr>
#> 1 codeRepository html_url relatedLink relatedLink
#> 2 programmingLanguage languages_url <NA> <NA>
#> 3 downloadUrl archive_url <NA> <NA>
#> 4 author login creators <NA>
#> 5 dateCreated created_at <NA> <NA>
#> 6 dateModified updated_at <NA> <NA>
#> 7 license license license License
#> 8 description description description/notes Description
#> 9 identifier id id <NA>
#> 10 name full_name title Title
#> 11 issueTracker issues_url <NA> <NA>
example_json_r_list %>% crosswalk("R Package Description") is the only way to get from a JSON-LD R list to JSON-LD in R, as toJSON doesn't work. While it's great that there is a way to do it, it almost feels like a bit of a hack. At the very least I feel it should be included as an example in the vignette so users are aware of it, but I'm wondering if an explicit function for that might also be useful?
write_codemeta
When a DESCRIPTION is detected, adding "codemeta.json" to .Rbuildignore assumes that the user has not changed the path argument. While the help file advises leaving it as the default, the user can change it, so there could theoretically be a situation where the user has called the file something else but the function has written "codemeta.json" to .Rbuildignore. Just wondering whether that would cause any problems downstream?
If passed a JSON-LD R list instead of a path to pkg, the function works but throws a warning: the condition has length > 1 and only the first element will be used
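On the R-list-to-JSON point above, a minimal sketch of what an explicit conversion helper could look like, assuming the jsonlite package (codemeta_to_json is a hypothetical name, not a codemetar function):

```r
library(jsonlite)

# Hypothetical helper: serialize a codemeta-style R list to JSON-LD text.
# auto_unbox = TRUE keeps length-one fields as JSON scalars, not arrays.
codemeta_to_json <- function(codemeta_list) {
  jsonlite::toJSON(codemeta_list, pretty = TRUE, auto_unbox = TRUE)
}

cm <- list(
  `@context` = "https://doi.org/10.5063/schema/codemeta-2.0",
  `@type`    = "SoftwareSourceCode",
  name       = "codemetar"
)
json <- codemeta_to_json(cm)
```

Something like this would give users a documented path from the list representation to JSON-LD without going through crosswalk().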
You can see more about my review here
Done! Have a great weekend all 🎉 And thanks for the feedback @maelle 👍
Thank you @annakrystalli for the awesome review, really well done! 😸 I hope these were enjoyable 15 hours! Have a nice weekend too!
@maelle I’ve had a busy few weeks, and won’t be able to submit a review by Monday. I’ll be taking some time off this week, though, so I’ll have time to get it done by the following Monday at the latest. Is that ok?
@toph-allen thanks for the update! No problem since @cboettig already has the other review 😊 new deadline August the 14th, have a nice less busy week until then!
@maelle Thank you for being flexible! ☺️
@toph-allen friendly reminder that your review is now due on Monday, August the 14th :smile_cat:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
However, documentation is present for a few non-exported functions, and a top-level documentation file is missing.
Examples are missing from a few functions.
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Paper (for packages co-submitting to JOSS)
The package contains a
paper.md
with:
- [x] A short summary describing the high-level functionality of the software
- [x] Authors: A list of authors with their affiliations
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
A local run of devtools::check() fails due to the lack of library(tibble) in the codemeta-parsing.Rmd vignette.
Estimated hours spent reviewing: 3h
This package aims to make it easier to add linked metadata to R software projects, which is a really cool endeavor: lowering the barrier to entry for linked metadata enables its broader adoption within scientific computing. The package does so mainly by parsing existing metadata specified in the R DESCRIPTION file and writing a JSON-LD metadata file containing that and other info. The README.md succinctly points people towards existing documentation of the larger CodeMeta Project and JSON-LD. As someone familiar with JSON-LD and linked data paradigms, but not the CodeMeta project, it took me little time to figure out the package’s functionality. The crosswalk() function took a little longer to comprehend. With a few changes, I’d definitely recommend approving this package.
I encountered an error when I first ran install() on the cloned package: the dependency package V8 failed to install. However, the error message was helpful in specifying the version I needed to install with homebrew.
Running build_vignettes() at first failed, with Error: processing vignette 'codemeta-parsing.Rmd' failed with diagnostics: could not find function "as_tibble". Adding library(tibble) to that vignette’s library statements should solve that problem, as building the vignettes after loading that package succeeded. This also caused devtools::check() to fail.
Going through the package’s documentation alongside the rOpenSci Packaging Guide’s Documentation section, I made the following observations:
- There is no codemetar.R file with package-level help, so that ?codemetar leads to something (e.g. https://github.com/tidyverse/dplyr/blob/master/R/dplyr.r).
- Documentation exists for add_context(), which isn’t exported, whereas its sister function drop_context() is exported. If one is exported, the other probably should be, and vice versa. If both are exported, they should have examples added to their documentation.
- There are crosswalk() and crosswalk_transform(), the latter of which isn’t exported. The former is an exported function, but isn’t well-documented: the “Description” field is just “crosswalk”, and doesn’t sufficiently explain what the function does. Maybe it’s not supposed to be exported, but if it is, its purpose should be explained more fully.
- The write_codemeta() function should have a fuller “Description” statement, but is otherwise sufficient. Specifically, it should explain how its usage differs from create_codemeta().
The vignettes and README.md file are generally rich, and a great source for expanding a user’s understanding of the package’s purpose and potential use cases. However, for entirely novice users, a reference to some introductory material might be helpful (per the package’s previous review).
There is no `CONTRIBUTING` file, and the `README.md` doesn’t contain community guidelines. However, the `DESCRIPTION` file specifies the requisite fields.
The package’s core functionality successfully executed when I tested it on a few different packages (i.e. write_codemeta() wrote the appropriate JSON-LD file).
The paper.md “References” section is currently empty.
The file R/add_metadata.R
seems to be a vestige of development, and contains a bit of commented-out code. If it’s not needed, it should probably be deleted.
I’m not sure what the purpose of R/sysdata.rda
file is (and from a few brief searches of the code, could not find a reference to it). If it serves none, it should probably be removed.
The exported functions write_codemeta()
and create_codemeta()
both follow a nice verb-subject naming scheme, and also reference the package name. It seems like the function codemeta_validate()
could be renamed to validate_codemeta()
to be consistent with these, though that is just personal preference.
The crosswalk()
function looks like it requires internet access. This should probably be noted somewhere.
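One lightweight way to note this would be to gate examples or vignette chunks behind a connectivity check. A base-R sketch, where the helper name and probe URL are assumptions rather than codemetar code:

```r
# Hypothetical guard for functions (like crosswalk()) that fetch the
# crosswalk table from the web: returns TRUE if the probe URL is reachable,
# FALSE otherwise (any connection error is caught by try()).
has_internet <- function(probe = "https://www.r-project.org") {
  ok <- try(suppressWarnings(readLines(probe, n = 1L)), silent = TRUE)
  !inherits(ok, "try-error")
}
```

Vignette chunks could then use something like `eval = has_internet()` so offline builds skip the network-dependent code instead of erroring.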
@maelle Here’s my review! Let me know if there’s anything that is unclear or should be changed.
Thanks a lot for your great review @toph-allen! :grin:
@cboettig now the two reviews are in! :wink:
@annakrystalli @toph-allen A huge thanks for these immensely helpful reviews! It is such a treat to get this kind of detailed attention, and I particularly appreciate the efforts of you both to wrestle not only with implementation details but also with the larger concepts and how they are communicated.
I love @annakrystalli's comment about being a gateway drug to json-ld, it certainly has been for me and several of the vignettes really began as (perhaps still are) my playground for learning json-ld and what we can do with it. I think software metadata is generally both familiar and simple/limited enough (but not trivially so) to make it a good intro into the larger ideas.
Still working through the individual issues back in the codemetar repo and hope to have them done soon, but just wanted to chime in here while it is not too stale and say thanks!
Ok, with apologies for the delay, all edits have now been addressed. I've tried to summarize changes in issues under an onboarding milestone, there are summaries issue for all of the changes made in response to Toph and Anna. Thanks again for all the help!
Thanks, @cboettig! @toph-allen and @annakrystalli, please look over the changes and let us know if they address your concerns. If yes, just say so and check the appropriate boxes in your review. If not, let us know what else @cboettig should follow up on.
Hi all, super happy with @cboettig's responses and have checked the appropriate boxes. An official Hello from the world to codemetar! 🚀
Thanks @annakrystalli!
@toph-allen can you have a look at @cboettig's work and answers and give a 👍 / :-1:? Thanks a lot in advance.
@maelle I’ll have comments shortly. Sorry for the delay.
Awesome @toph-allen thanks a lot in advance 😸
I'm sorry for the delay on this. I've looked through @cboettig's response, plus relevant updates to code & documentation, and it all looks great. I give a hearty 👍.
@cboettig, I really appreciate the specific & detailed responses. I'm happy to have been a part of the review process. Please let me know if there is anything else needed from me to proceed!
[I'm wrapping this up as @maelle is on vacation]
Thank you for your follow-ups @toph-allen and @annakrystalli! @cboettig, a few last things I am finding in final checks. I'll accept pending these changes:
Warnings ------------------------------------------------------------------------------
1. we can generate codemeta from the root directory of R source code on github (@test-cloned-repo.R#40) - 'devtools::use_build_ignore' is deprecated. Use 'usethis::use_build_ignore()' instead.
Fix these and we can move on to onboarding. Here's the checklist for after:
To-dos:
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
@noamross Thanks! Looks like usethis is not yet on CRAN. I think you must be running GitHub versions of devtools and usethis to get that warning?
Shall I go ahead and do the transfer and those subsequent steps now?
Ah, I am using development devtools. Disregard. Yes, transfer, but please do fix the text in the vignette first.
@noamross thanks! Okay, I think I've fixed that first vignette now as well (https://github.com/codemeta/codemetar/commit/57caa466f25724be5b2b9bc04e53829831304d49). Transferring now.
Approved! Thanks @toph-allen and @annakrystalli for your reviews and @cboettig for your work.
travis and codecov seem to be good; appveyor is tied to users, not GitHub accounts, so you can add it to yours, Carl. I also just added it to mine if you want to work off that one: https://ci.appveyor.com/project/sckott/codemetar
Seems the badge at https://github.com/ropensci/codemetar still says "Under Review", but the badge at https://ropensci.github.io/codemetar/ says "Peer Reviewed" (at least for me). Looks like a caching issue -- can we change the expiration time on the badge for the browser cache?
Pinging @karthik. I think we finally fixed the issue of github caching the badges, but I guess this is another header variable to change for browser cache?
(I see "peer reviewed" everywhere, myself)
@cboettig you haven't submitted the package to JOSS yet, have you?
@maelle no, thanks for the ping, I'll try that now. anything special I need to do re connecting to the rOpenSci review?
okay, I believe I've initiated submission with http://joss.theoj.org/papers/b16b968425f513da31632b0edd8ff0c5.
Perfect, I'll watch the new submissions at JOSS to comment in the Github issue right away to be sure it's flagged as rOpenSci-reviewed.
Closing this since everything is going well over at JOSS!
Summary
Codemeta defines a 'JSON-LD' format for describing software metadata. This package provides utilities to generate, parse, and modify codemeta.jsonld files automatically for R packages.
URL for the package (the development repository, not a stylized html page): https://github.com/codemeta/codemetar
Please indicate which category or categories from our package fit policies this package falls under and why (e.g. data retrieval, reproducibility; if you are unsure, we suggest you make a pre-submission inquiry):
Package for creating and working with scientific software metadata
Academic researchers looking to create metadata for their software
Nope
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md with a high-level description in the package root or in inst/.
Detail
[x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
A few lines of code handle exceptional cases that are difficult to cover in unit tests. Otherwise there should be no outstanding goodpractice flags in the repo.
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Perhaps anyone interested in metadata/archiving or just in messing around with json, maybe:
@amoeba @gothub @jennybc @sckott @stephlocke @jeroen @mfenner @o2r-project