ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

frictionless: Read and Write Frictionless Data Packages #495

Closed peterdesmet closed 2 years ago

peterdesmet commented 2 years ago

Date accepted: 2022-02-10 Submitting Author Name: Peter Desmet Submitting Author Github Handle: !--author1-->@peterdesmet<!--end-author1-- Other Package Authors Github handles: !--author-others-->@damianooldoni<!--end-author-others-- Repository: https://github.com/frictionlessdata/frictionless-r Version submitted: 0.9.0 Submission type: Standard

Editor: !--editor-->@melvidoni<!--end-editor-- Reviewers: @zambujo, @beatrizmilz

Due date for @zambujo: 2022-02-06 Due date for @beatrizmilz: 2022-02-09

Archive: TBD Version accepted: TBD Language: en


Package: frictionless
Title: Read and Write Frictionless Data Packages
Version: 0.9.0.9000
Authors@R: c(
    person("Peter", "Desmet", , "peter.desmet@inbo.be", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-8442-8025")),
    person("Damiano", "Oldoni", , "damiano.oldoni@inbo.be", role = "aut",
           comment = c(ORCID = "0000-0003-3445-7562")),
    person("Research Institute for Nature and Forest (INBO)", , , 
           "info@inbo.be", role = c("cph"))
  )
Description: Read and write Frictionless Data Packages. A Data Package 
  (<https://specs.frictionlessdata.io/data-package/>) is a simple container 
  format and standard to describe and package a collection of (tabular) data. 
  It is typically used to publish FAIR and open datasets.
License: MIT + file LICENSE
URL: https://github.com/frictionlessdata/frictionless-r,
    https://frictionlessdata.github.io/frictionless-r/
BugReports: https://github.com/frictionlessdata/frictionless-r/issues
Imports:
    assertthat,
    dplyr,
    glue,
    httr,
    jsonlite,
    purrr,
    readr (>= 2.1.0),
    stringr
Suggests:
    knitr,
    hms,
    lubridate,
    testthat (>= 3.0.0),
    rmarkdown
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
VignetteBuilder: knitr

Scope

frictionless allows users to read and write Frictionless Data Packages, an open and general-purpose standard to structure and describe (tabular) datasets, typically used to publish FAIR datasets. The package allows users to read (local and remote) Data Packages (data retrieval), load its data resources in data frames (data extraction), return errors if the Data Package is malformed (data validation and testing), add data frames as new resources (data munging) and write Data Packages back to disk (Data deposition).

Anyone who wants to read or create datasets structured as Frictionless Data Packages. The community is referred to as the Frictionless Data community and typical includes researchers, data scientists and data engineers, often interested in (publishing) open data.

Yes, datapackage.r: it has an object-oriented design (using a Package class) and offers validation. frictionless on the other hand allows users to quickly read and write Data Package data to and from R data frames, getting out of your way for the rest of your analysis. It is designed to be lightweight, follows tidyverse principles and supports piping. The main functionality (reading data into data frame, adding a data frame as a resource to a package, writing a Data Package to disk) is offered as functions, rather than the class properties in datapackage.r.

Not applicable

Not applicable

Technical checks

Confirm each of the following by checking the box.

Note that the link to guide for authors above (in the issue template) returns a 404. It should be https://devguide.ropensci.org/authors-guide.html. I tried to use pkgcheck but I got package ‘pkgcheck’ is not available for this version of R

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

Note that this package falls under the Frictionless Data Code of Conduct.

ropensci-review-bot commented 2 years ago

Missing values: author1, repourl, submission-type, language

peterdesmet commented 2 years ago

@ropensci-review-bot I have now included the missing <!--> tags in the issue body.

mpadge commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for frictionless (v0.9.0.9000)

git hash: dc9daa6a

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 2 authors - 1 vignette - 1 internal data file - 8 imported packages - 8 exported functions (median 18 lines of code) - 26 non-exported functions in R (median 24 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 1| 68.4| | |files_tests | 10| 90.7| | |loc_R | 528| 49.2| | |loc_vignettes | 119| 31.1| | |loc_tests | 1192| 89.2| | |num_vignettes | 1| 64.8| | |data_size_total | 1364| 61.3| | |data_size_median | 1364| 66.0| | |n_fns_r | 34| 44.1| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 26| 48.5| | |n_fns_per_file_r | 1| 21.7| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 20| 59.8| | |loc_per_fn_r_exp | 18| 42.5| | |loc_per_fn_r_not_exp | 24| 70.4| | |rel_whitespace_R | 13| 42.4| | |rel_whitespace_vignettes | 41| 38.1| | |rel_whitespace_tests | 13| 80.4| | |doclines_per_fn_exp | 30| 34.5| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 46| 64.6| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frictionlessdata/frictionless-r/workflows/R-CMD-check/badge.svg)](https://github.com/frictionlessdata/frictionless-r/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |96a3d1 |2022-01-03 | |pkgdown |success |dc9daa |2022-01-03 | |R-CMD-check |success |dc9daa |2022-01-03 | |test-coverage |success |dc9daa |2022-01-03 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 12 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 12


Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.59 | |pkgcheck |0.0.2.205 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

stefaniebutland commented 2 years ago

@lwinfree just fyi, here's the (start of the) rOpenSci software peer review thread for the "frictionless" R package. Folks, Lilly Winfree is Product Manager @ Frictionless Data.

peterdesmet commented 2 years ago

A codemeta.json file has now been added.

ldecicco-USGS commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for frictionless (v0.9.0.9000)

git hash: 794ca7f6

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 2 authors - 1 vignette - 1 internal data file - 8 imported packages - 8 exported functions (median 18 lines of code) - 26 non-exported functions in R (median 24 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 1| 68.4| | |files_tests | 10| 90.7| | |loc_R | 528| 49.2| | |loc_vignettes | 119| 31.1| | |loc_tests | 1192| 89.2| | |num_vignettes | 1| 64.8| | |data_size_total | 1364| 61.3| | |data_size_median | 1364| 66.0| | |n_fns_r | 34| 44.1| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 26| 48.5| | |n_fns_per_file_r | 1| 21.7| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 20| 59.8| | |loc_per_fn_r_exp | 18| 42.5| | |loc_per_fn_r_not_exp | 24| 70.4| | |rel_whitespace_R | 13| 42.4| | |rel_whitespace_vignettes | 41| 38.1| | |rel_whitespace_tests | 13| 80.4| | |doclines_per_fn_exp | 30| 34.5| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 46| 64.6| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frictionlessdata/frictionless-r/workflows/R-CMD-check/badge.svg)](https://github.com/frictionlessdata/frictionless-r/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |48cca6 |2022-01-04 | |pkgdown |success |794ca7 |2022-01-04 | |R-CMD-check |success |794ca7 |2022-01-04 | |test-coverage |success |794ca7 |2022-01-04 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 12 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 12


Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.72 | |pkgcheck |0.0.2.205 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

mpadge commented 2 years ago

@peterdesmet Thanks for the submission - an editor will be assigned as soon as possible, but it may take a few days.

mpadge commented 2 years ago

@ropensci-review-bot assign @melvidoni as editor

ropensci-review-bot commented 2 years ago

Assigned! @melvidoni is now the editor

melvidoni commented 2 years ago

Hello @peterdesmet, I'll be the handling editor. I'll start looking for reviewers, and let you know once they are assigned. Please, bare with me for a bit.

peterdesmet commented 2 years ago

Hi @melvidoni, 2 questions:

  1. I'm about to merge a PR with updated functionality into the package. Would it be ok if the reviewers review the resulting 0.10.0 version?
  2. Can I add the peer review badge? rOpenSci
melvidoni commented 2 years ago

Hello @peterdesmet . 1) They will. None of those contacted replied yet, so they will review the latest once they accept. 2) Not yet, once the reviewing process has finished.

melvidoni commented 2 years ago

@ropensci-review-bot assign @zambujo as reviewer

ropensci-review-bot commented 2 years ago

@zambujo added to the reviewers list. Review due date is 2022-02-06. Thanks @zambujo for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@zambujo: If you haven't done so, please fill this form for us to update our reviewers records.

melvidoni commented 2 years ago

Hello @peterdesmet I'm still searching for another reviewer. The reviewing deadline for @zambujo is 2022-02-06

peterdesmet commented 2 years ago

@melvidoni @zambujo Thanks!

Version 0.10.0 of the package has just been released, which would be the preferred version for review.

melvidoni commented 2 years ago

Version 0.10.0 of the package has just been released, which would be the preferred version for review.

Yes, that would be the version to review. Could you please make the link clearer and/or merge to master?

peterdesmet commented 2 years ago

@melvidoni version 0.10.0 has been merged to the default branch (main), but that branch is also used for further development.

To install 0.10.0 specifically (recommended):

devtools::install_github("frictionlessdata/frictionless-r@v0.10.0")

To install the latest development version (0.10.0.9000):

devtools::install_github("frictionlessdata/frictionless-r")
melvidoni commented 2 years ago

@ropensci-review-bot assign @beatrizmilz as reviewer

ropensci-review-bot commented 2 years ago

@beatrizmilz added to the reviewers list. Review due date is 2022-02-09. Thanks @beatrizmilz for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@beatrizmilz: If you haven't done so, please fill this form for us to update our reviewers records.

beatrizmilz commented 2 years ago

Hi! Thanks for inviting. Here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3h


Review Comments

Congratulations to the authors. Everything worked well. The comments I wrote were mostly ideas that seems to me that could improve the user experience, but the package looks great as it is. Nice testing suite!

README

Vignette

I`m talking about this piece of code:

iris_schema <- create_schema(iris)

# Remove description for first field
iris_schema$fields[[1]]$description <- NULL

# Set descriptions for all fields
descriptions <- c(
  "Sepal length in cm.",
  "Sepal width in cm.",
  "Pedal length in cm.",
  "Pedal width in cm.",
  "Iris species."
)
iris_schema$fields <- purrr::imap(
  iris_schema$fields,
  ~ c(.x, description = descriptions[.y])
)

Do the authors think that it is possible to create a function to add descriptions to the schema, in a way that is used in a similarly to the other functions of the package? Example of the idea:

iris_schema <- create_schema(iris) |>
  add_description(
    c(
      "Sepal length in cm.",
      "Sepal width in cm.",
      "Pedal length in cm.",
      "Pedal width in cm.",
      "Iris species."
    )
  )

Validate your Data Package before depositing. You can do this in Python with the Frictionless Framework using frictionless validate datapackage.json.

Zip the individual csv files (and update their paths in datapackage.json) to reduce size, not the entire Data Package. That way, users still have direct access to the datapackage.json file. See this example.

Do the authors think that is a good idea to add an argument on write_package() to write the compressed csvs, to facilitate this step? Something like:

write_package(my_package, "my_directory", compress = TRUE) 

Since the function uses readr::write_csv(), if compress = TRUE it could add .gz at the end of the filepath and readr would zip it on the fly.

Example:

readr::write_csv(mtcars, "mtcars.csv")
file.size("mtcars.csv")
#> [1] 1281

readr::write_csv(mtcars, "mtcars_zipped.csv.gz")
file.size("mtcars_zipped.csv.gz")
#> [1] 558

I hope the review is usefull. Again, congratulations for the authors.

Full report is here: https://github.com/beatrizmilz/ropensci_reviews/blob/main/frictionless/review.md

peterdesmet commented 2 years ago

Thanks @beatrizmilz for your review! My feedback:

zambujo commented 2 years ago

Dear all, apologies for the delay. Please find my comments below:

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 4h


Review Comments

The package integrates the frictionlessdata framework library collection which facilitates the packaging of tabular text data along with their schemas across different programming languages. The framework provides a set of tools intended to facilitate the creation of "FAIR-compliant" datasets. The umbrella project is led by the Open Knowledge Foundation and the framework is most known for its command-line tool written in Python.

Regarding the R package, the documentation is well organised and complete. The same applies to the unit tests. (Apropos, I like seeing how the authors handle errors with abundant assertions directly on the main code.) I was unable to find any relevant issues and have only a few minor optional suggestions as well as some open points/questions for discussion. All in all, the package has been beautifully crafted. Well done!

Optional suggestions/questions:

(in no particular order)

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/495#issuecomment-1025860861 time 3

ropensci-review-bot commented 2 years ago

Logged review for beatrizmilz (hours: 3)

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/495#issuecomment-1033049584 time 4

ropensci-review-bot commented 2 years ago

Logged review for zambujo (hours: 4)

melvidoni commented 2 years ago

Thank you both @beatrizmilz and @zambujo for the thoughtful reviews!

@peterdesmet please proceed with the outstanding changes whenever you have time. I'll ask both reviewers to stay tuned to see how your changes are being addressed.

peterdesmet commented 2 years ago

Thanks @zambujo for your review. My feedback:

zambujo commented 2 years ago

Many thanks @peterdesmet. You have addressed all my comments and questions. Impressive work!

Ps. I have to confess that I had to update my packages to be able to review frictionless-r. Incidentally, I did notice a huge improvement in the performance of {readr} when I ran some code this morning. 🤓

peterdesmet commented 2 years ago

Thanks @zambujo. The suggested change for unique_values() is implemented in https://github.com/frictionlessdata/frictionless-r/pull/101.

@melvidoni, the comments suggested by @beatrizmilz are addressed in https://github.com/ropensci/software-review/issues/495#issuecomment-1025957514 and where actionable, all implemented in the latest version of the package. Both reviewers were included with rev roles: thanks to you both!

One lingering question I have for the reviewers is the use of the word package. I'm copy/pasting my question from higher up:

  • Masking of usethis::create_package(): Yeah, it is a bit unfortunate that the term package is used for different things in Frictionless vs R (as explained at the start of the vignette). Luckily in R it is often referred to as pkg in function names, reducing masking. In the Frictionless Community "Data Package" does seem to be consistently referred to as package in implementations in other languages, not dp, seldom as datapackage, which is why I adopted that term for frictionless functions and parameters. I think alternatives like create_datapackage(), create_data_package(), create_dataset() are less desirable, but 👉 feedback welcome 👈. Was the term package confusing in any way?

Since you both didn't remark on that, I assume that the word package was not confusing in read_package(), create_package() or write_package(), but I want to make sure.

melvidoni commented 2 years ago

Okay, given that @zambujo gave the okay, we are only missing @beatrizmilz's comments on the latest changes, and the answer for your question. Let's wait for her, then.

beatrizmilz commented 2 years ago

Hi! Peter, the word package was not confusing for me since there was an explaination in the documentation! I pointed out about the masking with usethis because for users that uses only library() and are not familiar with the possibility of conflict between functions with the same names, can eventually encounter errors and ask for help (and that is not a problem with the package!). I think i thought about that because I answer a lot of questions in foruns in portuguese, and is frequent questions about errors caused by masking.

You have addressed all the questions, and as @zambujo said, this is an impressive work. Congratulations!

melvidoni commented 2 years ago

@ropensci-review-bot approve frictionless

ropensci-review-bot commented 2 years ago

Approved! Thanks @peterdesmet for submitting and @zambujo, @beatrizmilz for your reviews! :grin:

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

peterdesmet commented 2 years ago

@melvidoni Is it required to transfer the {frictionless-r} repository to rOpenSci? Because this is also a question of branding. Ping @lwinfree @sapetti9 @roll from Frictionless Data.

Once those questions are answered I can make the necessary changes and then hopefully submit to CRAN! 🎉🤞

TODO based on https://github.com/ropensci/software-review/issues/495#issuecomment-1034398725

lwinfree commented 2 years ago

Hi all! First of all, it has been really lovely to watch this process unfold, so a big thank you to everyone that has been involved!

Speaking as product manager of Frictionless Data:

everything else looks great to me!

Thanks!

peterdesmet commented 2 years ago

Thanks @lwinfree!

@melvidoni What would be the instructions to do the remaining points in https://github.com/ropensci/software-review/issues/495#issuecomment-1034672825 i.e. using the rOpenSci CI and website building for a repo not under rOpenSci (cf. https://github.com/CornellLabofOrnithology/auk/)?

melvidoni commented 2 years ago

Hello all. Please, bear with me while I discuss with the other Associate Editors. In the meantime, complete what you can, please.

melvidoni commented 2 years ago

Update @peterdesmet @lwinfree. We are discussing the CoC issue. Will get back to you soon-ish, please bear with us.

maelle commented 2 years ago

Thanks for your work on this package. :smile_cat:

maelle commented 2 years ago

screenshot from rOpenSci website where one gets a card for the frictionless package

https://ropensci.org/packages/all/