frictionless: Read and Write Frictionless Data Packages

peterdesmet commented 2 years ago

Date accepted: 2022-02-10 Submitting Author Name: Peter Desmet Submitting Author Github Handle: !--author1-->@peterdesmet@damianooldoni<!--end-author-others-- Repository: https://github.com/frictionlessdata/frictionless-r Version submitted: 0.9.0 Submission type: Standard

Editor: !--editor-->@melvidoni<!--end-editor-- Reviewers: @zambujo, @beatrizmilz

Due date for @zambujo: 2022-02-06 Due date for @beatrizmilz: 2022-02-09

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: frictionless
Title: Read and Write Frictionless Data Packages
Version: 0.9.0.9000
Authors@R: c(
    person("Peter", "Desmet", , "peter.desmet@inbo.be", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-8442-8025")),
    person("Damiano", "Oldoni", , "damiano.oldoni@inbo.be", role = "aut",
           comment = c(ORCID = "0000-0003-3445-7562")),
    person("Research Institute for Nature and Forest (INBO)", , , 
           "info@inbo.be", role = c("cph"))
  )
Description: Read and write Frictionless Data Packages. A Data Package 
  (<https://specs.frictionlessdata.io/data-package/>) is a simple container 
  format and standard to describe and package a collection of (tabular) data. 
  It is typically used to publish FAIR and open datasets.
License: MIT + file LICENSE
URL: https://github.com/frictionlessdata/frictionless-r,
    https://frictionlessdata.github.io/frictionless-r/
BugReports: https://github.com/frictionlessdata/frictionless-r/issues
Imports:
    assertthat,
    dplyr,
    glue,
    httr,
    jsonlite,
    purrr,
    readr (>= 2.1.0),
    stringr
Suggests:
    knitr,
    hms,
    lubridate,
    testthat (>= 3.0.0),
    rmarkdown
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [x] data extraction
- [x] data munging
- [x] data deposition
- [x] data validation and testing (listed as category, but not in issue template)
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

frictionless allows users to read and write Frictionless Data Packages, an open and general-purpose standard to structure and describe (tabular) datasets, typically used to publish FAIR datasets. The package allows users to read (local and remote) Data Packages (data retrieval), load its data resources in data frames (data extraction), return errors if the Data Package is malformed (data validation and testing), add data frames as new resources (data munging) and write Data Packages back to disk (Data deposition).

Who is the target audience and what are scientific applications of this package?

Anyone who wants to read or create datasets structured as Frictionless Data Packages. The community is referred to as the Frictionless Data community and typical includes researchers, data scientists and data engineers, often interested in (publishing) open data.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Yes, datapackage.r: it has an object-oriented design (using a Package class) and offers validation. frictionless on the other hand allows users to quickly read and write Data Package data to and from R data frames, getting out of your way for the rest of your analysis. It is designed to be lightweight, follows tidyverse principles and supports piping. The main functionality (reading data into data frame, adding a data frame as a resource to a package, writing a Data Package to disk) is offered as functions, rather than the class properties in datapackage.r.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Not applicable

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Not applicable

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

Note that the link to guide for authors above (in the issue template) returns a 404. It should be https://devguide.ropensci.org/authors-guide.html. I tried to use pkgcheck but I got package ‘pkgcheck’ is not available for this version of R

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Note that this package falls under the Frictionless Data Code of Conduct.

ropensci-review-bot commented 2 years ago

Missing values: author1, repourl, submission-type, language

peterdesmet commented 2 years ago

@ropensci-review-bot I have now included the missing <!--> tags in the issue body.

mpadge commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for frictionless (v0.9.0.9000)

git hash: dc9daa6a

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_multiplication_x: does not have a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 100%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 2 authors - 1 vignette - 1 internal data file - 8 imported packages - 8 exported functions (median 18 lines of code) - 26 non-exported functions in R (median 24 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 1| 68.4| | |files_tests | 10| 90.7| | |loc_R | 528| 49.2| | |loc_vignettes | 119| 31.1| | |loc_tests | 1192| 89.2| | |num_vignettes | 1| 64.8| | |data_size_total | 1364| 61.3| | |data_size_median | 1364| 66.0| | |n_fns_r | 34| 44.1| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 26| 48.5| | |n_fns_per_file_r | 1| 21.7| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 20| 59.8| | |loc_per_fn_r_exp | 18| 42.5| | |loc_per_fn_r_not_exp | 24| 70.4| | |rel_whitespace_R | 13| 42.4| | |rel_whitespace_vignettes | 41| 38.1| | |rel_whitespace_tests | 13| 80.4| | |doclines_per_fn_exp | 30| 34.5| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 46| 64.6| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frictionlessdata/frictionless-r/workflows/R-CMD-check/badge.svg)](https://github.com/frictionlessdata/frictionless-r/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |96a3d1 |2022-01-03 | |pkgdown |success |dc9daa |2022-01-03 | |R-CMD-check |success |dc9daa |2022-01-03 | |test-coverage |success |dc9daa |2022-01-03 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 12 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 12

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.59 | |pkgcheck |0.0.2.205 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

stefaniebutland commented 2 years ago

@lwinfree just fyi, here's the (start of the) rOpenSci software peer review thread for the "frictionless" R package. Folks, Lilly Winfree is Product Manager @ Frictionless Data.

peterdesmet commented 2 years ago

A codemeta.json file has now been added.

ldecicco-USGS commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for frictionless (v0.9.0.9000)

git hash: 794ca7f6

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 100%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 2 authors - 1 vignette - 1 internal data file - 8 imported packages - 8 exported functions (median 18 lines of code) - 26 non-exported functions in R (median 24 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 1| 68.4| | |files_tests | 10| 90.7| | |loc_R | 528| 49.2| | |loc_vignettes | 119| 31.1| | |loc_tests | 1192| 89.2| | |num_vignettes | 1| 64.8| | |data_size_total | 1364| 61.3| | |data_size_median | 1364| 66.0| | |n_fns_r | 34| 44.1| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 26| 48.5| | |n_fns_per_file_r | 1| 21.7| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 20| 59.8| | |loc_per_fn_r_exp | 18| 42.5| | |loc_per_fn_r_not_exp | 24| 70.4| | |rel_whitespace_R | 13| 42.4| | |rel_whitespace_vignettes | 41| 38.1| | |rel_whitespace_tests | 13| 80.4| | |doclines_per_fn_exp | 30| 34.5| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 46| 64.6| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frictionlessdata/frictionless-r/workflows/R-CMD-check/badge.svg)](https://github.com/frictionlessdata/frictionless-r/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |48cca6 |2022-01-04 | |pkgdown |success |794ca7 |2022-01-04 | |R-CMD-check |success |794ca7 |2022-01-04 | |test-coverage |success |794ca7 |2022-01-04 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 12 potential issues: message | number of times --- | --- Lines should not be more than 80 characters. | 12

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.72 | |pkgcheck |0.0.2.205 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

mpadge commented 2 years ago

@peterdesmet Thanks for the submission - an editor will be assigned as soon as possible, but it may take a few days.

mpadge commented 2 years ago

@ropensci-review-bot assign @melvidoni as editor

ropensci-review-bot commented 2 years ago

Assigned! @melvidoni is now the editor

melvidoni commented 2 years ago

Hello @peterdesmet, I'll be the handling editor. I'll start looking for reviewers, and let you know once they are assigned. Please, bare with me for a bit.

peterdesmet commented 2 years ago

Hi @melvidoni, 2 questions:

I'm about to merge a PR with updated functionality into the package. Would it be ok if the reviewers review the resulting 0.10.0 version?
Can I add the peer review badge?

melvidoni commented 2 years ago

Hello @peterdesmet . 1) They will. None of those contacted replied yet, so they will review the latest once they accept. 2) Not yet, once the reviewing process has finished.

melvidoni commented 2 years ago

@ropensci-review-bot assign @zambujo as reviewer

ropensci-review-bot commented 2 years ago

@zambujo added to the reviewers list. Review due date is 2022-02-06. Thanks @zambujo for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@zambujo: If you haven't done so, please fill this form for us to update our reviewers records.

melvidoni commented 2 years ago

Hello @peterdesmet I'm still searching for another reviewer. The reviewing deadline for @zambujo is 2022-02-06

peterdesmet commented 2 years ago

@melvidoni @zambujo Thanks!

Version 0.10.0 of the package has just been released, which would be the preferred version for review.

melvidoni commented 2 years ago

Version 0.10.0 of the package has just been released, which would be the preferred version for review.

Yes, that would be the version to review. Could you please make the link clearer and/or merge to master?

peterdesmet commented 2 years ago

@melvidoni version 0.10.0 has been merged to the default branch (main), but that branch is also used for further development.

To install 0.10.0 specifically (recommended):

devtools::install_github("frictionlessdata/frictionless-r@v0.10.0")

To install the latest development version (0.10.0.9000):

devtools::install_github("frictionlessdata/frictionless-r")

melvidoni commented 2 years ago

@ropensci-review-bot assign @beatrizmilz as reviewer

ropensci-review-bot commented 2 years ago

@beatrizmilz added to the reviewers list. Review due date is 2022-02-09. Thanks @beatrizmilz for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@beatrizmilz: If you haven't done so, please fill this form for us to update our reviewers records.

beatrizmilz commented 2 years ago

Hi! Thanks for inviting. Here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3h

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Congratulations to the authors. Everything worked well. The comments I wrote were mostly ideas that seems to me that could improve the user experience, but the package looks great as it is. Nice testing suite!

README

In the README, the authors use an abbreviation (FAIR). It might be good to clarify what is about (Findability, Accessibility, Interoperability, and Reusability) for readers that are not introduced to the concept of FAIR. Also, what do the authors think about linking a text to clarify it?

Vignette

In the vignette, the pipe from magrittr is used but it was not loaded. So if someone is trying to follow the code in the vignette, there is an could not find function "%>%" error.
Just for the record: When executing library(frictionless) there is a message saying that The following object is masked from ‘package:usethis’: create_package: https://usethis.r-lib.org/reference/create_package.html
There is some examples that are shown using lists. What do the authors think about using functions instead, to be more standardized? Example: create a function resources(package) to use it instead of package$resource_names.
Adding the descriptions to the schema does not seem trivial. There is an example with the purrr package. But the example might be not simple to understand if someone is not used to the purrr package.

I`m talking about this piece of code:

iris_schema <- create_schema(iris)

# Remove description for first field
iris_schema$fields[[1]]$description <- NULL

# Set descriptions for all fields
descriptions <- c(
  "Sepal length in cm.",
  "Sepal width in cm.",
  "Pedal length in cm.",
  "Pedal width in cm.",
  "Iris species."
)
iris_schema$fields <- purrr::imap(
  iris_schema$fields,
  ~ c(.x, description = descriptions[.y])
)

Do the authors think that it is possible to create a function to add descriptions to the schema, in a way that is used in a similarly to the other functions of the package? Example of the idea:

iris_schema <- create_schema(iris) |>
  add_description(
    c(
      "Sepal length in cm.",
      "Sepal width in cm.",
      "Pedal length in cm.",
      "Pedal width in cm.",
      "Iris species."
    )
  )

The vignette says that in order to validate the Data Package, the users need to use Python. Is there any plans to implement that funcionality in R aswell?

Validate your Data Package before depositing. You can do this in Python with the Frictionless Framework using frictionless validate datapackage.json.

In the vignette, there is some instructions to zip the csv to reduce size of the files:

Zip the individual csv files (and update their paths in datapackage.json) to reduce size, not the entire Data Package. That way, users still have direct access to the datapackage.json file. See this example.

Do the authors think that is a good idea to add an argument on write_package() to write the compressed csvs, to facilitate this step? Something like:

write_package(my_package, "my_directory", compress = TRUE)

Since the function uses readr::write_csv(), if compress = TRUE it could add .gz at the end of the filepath and readr would zip it on the fly.

Example:

readr::write_csv(mtcars, "mtcars.csv")
file.size("mtcars.csv")
#> [1] 1281

readr::write_csv(mtcars, "mtcars_zipped.csv.gz")
file.size("mtcars_zipped.csv.gz")
#> [1] 558

I hope the review is usefull. Again, congratulations for the authors.

Full report is here: https://github.com/beatrizmilz/ropensci_reviews/blob/main/frictionless/review.md

peterdesmet commented 2 years ago

Thanks @beatrizmilz for your review! My feedback:

README: FAIR -> Will link to https://www.go-fair.org/fair-principles/ and not expand in text, so sentence remains readable.
Vignette: Right, %>% was loaded via dplyr in the hidden first code chunk of the vignette, but that is not visible to users. Will adapt. I'm opting to load it via dplyr, since that package is already a dependency.
Masking of usethis::create_package(): Yeah, it is a bit unfortunate that the term package is used for different things in Frictionless vs R (as explained at the start of the vignette). Luckily in R it is often referred to as pkg in function names, reducing masking. In the Frictionless Community "Data Package" does seem to be consistently referred to as package in implementations in other languages, not dp, seldom as datapackage, which is why I adopted that term for frictionless functions and parameters. I think alternatives like create_datapackage(), create_data_package(), create_dataset() are less desirable, but 👉 feedback welcome 👈. Was the term package confusing in any way?
Add resources(package) function: Good idea, added as todo in https://github.com/frictionlessdata/frictionless-r/issues/97
Add add_description() function: Agree that this is not trivial with purrr, which is why included an example (also as a reminder for myself how to do it ☺️). There is already an issue that discusses a dedicated function to add field properties (description is one of those). I added your suggestion there: https://github.com/frictionlessdata/frictionless-r/issues/70 Might be implemented in a future release.
Package validation: "The vignette says that in order to validate the Data Package, the users need to use Python. Is there any plans to implement that funcionality in R aswell?" Data Package validation is vast and must perform at scale, so this is a daunting task. 😅 Would be cool to have it in frictionless, but I currently have no plans to include this.
Zipped resources: Since v0.10.0, add_resource() supports adding (zipped) CSVs directly from disk, but that is indeed not the case for added data frames. So write_package(my_package, "my_directory", compress = TRUE) is a cool idea, and likely sufficient to support at a package level (rather than deciding for each resource). Added as todo in https://github.com/frictionlessdata/frictionless-r/issues/98

zambujo commented 2 years ago

Dear all, apologies for the delay. Please find my comments below:

Package Review

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4h

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The package integrates the frictionlessdata framework library collection which facilitates the packaging of tabular text data along with their schemas across different programming languages. The framework provides a set of tools intended to facilitate the creation of "FAIR-compliant" datasets. The umbrella project is led by the Open Knowledge Foundation and the framework is most known for its command-line tool written in Python.

Regarding the R package, the documentation is well organised and complete. The same applies to the unit tests. (Apropos, I like seeing how the authors handle errors with abundant assertions directly on the main code.) I was unable to find any relevant issues and have only a few minor optional suggestions as well as some open points/questions for discussion. All in all, the package has been beautifully crafted. Well done!

Optional suggestions/questions:

(in no particular order)

check_path(path) (in utils.R): would it make sense to include other protocols other than http by using something like grepl("^http://|^https://|^ftp://|sftp://", path) - instead of starts_with("http")
in unique_sorted() (in utils.R), I was wondering whether stats::aggregate() is necessary. Intuitively I would have used table() from base R: names(sort(table(x), decreasing = TRUE)), provided it passes the unit test.
Is there a reason to pass a default value to the file name parameter in read_package()? Certain functions such as bmp() provide names by default with automatic file numbering, but I think it is more common not to provide any default value.
would it make sense to replace {readr} with {vroom} to improve reading and writing times for large files?
if I understand correctly, when creating a data package, factors are converted to strings of characters. Would it make sense to package extra dedicated lookup tables to account for the information contained factor levels and their order?
in the future, would it make sense to develop {frictionless-r} more towards being a frictionless-py wrapper (somewhat similarly to spacyr/spacy), to make it easier to keep the R package in sync with the Python project?

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/495#issuecomment-1025860861 time 3

ropensci-review-bot commented 2 years ago

Logged review for beatrizmilz (hours: 3)

melvidoni commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/495#issuecomment-1033049584 time 4

ropensci-review-bot commented 2 years ago

Logged review for zambujo (hours: 4)

melvidoni commented 2 years ago

Thank you both @beatrizmilz and @zambujo for the thoughtful reviews!

@peterdesmet please proceed with the outstanding changes whenever you have time. I'll ask both reviewers to stay tuned to see how your changes are being addressed.

peterdesmet commented 2 years ago

Thanks @zambujo for your review. My feedback:

Other protocols for check_path(): Other protocols like FTP could indeed be implemented, but the specs for "URL or path" states that only http, https, and local POSIX paths are allowed. It makes sense though to allow (S)FTP, so have asked the Frictionless community for guidance.
Using table() in unique_sorted(): Nice! More elegant indeed. Only had to update to handle all NA_character_ values.
No default value for read_package(): Although it is unlikely that a datapackage.json will be present in the working directory, I'm tempted to keep it, because according to the specs a descriptor file must be named datapackage.json, so the default name hints that the user should provide a path to such a file.
vroom: {readr} 2.0.0 and up use {vroom} under the hood. frictionless-r requires readr >= 2.1.0 and thus {vroom}, so I don't think there is going to be a speed difference using {vroom} directly. Even though many functions in {readr} can be exchanged for {vroom} functions, I'll keep {readr} because 1) I communicate in the documentation that {readr} is used by some functions and that is a more well-known package to end users than {vroom} and 2) I rely on readr::guess_encoding().
factors: when reading a data package, str/integer/numeric values that have an enum in their schema are converted to factors, with the enum values as levels, in the order they are listed in enum. When writing a data package, factors keep their data type and the levels are written in an enum field. No (re)ordering is done.
```
{
  "name": "str_factor",
  "type": "string",
  "constraints": {
    "enum": ["foo", "bar"]
  }
},
{
  "name": "num_factor",
  "type": "number",
  "constraints": {
    "enum": [3.1, 3.2, 3.3]
  }
},
{
  "name": "int_factor",
  "type": "integer",
  "constraints": {
    "enum": [3, 4, -1]
  }
}
```
r package as wrapper around Python: An option indeed, but:
1. Currently out of my area of expertise
2. Python dev is going fast and currently a bit of a moving target
3. The main target to keep in sync with are the specs and they luckily don't change that fast.
4. In my opinion the previous R package {datapackage.r} suffers from being un-R-like, introducing OO concepts that are foreign to most R users, which might happen again on a wrapper. However, for Data Package validation specifically (currently not in scope for {frictionless-r}), wrapping around the Python toolbox is likely useful!

zambujo commented 2 years ago

Many thanks @peterdesmet. You have addressed all my comments and questions. Impressive work!

Ps. I have to confess that I had to update my packages to be able to review frictionless-r. Incidentally, I did notice a huge improvement in the performance of {readr} when I ran some code this morning. 🤓

peterdesmet commented 2 years ago

Thanks @zambujo. The suggested change for unique_values() is implemented in https://github.com/frictionlessdata/frictionless-r/pull/101.

@melvidoni, the comments suggested by @beatrizmilz are addressed in https://github.com/ropensci/software-review/issues/495#issuecomment-1025957514 and where actionable, all implemented in the latest version of the package. Both reviewers were included with rev roles: thanks to you both!

One lingering question I have for the reviewers is the use of the word package. I'm copy/pasting my question from higher up:

Masking of usethis::create_package(): Yeah, it is a bit unfortunate that the term package is used for different things in Frictionless vs R (as explained at the start of the vignette). Luckily in R it is often referred to as pkg in function names, reducing masking. In the Frictionless Community "Data Package" does seem to be consistently referred to as package in implementations in other languages, not dp, seldom as datapackage, which is why I adopted that term for frictionless functions and parameters. I think alternatives like create_datapackage(), create_data_package(), create_dataset() are less desirable, but 👉 feedback welcome 👈. Was the term package confusing in any way?

Since you both didn't remark on that, I assume that the word package was not confusing in read_package(), create_package() or write_package(), but I want to make sure.

melvidoni commented 2 years ago

Okay, given that @zambujo gave the okay, we are only missing @beatrizmilz's comments on the latest changes, and the answer for your question. Let's wait for her, then.

beatrizmilz commented 2 years ago

Hi! Peter, the word package was not confusing for me since there was an explaination in the documentation! I pointed out about the masking with usethis because for users that uses only library() and are not familiar with the possibility of conflict between functions with the same names, can eventually encounter errors and ask for help (and that is not a problem with the package!). I think i thought about that because I answer a lot of questions in foruns in portuguese, and is frequent questions about errors caused by masking.

You have addressed all the questions, and as @zambujo said, this is an impressive work. Congratulations!

melvidoni commented 2 years ago

@ropensci-review-bot approve frictionless

ropensci-review-bot commented 2 years ago

Approved! Thanks @peterdesmet for submitting and @zambujo, @beatrizmilz for your reviews! :grin:

To-dos:

[ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
[ ] After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
[ ] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
[ ] Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
[ ] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
[ ] Fix any links in badges for CI and coverage to point to the new repository URL.
[ ] Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
[ ] We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
[ ] You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

peterdesmet commented 2 years ago

@melvidoni Is it required to transfer the {frictionless-r} repository to rOpenSci? Because this is also a question of branding. Ping @lwinfree @sapetti9 @roll from Frictionless Data.

[x] Package repository URL: I would prefer to keep https://github.com/frictionlessdata/frictionless-r .That way 1) it is clearer that it is endorsed/maintained by Frictionless Data and 2) it keeps living alongside other implementations of Frictionless Data standards, such Python and JS. -> Keep under frictionlessdata
[x] Package website: currently https://frictionlessdata.github.io/frictionless-r/ with generic pkgdown (Bootstrap v5). We could adopt the rOpenSci central docs building and branding, so it looks more like https://docs.ropensci.org/wateRinfo/. That will visually tie it to rOpenSci and required an update of its URL (including a redirect of the old URL). Branding wise, my opinion is that it is fine to adopt the rOpenSci branding, because I hope to add a logo to the package soon, which will visually tie it to Frictionless Data.
[x] rOpenSci package family: how can I get the package listed under https://ropensci.org/packages/all/ ?
[x] rOpenSci R universe: how can I get the package added to the rOpenSci R universe?
[x] Code of conduct: Which code of conduct should we adopt? Frictionless Data vs rOpenSci I think Frictionless Data, since the package is maintained there and conflicts should be resolved there. -> Use Frictionless

Once those questions are answered I can make the necessary changes and then hopefully submit to CRAN! 🎉🤞

TODO based on https://github.com/ropensci/software-review/issues/495#issuecomment-1034398725

[x] ~Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.~
[x] ~After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.~
[x] ~Fix all links to the GitHub repo to point to the repo under the ropensci organization.~
[x] ~Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file~
[x] Customize sidebar so COC appears there.
[x] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- [x] deactivate the automatic deployment you might have set up
- [x] remove styling tweaks from your pkgdown config but keep that config file
- [x] replace the whole current pkgdown website with a redirecting page
- [x] replace your package docs URL with https://docs.ropensci.org/package_name
- [x] In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
[x] ~Fix any links in badges for CI and coverage to point to the new repository URL.~
[x] Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant. Done in https://frictionlessdata.github.io/frictionless-r/news/index.html
[x] We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
[x] You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

lwinfree commented 2 years ago

Hi all! First of all, it has been really lovely to watch this process unfold, so a big thank you to everyone that has been involved!

Speaking as product manager of Frictionless Data:

we would like to keep the repo under the Frictionless organization as Peter suggests
we would like to keep the Frictionless code of conduct to not confuse potential users if that is OK

everything else looks great to me!

Thanks!

peterdesmet commented 2 years ago

Thanks @lwinfree!

@melvidoni What would be the instructions to do the remaining points in https://github.com/ropensci/software-review/issues/495#issuecomment-1034672825 i.e. using the rOpenSci CI and website building for a repo not under rOpenSci (cf. https://github.com/CornellLabofOrnithology/auk/)?

melvidoni commented 2 years ago

Hello all. Please, bear with me while I discuss with the other Associate Editors. In the meantime, complete what you can, please.

melvidoni commented 2 years ago

Update @peterdesmet @lwinfree. We are discussing the CoC issue. Will get back to you soon-ish, please bear with us.

maelle commented 2 years ago

https://docs.ropensci.org/frictionless/ is now up. It means the package has been added to rOpenSci R-universe https://ropensci.r-universe.dev/ui#builds For reference, here's how it was "registered": https://github.com/ropensci-org/makeregistry/commit/31e1638913977f508ef0330879b8cadb45c58f13
it is fine it links to the frictionless COC. You might want to add that COC to https://github.com/frictionlessdata/frictionless-r/tree/main/.github to have it appear in the sidebar (or customize the sidebar by other means see https://pkgdown.r-lib.org/reference/build_home.html#sidebar).
the package will soon also be on our website, whose registry is built once a day.
Regarding redirects we have some guidance at https://devguide.ropensci.org/redirect.html (and if it's unclear please tell us so that we might clarify it!)

Thanks for your work on this package. :smile_cat:

maelle commented 2 years ago

screenshot from rOpenSci website where one gets a card for the frictionless package

https://ropensci.org/packages/all/

ropensci / software-review