ropensci / software-review

rOpenSci Software Peer Review.

xlcutter: Parse Batches of 'xlsx' Files Based on a Template #584

Closed Bisaloo closed 1 year ago

Bisaloo commented 1 year ago

Submitting Author Name: Hugo Gruson
Submitting Author Github Handle: @Bisaloo
Repository: https://github.com/Bisaloo/xlcutter
Version submitted: 0.1.0
Submission type: Standard
Editor: TBD
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Language: en

Package: xlcutter
Title: Parse Batches of 'xlsx' Files Based on a Template
Version: 0.1.0
Authors@R: 
    person(
      "Hugo", "Gruson", , "hugo.gruson+R@normalesup.org", 
      role = c("aut", "cre", "cph"),
      comment = c(ORCID = "0000-0002-4094-1476")
    )
Description: Parse entire folders of non-rectangular 'xlsx' files into a single
  rectangular and tidy 'data.frame' based on a custom template file defining the 
  column names of the output.
License: MIT + file LICENSE
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-GB
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Imports: 
    tidyxl
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
VignetteBuilder: knitr
URL: https://github.com/Bisaloo/xlcutter, https://hugogruson.fr/xlcutter/
BugReports: https://github.com/Bisaloo/xlcutter/issues

Scope

This package provides a way to extract data from a large number of non-rectangular Excel files based on a common template / format. It fills a gap in the software ecosystem, which usually focuses on data that is already rectangular, or even tidy.

I expect this package to be of use to scientists, as well as non-scientist data users, in many areas. It is not linked to a specific domain area. It provides a generic way of parsing and importing a batch of Excel (.xlsx) files based on a user-defined template. I have already used this package in a collaboration with a hospital that stored patient data in non-rectangular Excel files. Colleagues from field epidemiology have also said that such a tool would be very useful, as many of their collaborators produce this kind of non-rectangular file. An important point is that this package explicitly aims to be usable by non-technical users because the template can be defined in Excel. But I believe it would also prove extremely useful to the most experienced R users, simplifying long custom parsing scripts into a single function call.
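The general idea of template-based extraction can be illustrated with a minimal base-R sketch. This is not xlcutter's actual API: the `{{ name }}` marker syntax, the toy cell tables, and the helper `extract_one()` are all assumptions invented for illustration, and the cells are hard-coded rather than read from real .xlsx files.

```r
# Hypothetical long-format cell tables: one row per spreadsheet cell.
# In the template, cells wrapped in {{ }} (assumed marker syntax) name
# the output columns; all other cells are decoration and are ignored.
template <- data.frame(
  address = c("A1", "B1", "A3", "B3"),
  value   = c("Patient ID", "{{ id }}", "Age", "{{ age }}"),
  stringsAsFactors = FALSE
)

# Two toy "files" that follow the same non-rectangular layout.
files <- list(
  data.frame(address = c("A1", "B1", "A3", "B3"),
             value   = c("Patient ID", "P001", "Age", "42"),
             stringsAsFactors = FALSE),
  data.frame(address = c("A1", "B1", "A3", "B3"),
             value   = c("Patient ID", "P002", "Age", "57"),
             stringsAsFactors = FALSE)
)

# Find the template cells that carry a marker and strip the markers
# to obtain the column names.
marker   <- "^\\{\\{\\s*(.*?)\\s*\\}\\}$"
is_field <- grepl(marker, template$value, perl = TRUE)
fields   <- template[is_field, ]
fields$name <- sub(marker, "\\1", fields$value, perl = TRUE)

# Hypothetical helper: pull the values found at the marked addresses
# of one file and return them as a one-row data.frame.
extract_one <- function(cells) {
  values <- cells$value[match(fields$address, cells$address)]
  as.data.frame(setNames(as.list(values), fields$name),
                stringsAsFactors = FALSE)
}

# One row per file, then let R guess the column types.
result <- do.call(rbind, lapply(files, extract_one))
result <- utils::type.convert(result, as.is = TRUE)
print(result)
```

Each file contributes one row, and the template alone decides which cells become columns, so a non-technical user could change the output simply by editing the template.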

I don't know of any other package accomplishing the same thing.

Not applicable

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 1 year ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

maurolepore commented 1 year ago

@ropensci-review-bot help

ropensci-review-bot commented 1 year ago

Hello @maurolepore, here are the things you can ask me to do:


# Add a review's info to the ROpenSci logs
@ropensci-review-bot submit review <REVIEW_URL> time <REVIEW_HOURS(ex. 10.5)>

# List all available commands
@ropensci-review-bot help

# Show our Code of Conduct
@ropensci-review-bot code of conduct

# Checks peer-review badge is in README.md
@ropensci-review-bot check readme

# Switch to 'seeking reviewers'
@ropensci-review-bot seeking reviewers

# Approves a package. This command will close the issue.
@ropensci-review-bot approve package-name

# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name

# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name

# Mint package as [bronze/silver/gold]
@ropensci-review-bot mint silver

# Add a user to this issue's reviewers list
@ropensci-review-bot assign xxxxx as reviewer

# Remove a user from the reviewers list
@ropensci-review-bot remove xxxxx from reviewers

# Assign a user as the editor of this submission
@ropensci-review-bot assign @username as editor

# Put the submission on hold for the next 90 days
@ropensci-review-bot put on hold

# Remove the editor assigned to this submission
@ropensci-review-bot remove editor

# Change or add a review's due date for a reviewer
@ropensci-review-bot set due date for @reviewer to YYYY-MM-DD

# Close the issue
@ropensci-review-bot out of scope

# Various package checks
@ropensci-review-bot check package

# Checks srr documentation for stats packages
@ropensci-review-bot check srr

maurolepore commented 1 year ago

Thanks a lot @Bisaloo!

I can totally see how this package could be a life saver in many situations.

Beyond the brilliance of the idea, I'll discuss with the editorial board how we interpret the fit of this package in our categories:

data extraction: Packages that aid in retrieving data from unstructured sources such as text.

I wonder how we collectively define "unstructured" relative to other data sources from which rOpenSci packages in this category typically extract data.

data munging: … This area does not include broad data manipulations tools such as reshape2 or tidyr …. Rather, it focuses on tools for handling data in specific scientific formats generated from scientific workflows or exported from scientific instruments.

I wonder what we collectively think of the scientific specificity/generality of this package, and what precedent we have of packages that have been accepted or deemed out of scope.

Whatever the outcome, it seems you're onto something really neat, and I encourage you to make it shine.

I'll come back to you.

maurolepore commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

mpadge commented 1 year ago

We'll check out what went wrong with the bot there, and get check results up asap. Sorry for the inconvenience.

ropensci-review-bot commented 1 year ago

Checks for xlcutter (v0.0.0.9000)

git hash: 9e26bfa1

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package   | ncalls|
|:----------|:---------|------:|
|internal   |base      |      8|
|internal   |xlcutter  |      7|
|internal   |stats     |      1|
|internal   |utils     |      1|
|imports    |tidyxl    |      4|
|suggests   |knitr     |     NA|
|suggests   |rmarkdown |     NA|
|suggests   |testthat  |     NA|
|linking_to |NA        |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

c (2), nrow (2), anyDuplicated (1), duplicated (1), lapply (1), unique (1)

xlcutter

escape_markers (3), remove_markers (3), detect_with_markers (1)

tidyxl

xlsx_cells (4)

stats

setNames (1)

utils

type.convert (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 3 files) and
- 1 authors
- 1 vignette
- no internal data file
- 1 imported package
- 2 exported functions (median 22 lines of code)
- 10 non-exported functions in R (median 10 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     3|       21.5|           |
|files_vignettes          |     1|       68.4|           |
|files_tests              |     4|       79.0|           |
|loc_R                    |   110|       12.3|           |
|loc_vignettes            |    17|        1.7|TRUE       |
|loc_tests                |   137|       46.6|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    12|       16.1|           |
|n_fns_r_exported         |     2|        6.8|           |
|n_fns_r_not_exported     |    10|       22.3|           |
|n_fns_per_file_r         |     2|       34.7|           |
|num_params_per_fn        |     6|       79.0|           |
|loc_per_fn_r             |    12|       35.4|           |
|loc_per_fn_r_exp         |    22|       50.8|           |
|loc_per_fn_r_not_exp     |    10|       34.7|           |
|rel_whitespace_R         |    34|       27.9|           |
|rel_whitespace_vignettes |    29|        3.6|TRUE       |
|rel_whitespace_tests     |    30|       52.0|           |
|doclines_per_fn_exp      |    44|       55.5|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |     6|       24.8|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/Bisaloo/xlcutter/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Bisaloo/xlcutter/actions)

**GitHub Workflow Results**

|         id|name                       |conclusion |sha    | run_number|date       |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 4491779797|lint-changed-files         |failure    |c39e85 |          3|2023-03-22 |
| 4491862385|pages build and deployment |success    |a21af5 |          5|2023-03-22 |
| 4491843560|pkgdown                    |success    |9e26bf |         12|2023-03-22 |
| 4491843559|R-CMD-check                |success    |9e26bf |         11|2023-03-22 |
| 4491843561|test-coverage              |success    |9e26bf |         11|2023-03-22 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 100

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

No functions have cyclocomplexity >= 15

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found no issues with this package!


Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3.4  |
|pkgcheck |0.1.1.20 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

Bisaloo commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

mpadge commented 1 year ago

Sorry @Bisaloo, the changes we discussed elsewhere weren't yet deployed. I've re-deployed with those changes, so it should work now if you call check package again.

Bisaloo commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Checks for xlcutter (v0.1.0)

git hash: c6828153

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package   | ncalls|
|:----------|:---------|------:|
|internal   |base      |      8|
|internal   |xlcutter  |      7|
|internal   |stats     |      1|
|internal   |utils     |      1|
|imports    |tidyxl    |      4|
|suggests   |knitr     |     NA|
|suggests   |rmarkdown |     NA|
|suggests   |testthat  |     NA|
|linking_to |NA        |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

c (2), nrow (2), anyDuplicated (1), duplicated (1), lapply (1), unique (1)

xlcutter

escape_markers (3), remove_markers (3), detect_with_markers (1)

tidyxl

xlsx_cells (4)

stats

setNames (1)

utils

type.convert (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 3 files) and
- 1 authors
- 1 vignette
- no internal data file
- 1 imported package
- 2 exported functions (median 22 lines of code)
- 10 non-exported functions in R (median 10 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |     3|       21.5|           |
|files_vignettes          |     1|       68.4|           |
|files_tests              |     4|       79.0|           |
|loc_R                    |   110|       12.3|           |
|loc_vignettes            |    17|        1.7|TRUE       |
|loc_tests                |   137|       46.6|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |    12|       16.1|           |
|n_fns_r_exported         |     2|        6.8|           |
|n_fns_r_not_exported     |    10|       22.3|           |
|n_fns_per_file_r         |     2|       34.7|           |
|num_params_per_fn        |     6|       79.0|           |
|loc_per_fn_r             |    12|       35.4|           |
|loc_per_fn_r_exp         |    22|       50.8|           |
|loc_per_fn_r_not_exp     |    10|       34.7|           |
|rel_whitespace_R         |    34|       27.9|           |
|rel_whitespace_vignettes |    29|        3.6|TRUE       |
|rel_whitespace_tests     |    29|       51.4|           |
|doclines_per_fn_exp      |    44|       55.5|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |     6|       24.8|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/Bisaloo/xlcutter/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Bisaloo/xlcutter/actions)

**GitHub Workflow Results**

|         id|name                       |conclusion |sha    | run_number|date       |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 4491779797|lint-changed-files         |failure    |c39e85 |          3|2023-03-22 |
| 4530224264|pages build and deployment |success    |b154ca |          7|2023-03-27 |
| 4530204921|pkgdown                    |success    |c68281 |         14|2023-03-27 |
| 4530204923|R-CMD-check                |success    |c68281 |         13|2023-03-27 |
| 4530204928|test-coverage              |success    |c68281 |         13|2023-03-27 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 100

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

No functions have cyclocomplexity >= 15

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found no issues with this package!


Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3.4  |
|pkgcheck |0.1.1.20 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

maurolepore commented 1 year ago

Dear @Bisaloo,

After consulting with the editorial board, we decided that this package is out of scope. It seems very useful, but unfortunately it's too general to accept under the current description of our categories. Instead, I encourage you to publish it on CRAN.

Thanks again for sharing your work with rOpenSci, and please think of us again next time you have something for us to consider.

maurolepore commented 1 year ago

@ropensci-review-bot out of scope