ropensci / software-review

rOpenSci Software Peer Review.

Submission: rio #605

Closed chainsawriot closed 10 months ago

chainsawriot commented 10 months ago

Submitting Author Name: Chung-hong Chan
Submitting Author Github Handle: @chainsawriot
Other Package Authors Github handles: @leeper
Repository: https://github.com/chainsawriot/rio
Version submitted: 0.5.30
Submission type: Standard
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


Package: rio
Type: Package
Title: A Swiss-Army Knife for Data I/O
Version: 0.5.30
Authors@R: c(person("Jason", "Becker", role = "ctb", email = "jason@jbecker.co"),
             person("Chung-hong", "Chan", role = c("aut", "cre"), email = "chainsawtiney@gmail.com",
                 comment = c(ORCID = "0000-0002-6232-7530")),
             person("Geoffrey CH", "Chan", role = "ctb", email = "gefchchan@gmail.com"),
             person("Thomas J.", "Leeper",
                    role = "aut", 
                    email = "thosjleeper@gmail.com",
                    comment = c(ORCID = "0000-0003-4097-6326")),
             person("Christopher", "Gandrud", role = "ctb"),
             person("Andrew", "MacDonald", role = "ctb"),
             person("Ista", "Zahn", role = "ctb"),
             person("Stanislaus", "Stadlmann", role = "ctb"),
             person("Ruaridh", "Williamson", role = "ctb", email = "ruaridh.williamson@gmail.com"),
             person("Patrick", "Kennedy", role = "ctb"),
             person("Ryan", "Price", email = "ryapric@gmail.com", role = "ctb"),
             person("Trevor L", "Davis", email = "trevor.l.davis@gmail.com", role = "ctb"),
             person("Nathan", "Day", email = "nathancday@gmail.com", role = "ctb"),
             person("Bill", "Denney",
                    email="wdenney@humanpredictions.com",
                    role="ctb",
                    comment=c(ORCID="0000-0002-5759-428X")),
             person("Alex", "Bokov", email = "alex.bokov@gmail.com", role = "ctb",
                    comment=c(ORCID="0000-0002-0511-9815"))
             )
Description: Streamlined data import and export by making assumptions that
    the user is probably willing to make: 'import()' and 'export()' determine
    the data structure from the file extension, reasonable defaults are used for
    data import and export (e.g., 'stringsAsFactors=FALSE'), web-based import is
    natively supported (including from SSL/HTTPS), compressed files can be read
    directly without explicit decompression, and fast import packages are used where
    appropriate. An additional convenience function, 'convert()', provides a simple
    method for converting between file types.
URL: https://github.com/chainsawriot/rio
BugReports: https://github.com/chainsawriot/rio/issues
Depends:
    R (>= 3.6)
Imports:
    tools,
    stats,
    utils,
    foreign,
    haven (>= 1.1.2),
    curl (>= 0.6),
    data.table (>= 1.9.8),
    readxl (>= 0.1.1),
    openxlsx,
    tibble
Suggests:
    datasets,
    bit64,
    testthat,
    knitr,
    magrittr,
    arrow,
    clipr,
    feather,
    fst,
    hexView,
    jsonlite,
    pzfx,
    readODS (>= 1.6.4),
    readr,
    rmarkdown,
    rmatio,
    xml2 (>= 1.2.0),
    yaml
License: GPL-2
VignetteBuilder: knitr
Encoding: UTF-8
RoxygenNote: 7.2.3

Scope

This package is for loading and saving data from either files or URLs.
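
To make the idea of choosing a reader from the file extension concrete, here is a minimal base R sketch. It is an illustration only: `import_sketch()` and its `readers` table are hypothetical names and do not reproduce rio's implementation; the only real functions used are `tools::file_ext()`, `utils::read.csv()`, `utils::read.delim()`, `utils::write.csv()` and `readRDS()`.

```r
# Hypothetical sketch of extension-based dispatch; NOT rio's actual code.
import_sketch <- function(file) {
  # Map lower-cased file extensions to base R reader functions.
  readers <- list(
    csv = function(f) utils::read.csv(f, stringsAsFactors = FALSE),
    tsv = function(f) utils::read.delim(f, stringsAsFactors = FALSE),
    rds = function(f) readRDS(f)
  )
  ext <- tolower(tools::file_ext(file))
  if (!ext %in% names(readers)) {
    stop("Unrecognized file extension: '", ext, "'")
  }
  readers[[ext]](file)
}

# Round trip: write a small data frame to CSV, then read it back.
csv_file <- tempfile(fileext = ".csv")
utils::write.csv(head(mtcars), csv_file, row.names = FALSE)
dat <- import_sketch(csv_file)
```

The lookup table keeps the dispatch logic in one place, so supporting another format in this sketch would only mean adding one entry to `readers`.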

Probably all scientific disciplines that involve dealing with data files.

As far as I know there are four: reader (not readr), io, ImportExport, and SchemaOnRead. The current package is probably the most used.

Yes

No. I am sorry.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 10 months ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 10 months ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 10 months ago

Checks for rio (v0.5.30)

git hash: daf6cd15

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL-2


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package    | ncalls|
|:----------|:----------|------:|
|internal   |base       |    494|
|internal   |rio        |     33|
|internal   |grDevices  |      4|
|internal   |graphics   |      3|
|internal   |methods    |      1|
|imports    |utils      |     23|
|imports    |haven      |     11|
|imports    |tools      |      8|
|imports    |openxlsx   |      5|
|imports    |stats      |      3|
|imports    |foreign    |      3|
|imports    |data.table |      3|
|imports    |curl       |      2|
|imports    |readxl     |      1|
|imports    |tibble     |      1|
|suggests   |xml2       |     17|
|suggests   |clipr      |      3|
|suggests   |pzfx       |      3|
|suggests   |rmatio     |      3|
|suggests   |feather    |      2|
|suggests   |fst        |      2|
|suggests   |readODS    |      2|
|suggests   |readr      |      2|
|suggests   |arrow      |      1|
|suggests   |jsonlite   |      1|
|suggests   |datasets   |     NA|
|suggests   |bit64      |     NA|
|suggests   |testthat   |     NA|
|suggests   |knitr      |     NA|
|suggests   |magrittr   |     NA|
|suggests   |hexView    |     NA|
|suggests   |rmarkdown  |     NA|
|suggests   |yaml       |     NA|
|linking_to |NA         |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

file (148), which (46), list (30), c (29), lapply (16), names (14), for (11), attributes (10), do.call (9), paste0 (9), seq_along (9), args (7), row.names (7), invisible (6), length (6), unlist (6), drop (5), sapply (5), try (5), url (5), as.character (4), format (4), nchar (4), tempfile (4), basename (3), col (3), new.env (3), raw (3), regexpr (3), regmatches (3), rep (3), return (3), seq_len (3), strsplit (3), unclass (3), class (2), cumsum (2), formals (2), gettext (2), grep (2), gsub (2), levels (2), max (2), nrow (2), readLines (2), setdiff (2), sort (2), switch (2), table (2), tolower (2), unique (2), alist (1), as.environment (1), as.numeric (1), attr (1), by (1), cbind.data.frame (1), comment (1), dump (1), duplicated (1), getOption (1), getwd (1), if (1), is.na (1), labels (1), library (1), match.arg (1), match.call (1), ncol (1), paste (1), quote (1), read.dcf (1), readBin (1), rownames (1), sink (1), sprintf (1), structure (1), sub (1), substitute (1), system.file (1), T (1)

rio

import (4), twrap (3), arg_reconcile (2), doone (2), export (2), extract_html_row (2), find_compress (2), get_ext (2), uninstalled_formats (2), characterize (1), characterize.data.frame (1), characterize.default (1), compress_out (1), convert (1), convert_google_url (1), export_delim (1), factorize (1), factorize.data.frame (1), factorize.default (1), gather_attrs (1), standardize_attributes (1)

utils

data (5), unzip (5), untar (4), type.convert (2), zip (2), head (1), packageName (1), read.fortran (1), tar (1), write.table (1)

xml2

read_xml (4), xml_add_child (4), read_html (3), as_list (2), xml_find_all (2), xml_attrs (1), xml_children (1)

haven

write_sav (4), write_dta (2), write_sas (2), write_xpt (2), read_sas (1)

tools

file_ext (5), file_path_sans_ext (3)

openxlsx

addWorksheet (1), getSheetNames (1), loadWorkbook (1), saveWorkbook (1), write.xlsx (1)

grDevices

bmp (1), jpeg (1), png (1), tiff (1)

clipr

read_clip (1), read_clip_tbl (1), write_clip (1)

data.table

rbindlist (2), fwrite (1)

foreign

read.dta (1), read.systat (1), write.dbf (1)

graphics

title (2), text (1)

pzfx

read_pzfx (2), write_pzfx (1)

rmatio

write.mat (2), read.mat (1)

stats

setNames (3)

curl

curl_fetch_memory (1), parse_headers (1)

feather

read_feather (1), write_feather (1)

fst

read.fst (1), write.fst (1)

readODS

read_ods (1), write_ods (1)

readr

fwf_empty (1), read_fwf (1)

arrow

write_parquet (1)

jsonlite

fromJSON (1)

methods

is (1)

readxl

excel_sheets (1)

tibble

as_tibble (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 22 files) and
- 2 authors
- 1 vignette
- no internal data file
- 10 imported packages
- 21 exported functions (median 15 lines of code)
- 202 non-exported functions in R (median 4 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    22|       83.6|           |
|files_vignettes          |     1|       68.4|           |
|files_tests              |    54|       99.3|           |
|loc_R                    |  1513|       78.2|           |
|loc_vignettes            |   182|       46.0|           |
|loc_tests                |  1196|       89.2|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |   223|       91.2|           |
|n_fns_r_exported         |    21|       68.8|           |
|n_fns_r_not_exported     |   202|       93.3|           |
|n_fns_per_file_r         |     5|       71.4|           |
|num_params_per_fn        |     2|       11.9|           |
|loc_per_fn_r             |     5|        8.1|           |
|loc_per_fn_r_exp         |    15|       35.6|           |
|loc_per_fn_r_not_exp     |     4|        9.3|           |
|rel_whitespace_R         |     8|       57.4|           |
|rel_whitespace_vignettes |    38|       50.9|           |
|rel_whitespace_tests     |    19|       86.4|           |
|doclines_per_fn_exp      |    50|       63.0|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    88|       77.1|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

(There do not appear to be any)

**GitHub Workflow Results**

|         id|name                       |conclusion |sha    | run_number|date       |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 6010120891|pages build and deployment |failure    |57d3a2 |          5|2023-08-29 |
| 6012061439|R-CMD-check                |success    |fd7053 |          6|2023-08-29 |
| 6012061436|test-coverage              |success    |fd7053 |         18|2023-08-29 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 87.35

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
import_list | 31
import | 24
arg_reconcile | 20
import_delim | 18
set_class | 17

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 314 potential issues:

message | number of times
--- | ---
Avoid 1:nrow(...) expressions, use seq_len. | 2
Avoid changing the working directory, or restore it in on.exit | 3
Avoid library() and require() calls in packages | 4
Avoid using sapply, consider vapply instead, that's type safe | 8
Lines should not be more than 80 characters. | 297
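
Two of the lintr messages above (`1:nrow(...)` versus `seq_len`, and `sapply` versus `vapply`) concern edge cases that are easy to miss. The base R sketch below illustrates them with made-up inputs; the object names (`empty`, `bad_index` and so on) are purely illustrative and are not taken from rio's source.

```r
# A zero-row data frame exposes the problem with 1:nrow(...).
empty <- data.frame(x = numeric(0))

bad_index <- 1:nrow(empty)          # counts down from 1 to 0, so it has two elements
good_index <- seq_len(nrow(empty))  # zero-length, so a loop over it never runs

# sapply() picks its return type from the data; vapply() enforces a declared one.
sapply_result <- sapply(list(), length)              # an empty list, not an integer vector
vapply_result <- vapply(list(), length, integer(1))  # a zero-length integer vector
```

In both cases the safer form behaves the same as the original on ordinary input and differs only when the input is empty.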


4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 4 function names are duplicated in other packages:

- `convert` from AquaEnv, ascii, breakaway, cabootcrs, CHNOSZ, convertr, coreCT, DDIwR, equateIRT, hablar, khroma, nCov2019, phenopix, qtl, quanteda, rMIDAS, scan, StratigrapheR, tidygraph, tis, wavethresh
- `export` from admisc, aLFQ, box, box, bruceR, campsismod, crestr, EviewsR, flux, fsbrain, gm, grainscape, inTextSummaryTable, job, kimisc, Momocs, Morpho, mpm, pitchRx, scan, seewave, soc.ca, strvalidator, tipsae, wpa
- `factorize` from admisc, conf.design, elliptic, Epi, gmp, labdsv, lme4, mosaic, QCApro, RcmdrPlugin.KMggplot2, rminer, sfsmisc
- `import` from act, aLFQ, ambiorix, backports, bruceR, EviewsR, fSRM, importar, isqg, MALDIquantForeign, NMproject, openair, reticulate, reticulate, rTorch, strvalidator, tensorflow


Package Versions

|package  |version |
|:--------|:-------|
|pkgstats |0.1.3.7 |
|pkgcheck |0.1.2.1 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

chainsawriot commented 10 months ago

@ropensci-review-bot help

ropensci-review-bot commented 10 months ago

Hello @chainsawriot, here are the things you can ask me to do:


# Add an author's response info to the ROpenSci logs
@ropensci-review-bot submit response <AUTHOR_RESPONSE_URL>

# List all available commands
@ropensci-review-bot help

# Show our Code of Conduct
@ropensci-review-bot code of conduct

# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name

# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name

# Various package checks
@ropensci-review-bot check package

# Checks srr documentation for stats packages
@ropensci-review-bot check srr

chainsawriot commented 10 months ago

@ropensci-review-bot check package

ropensci-review-bot commented 10 months ago

Thanks, about to send the query.

ropensci-review-bot commented 10 months ago

:rocket:

Editor check started

:wave:

chainsawriot commented 10 months ago

@ropensci My local check with pkgcheck showed that there should be no more items marked with :heavy_multiplication_x:, except the optional point on duplicated function names. However, rio is a decade-old package, so renaming those generic functions now (import, export, convert and factorize) would probably harm computational reproducibility.

chainsawriot commented 10 months ago

httr::HEAD("https://badges.ropensci.org/605_status.svg")
#> Response [https://badges.ropensci.org/605_status.svg]
#>   Date: 2023-08-30 15:32
#>   Status: 404
#>   Content-Type: text/html; charset=utf-8
#> <EMPTY BODY>

Created on 2023-08-30 with reprex v2.0.2

noamross commented 10 months ago

Thank you for this submission @chainsawriot! I realize the last response from the bot is an error, as a badge should not be generated or checked for until after an editor has approved moving forward with the process.

I believe rio is out of scope for us. Per the package descriptions in our Aims and Scope, packages in the retrieval, extraction, or munging categories should be specific to "data sources / topics", "aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment", or "focus on tools for handling data in specific scientific formats generated from scientific workflows or exported from scientific instruments." The reason is that it is hard to conduct objective reviews that draw on relevant field expertise for highly general, Swiss-army-knife tools. Such tools are also more likely to have many users who provide feedback, so they need the review process less. I would recommend JOSS as a venue for reviewing and publishing rio.