OmicsMetaData: an R-package for interoperable and re-usable biodiversity 'omics (meta)data

msweetlove commented 3 years ago

Submitting Author Name: Maxime Sweetlove
Submitting Author Github Handle: @msweetlove
Editor: @jooolia
Reviewers: @ginberg, @cpalmer718

Due date for @ginberg: 2021-12-26
Due date for @cpalmer718: 2022-02-19
Archive: TBD
Version accepted: TBD

Paste the full DESCRIPTION file inside a code block below:

Package: OmicsMetaData
Title: OmicsMetaData: an R-package for interoperable and re-usable biodiversity 'omics (meta)data.
Version: 0.0.1
Authors@R: 
    person("Maxime",
        "Sweetlove",
        role = c("aut", "cre"),
        email = "msweetlove@naturalsciences.be")
Description: 
        OmicsMetaData: tools to re-use biodiversity 'omics datasets or standardize them to make them globally interoperable
License: GNU General Public License v3 (GLP-3.0) https://www.gnu.org/licenses/gpl-3.0.en.html
Encoding: UTF-8
LazyData: true
Imports:
    worrms (>=0.4.2),
    stringr (>=1.4.0),
    Orcs (>=1.2.1),
    xml2 (>=1.3.2),
    reshape2 (>=1.4.4)
Depends:
    tidyr (>=1.1.3),
    RecordLinkage(>=0.4-12.1),
    mapview (>=2.10.0),
    rgbif (>=3.6.0),
    R (>= 4.0.1)
RoxygenNote: 7.1.1
Suggests: 
    testthat (>=2.3.2),
    roxygen2 (>=7.1.1),
    devtools (>=2.3.0)

Scope

Please indicate which category or categories from our package fit policies this package falls under:
- [x] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

data retrieval: The OmicsMetaData package allows users to download nucleotide sequences from INSDC, along with any metadata linked to those sequences, based on a BioProject identification number.

data munging: The OmicsMetaData package provides tools to format biodiversity 'omics metadata following the widely used data standards MIxS (for 'Omics data) and DarwinCore (for biodiversity data). Standardizing datasets makes them much more easily interoperable between researchers and institutions and across time, and makes it easier for users to archive the metadata alongside the sequences on the INSDC databases.

Who is the target audience and what are scientific applications of this package? Scientists who work with biodiversity 'omics datasets on a daily basis and need to standardize the data to exchange it with colleagues or archive it online at the end of a project, or who want to expand their dataset with openly available online sequence data.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? At present I am not aware of any such packages.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Not applicable.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted. Not applicable.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 3 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

mpadge commented 3 years ago

@msweetlove The editor check failed because your DESCRIPTION file has incorrectly-formatted dependency lists. They must have spaces after the ">=" symbols, so like:

worrms (>= 0.4.2)

and not in current form like

worrms (>=0.4.2)

The RecordLinkage entry also needs a space before the opening bracket. Please ping here once you've updated that and we'll re-run the checks. Thanks!
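As a rough illustration of the spacing rule described above, here is a minimal base-R sketch. `normalise_dep` is a hypothetical helper introduced only for this example; it is not part of OmicsMetaData or of the editor checks, and it only handles the `>=` operator used in this DESCRIPTION file.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# rewrite "pkg(>=x.y.z)" or "pkg (>=x.y.z)" as "pkg (>= x.y.z)".
normalise_dep <- function(x) {
  # exactly one space before "(" and one space after ">="
  x <- gsub("\\s*\\(\\s*>=\\s*", " (>= ", x)
  trimws(x)
}

# Entries copied from the submitted DESCRIPTION file
deps <- c("worrms (>=0.4.2)", "RecordLinkage(>=0.4-12.1)", "R (>= 4.0.1)")
fixed <- normalise_dep(deps)
print(fixed)
```

Entries that already follow the convention, such as `R (>= 4.0.1)`, pass through unchanged.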

msweetlove commented 3 years ago

@mpadge the spaces have been added to the DESCRIPTION file in the repo: https://github.com/biodiversity-aq/OmicsMetaData

ropensci-review-bot commented 3 years ago

Checks for OmicsMetaData (v0.0.1)

git hash: 37089da9

:heavy_check_mark: Package name is available
:heavy_multiplication_x: does not have a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_multiplication_x: does not have a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_multiplication_x: 'DESCRIPTION' does not have a URL field.
:heavy_multiplication_x: 'DESCRIPTION' does not have a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_multiplication_x: These functions do not have examples: [check.valid.metadata.DwC.Rd].
:heavy_multiplication_x: Continuous integration checks unavailable (no URL in 'DESCRIPTION').
:heavy_multiplication_x: Package coverage is 47.7% (should be at least 75%).
:heavy_multiplication_x: R CMD check found 1 error.
:heavy_multiplication_x: R CMD check found 9 warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GNU General Public License v3 (GLP-3.0) https://www.gnu.org/licenses/gpl-3.0.en.html

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 10 files) and
- 1 authors
- 5 vignettes
- 10 internal data files
- 5 imported packages
- 37 exported functions (median 60 lines of code)
- 44 non-exported functions in R (median 49 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                 | value| percentile|noteworthy |
|:-----------------------|-----:|----------:|:----------|
|files_R                 |    10|       55.4|           |
|files_vignettes         |     0|        0.0|TRUE       |
|files_tests             |     8|       85.7|           |
|loc_R                   |  2829|       89.3|           |
|loc_tests               |   465|       70.5|           |
|num_vignettes           |     5|       97.5|TRUE       |
|data_size_total         | 50800|       79.9|           |
|data_size_median        |  2304|       69.4|           |
|n_fns_r                 |    81|       64.4|           |
|n_fns_r_exported        |    37|       81.3|           |
|n_fns_r_not_exported    |    44|       55.1|           |
|n_fns_per_file_r        |     4|       56.5|           |
|num_params_per_fn       |     2|       10.7|           |
|loc_per_fn_r            |    53|       95.3|TRUE       |
|loc_per_fn_r_exp        |    60|       85.1|           |
|loc_per_fn_r_not_exp    |    50|       95.3|TRUE       |
|rel_whitespace_R        |    11|       80.4|           |
|rel_whitespace_tests    |    20|       87.8|           |
|doclines_per_fn_exp     |    35|       41.6|           |
|doclines_per_fn_not_exp |     0|        0.0|TRUE       |
|fn_call_network_size    |    66|       68.6|           |

---

1a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following error:

1. checking examples ... ERROR
    Running examples in ‘OmicsMetaData-Ex.R’ failed
    The error most likely occurred in:

    > ### Name: sync.metadata.sequenceFiles
    > ### Title: Check if all samples in a dataframe have sequence data
    > ### Aliases: sync.metadata.sequenceFiles
    >
    > ### ** Examples
    >
    > \donttrun{
    Error: unexpected symbol in "\donttrun"
    Execution halted

R CMD check generated the following warnings:

1. checking whether package ‘OmicsMetaData’ can be installed ... WARNING
    Found the following significant warnings:
    Warning: /tmp/RtmpCIJwfC/file39e15a33ceb0/OmicsMetaData.Rcheck/00_pkg_src/OmicsMetaData/man/sync.metadata.sequenceFiles.Rd:30: unknown macro '\donttrun'
    See ‘/tmp/RtmpCIJwfC/file39e15a33ceb0/OmicsMetaData.Rcheck/00install.out’ for details.
2. checking R files for non-ASCII characters ... WARNING
    Found the following file with non-ASCII characters:
    General_Utils.R
    Portable packages must use only ASCII characters in their R code, except perhaps in comments.
    Use \uxxxx escapes for other characters.
3. checking dependencies in R code ... WARNING
    '::' or ':::' import not declared from: ‘RCurl’
    Namespaces in Imports field not imported from: ‘Orcs’ ‘xml2’
    All declared Imports should be used.
    Packages in Depends field not imported from: ‘mapview’ ‘RecordLinkage’ ‘rgbif’ ‘tidyr’
    These packages need to be imported from (in the NAMESPACE file) for when this namespace is loaded but not attached.
    package 'methods' is used but not declared
4. checking Rd files ... WARNING
    prepare_Rd: man/sync.metadata.sequenceFiles.Rd:30: unknown macro '\donttrun'
5. checking for missing documentation entries ... WARNING
    Undocumented code objects: ‘ENA_allowed_terms’ ‘ENA_checklistAccession’ ‘ENA_geoloc’ ‘ENA_instrument’ ‘ENA_select’ ‘ENA_strat’ ‘TaxIDLib’ ‘TermsLib’ ‘TermsSyn’ ‘TermsSyn_DwC’
    Undocumented data sets: ‘ENA_allowed_terms’ ‘ENA_checklistAccession’ ‘ENA_geoloc’ ‘ENA_instrument’ ‘ENA_select’ ‘ENA_strat’ ‘TaxIDLib’ ‘TermsLib’ ‘TermsSyn’ ‘TermsSyn_DwC’
    All user-level objects in a package should have documentation entries.
    See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
6. checking for code/documentation mismatches ... WARNING
    Functions or methods with usage in documentation object 'dataQC.TaxonListFromData' but not in code: ‘find.sampleTaxon’
    Codoc mismatches from documentation object 'wideTable.to.eMoF':
    wideTable.to.eMoF
    Code: function(metadata.object, variables = NA)
    Docs: function(dataset)
    Argument names in code not in docs: metadata.object variables
    Argument names in docs not in code: dataset
    Mismatches in argument names:
    Position: 1 Code: metadata.object Docs: dataset
7. checking Rd \usage sections ... WARNING
    Objects in \usage without \alias in documentation object 'dataQC.TaxonListFromData': ‘find.sampleTaxon’
    Documented arguments not in \usage in documentation object 'dataQC.completeTaxaNamesFromRegistery': ‘taxBackbone’
    Undocumented arguments in documentation object 'prep.metadata.ENA' ‘library.layout’ ‘library.strategy’ ‘library.selection’
    Documented arguments not in \usage in documentation object 'prep.metadata.ENA': ‘library_layout’ ‘library_strategy’ ‘library_selection’
    Undocumented arguments in documentation object 'show,DwC.event-method' ‘object’
    Undocumented arguments in documentation object 'show,DwC.occurrence-method' ‘object’
    Undocumented arguments in documentation object 'show,MIxS.metadata-method' ‘object’
    Undocumented arguments in documentation object 'wideTable.to.eMoF' ‘dataset’
    Documented arguments not in \usage in documentation object 'wideTable.to.eMoF': ‘metadata.object’ ‘variables’
    Bad \usage lines found in documentation object 'FileNames.to.Table': FileNames.to.Table (file.dir, paired=TRUE, seq.file.extension=".fastq.gz", pairedEnd.extension=c("_1", "_2")
    Bad \usage lines found in documentation object 'dataQC.DwC': DataQC.DwC(Event=NA, Occurrence=NA, eMoF=NA, EML.url=NA, out.type="event", ask.input=TRUE))
    Bad \usage lines found in documentation object 'sync.metadata.sequenceFiles': sync.metadata.sequenceFiles <- function(Names, file.dir=NULL, paired=TRUE, seq.file.extension=".fastq.gz", pairedEnd.extension=c("_1", "_2"))
    Functions with \usage entries need to have the appropriate \alias entries, and all their arguments documented.
    The \usage entries must correspond to syntactically valid R code.
    See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
8. checking for unstated dependencies in examples ... WARNING
    Warning: parse error in file 'lines':
    2: unexpected symbol
    578:
    579: \donttrun
         ^
9. checking files in ‘vignettes’ ... WARNING
    Files in the 'vignettes' directory but no files in 'inst/doc': ‘Background.Rmd’, ‘General_Overview.Rmd’, ‘Metadata_standardization.Rmd’, ‘Perpare_data_for_archiving.Rmd’, ‘Retrieving_online_data.Rmd’
    Package has no Sweave vignette sources and no VignetteBuilder field.

R CMD check generated the following notes:

1. checking DESCRIPTION meta-information ... NOTE
    Malformed Title field: should not end in a period.
    Malformed Description field: should contain one or more complete sentences.
    Non-standard license specification: GNU General Public License v3 (GLP-3.0) https://www.gnu.org/licenses/gpl-3.0.en.html
    Standardizable: FALSE
2. checking R code for possible problems ... NOTE
    combine.data: no visible global function definition for ‘new’
    commonTax.to.NCBI.TaxID: no visible binding for global variable ‘TaxIDLib’
    dataQC.DwC: no visible global function definition for ‘new’
    dataQC.DwC_general: no visible binding for global variable ‘TermsLib’
    dataQC.findNames: no visible binding for global variable ‘TermsSyn’
    dataQC.MIxS: no visible binding for global variable ‘TermsSyn’
    dataQC.MIxS: no visible binding for global variable ‘TermsLib’
    dataQC.MIxS: no visible binding for global variable ‘ENA_checklistAccession’
    dataQC.MIxS: no visible global function definition for ‘new’
    dataQC.TermsCheck: no visible binding for global variable ‘TermsLib’
    download.sequences.INSDC: no visible global function definition for ‘download.file’
    download.sequences.INSDC: no visible global function definition for ‘read.table’
    eMoF.to.wideTable: no visible binding for global variable ‘eventID’
    eMoF.to.wideTable: no visible binding for global variable ‘measurementValue’
    eMoF.to.wideTable: no visible binding for global variable ‘occurrenceID’
    FileNames.to.Table: no visible binding for global variable ‘rv_out’
    get.BioProject.metadata.INSDC: no visible global function definition for ‘download.file’
    get.BioProject.metadata.INSDC: no visible global function definition for ‘read.csv’
    get.ENAName: no visible binding for global variable ‘TermsLib’
    get.sample.attributes.INSDC: no visible global function definition for ‘read_xml’
    get.sample.attributes.INSDC: no visible global function definition for ‘xml_find_all’
    get.sample.attributes.INSDC: no visible global function definition for ‘xml_attr’
    get.sample.attributes.INSDC: no visible global function definition for ‘xml_text’
    get.sample.attributes.INSDC: no visible global function definition for ‘as_list’
    prep.metadata.ENA: no visible binding for global variable ‘ENA_checklistAccession’
    prep.metadata.ENA: no visible binding for global variable ‘ENA_geoloc’
    prep.metadata.ENA: no visible global function definition for ‘separate’
    prep.metadata.ENA: no visible binding for global variable ‘ENA_instrument’
    prep.metadata.ENA: no visible binding for global variable ‘ENA_select’
    prep.metadata.ENA: no visible binding for global variable ‘ENA_strat’
    prep.metadata.ENA: no visible global function definition for ‘write.table’
    term.definition: no visible binding for global variable ‘TermsLib’
    term.definition: no visible binding for global variable ‘TermsSyn’
    wideTable.to.eMoF: no visible global function definition for ‘gather’
    wideTable.to.eMoF: no visible binding for global variable ‘measurementType’
    wideTable.to.eMoF: no visible binding for global variable ‘measurementValue’
    write.MIxS: no visible global function definition for ‘write.csv’
    show,DwC.event: no visible binding for global variable ‘EML.url’
    show,DwC.occurrence: no visible binding for global variable ‘EML.url’
    Undefined global functions or variables: as_list download.file EML.url ENA_checklistAccession ENA_geoloc ENA_instrument ENA_select ENA_strat eventID gather measurementType measurementValue new occurrenceID read_xml read.csv read.table rv_out separate TaxIDLib TermsLib TermsSyn write.csv write.table xml_attr xml_find_all xml_text
    Consider adding importFrom("methods", "new") importFrom("utils", "download.file", "read.csv", "read.table", "write.csv", "write.table") to your NAMESPACE file (and ensure that your DESCRIPTION Imports field contains 'methods').

R CMD check generated the following check_fails:

1. cyclocomp
2. no_description_depends
3. description_url
4. description_bugreports
5. rcmdcheck_malformed_title_or_description
6. rcmdcheck_r_files_are_ascii
7. rcmdcheck_undeclared_imports
8. rcmdcheck_undefined_globals
9. rcmdcheck_missing_docs
10. rcmdcheck_code_docs_mismatch
11. rcmdcheck_unstated_dependencies_in_examples
12. rcmdcheck_examples_run
13. rcmdcheck_examples_run_without_warnings
14. rcmdcheck_significant_compilation_warnings

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 47.65

The following files are not completely covered by tests:

file | coverage
--- | ---
R/Classes_Libraries.R | 20.25%
R/DataFormat_Utils.R | 59.73%
R/DataQC_Main_DwCgeneral.R | 32.12%
R/DataQC_Main_MIxS.R | 50%
R/DataQC_Utils.R | 55.99%
R/Format_sequenceData_ENA.R | 43.8%
R/Get_SequenceData_INSDC.R | 29.29%
R/OmicsMetaData.R | 0%

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
prep.metadata.ENA | 144
dataQC.MIxS | 117
dataQC.DwC_general | 73
dataQC.TermsCheck | 37
dataQC.dateCheck | 35
dataQC.eventStructure | 35
dataQC.guess.env_package.from.data | 35
combine.data.frame | 30
combine.data | 26
dataQC.generate.footprintWKT | 26
dataQC.LatitudeLongitudeCheck | 25
dataQC.findNames | 23
download.sequences.INSDC | 22
dataQC.DwC | 20
sync.metadata.sequenceFiles | 16

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 875 potential issues:

message | number of times
--- | ---
Avoid 1:length(...) expressions, use seq_len. | 16
Avoid 1:ncol(...) expressions, use seq_len. | 9
Avoid 1:nrow(...) expressions, use seq_len. | 15
Avoid using sapply, consider vapply instead, that's type safe | 32
Lines should not be more than 80 characters. | 795
Use <-, not =, for assignment. | 8

Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.0.2.16 |
|pkgcheck |0.0.2.83 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

noamross commented 3 years ago

Thank you for your submission, @msweetlove! It looks like there are still a number of things needed to get this package ready for review. Please look at the report above. The first few are simple metadata components. However, we do need the package to have CI checks, >75% code coverage unless there are specific reasons, and a clean R CMD check.

Let us know when you've made these updates and we can proceed, and do ask any questions you have!

I note that with your spatial dependencies, CI setup can be a little finicky; @mpadge can point you to resources if you need them.

msweetlove commented 3 years ago

Hi @noamross and @mpadge, I went through the list of issues, and they should be fixed and updated now. Code coverage is now 78.91%. I'm not completely sure about the CI issue though (my knowledge in that area is rather limited). I added GitHub Actions to the package, and I was wondering if this is enough? If not, I could use some help here. Cheers, Maxime

noamross commented 3 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 3 years ago

Thanks, about to send the query.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 3 years ago

Checks for OmicsMetaData (v0.0.1)

git hash: 853aeabe

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_multiplication_x: These functions do not have examples: [commonTax.to.NCBI.TaxID.Rd].
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 78.9%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 11 files) and
- 1 authors
- 5 vignettes
- 10 internal data files
- 10 imported packages
- 37 exported functions (median 61 lines of code)
- 46 non-exported functions in R (median 48 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                 | value| percentile|noteworthy |
|:-----------------------|-----:|----------:|:----------|
|files_R                 |    11|       59.3|           |
|files_vignettes         |     0|        0.0|TRUE       |
|files_tests             |    10|       88.6|           |
|loc_R                   |  2877|       89.5|           |
|loc_tests               |  1281|       87.7|           |
|num_vignettes           |     5|       97.5|TRUE       |
|data_size_total         | 50800|       79.9|           |
|data_size_median        |  2304|       69.4|           |
|n_fns_r                 |    83|       65.1|           |
|n_fns_r_exported        |    37|       81.3|           |
|n_fns_r_not_exported    |    46|       56.5|           |
|n_fns_per_file_r        |     4|       53.4|           |
|num_params_per_fn       |     2|       10.7|           |
|loc_per_fn_r            |    50|       94.7|           |
|loc_per_fn_r_exp        |    61|       85.4|           |
|loc_per_fn_r_not_exp    |    48|       94.9|           |
|rel_whitespace_R        |    12|       81.7|           |
|rel_whitespace_tests    |    16|       93.7|           |
|doclines_per_fn_exp     |    37|       44.9|           |
|doclines_per_fn_not_exp |     0|        0.0|TRUE       |
|fn_call_network_size    |    66|       68.6|           |

---

1a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges

[![github](https://github.com/biodiversity-aq/OmicsMetaData/workflows/R-CMD-check/badge.svg)](https://github.com/biodiversity-aq/OmicsMetaData/actions)

**GitHub Workflow Results**

|name        |conclusion |sha    |date       |
|:-----------|:----------|:------|:----------|
|R-CMD-check |success    |853aea |2021-10-25 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following note:

1. checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘Orcs’
    All declared Imports should be used.

R CMD check generated the following check_fails:

1. cyclocomp
2. no_description_date
3. rcmdcheck_imports_not_imported_from

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 78.91

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
prep.metadata.ENA | 145
dataQC.MIxS | 117
dataQC.DwC_general | 75
dataQC.TermsCheck | 37
dataQC.dateCheck | 35
dataQC.eventStructure | 35
dataQC.guess.env_package.from.data | 35
combine.data.frame | 30
combine.data | 26
dataQC.generate.footprintWKT | 26
dataQC.LatitudeLongitudeCheck | 25
dataQC.findNames | 23
download.sequences.INSDC | 23
dataQC.DwC | 20
sync.metadata.sequenceFiles | 16

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 1055 potential issues:

message | number of times
--- | ---
Avoid 1:length(...) expressions, use seq_len. | 16
Avoid 1:ncol(...) expressions, use seq_len. | 9
Avoid 1:nrow(...) expressions, use seq_len. | 17
Avoid using sapply, consider vapply instead, that's type safe | 32
Lines should not be more than 80 characters. | 973
Use <-, not =, for assignment. | 8

Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.0.2.16 |
|pkgcheck |0.0.2.86 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

msweetlove commented 3 years ago

@noamross and @mpadge: added an example for the function commonTax.to.NCBI.TaxID

noamross commented 3 years ago

@ropensci-review-bot assign @jooolia as editor

ropensci-review-bot commented 3 years ago

Assigned! @jooolia is now the editor

jooolia commented 3 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 3 years ago

Thanks, about to send the query.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 3 years ago

Checks for OmicsMetaData (v0.0.1)

git hash: 0d8728ac

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_multiplication_x: These functions do not have examples: [commonTax.to.NCBI.TaxID.Rd].
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 78.9%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 11 files) and
- 1 authors
- 5 vignettes
- 10 internal data files
- 10 imported packages
- 37 exported functions (median 61 lines of code)
- 46 non-exported functions in R (median 48 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                 | value| percentile|noteworthy |
|:-----------------------|-----:|----------:|:----------|
|files_R                 |    11|       59.3|           |
|files_vignettes         |     0|        0.0|TRUE       |
|files_tests             |    10|       88.6|           |
|loc_R                   |  2877|       89.5|           |
|loc_tests               |  1281|       87.7|           |
|num_vignettes           |     5|       97.5|TRUE       |
|data_size_total         | 50800|       79.9|           |
|data_size_median        |  2304|       69.4|           |
|n_fns_r                 |    83|       65.1|           |
|n_fns_r_exported        |    37|       81.3|           |
|n_fns_r_not_exported    |    46|       56.5|           |
|n_fns_per_file_r        |     4|       53.4|           |
|num_params_per_fn       |     2|       10.7|           |
|loc_per_fn_r            |    50|       94.7|           |
|loc_per_fn_r_exp        |    61|       85.4|           |
|loc_per_fn_r_not_exp    |    48|       94.9|           |
|rel_whitespace_R        |    12|       81.7|           |
|rel_whitespace_tests    |    16|       93.7|           |
|doclines_per_fn_exp     |    37|       44.9|           |
|doclines_per_fn_not_exp |     0|        0.0|TRUE       |
|fn_call_network_size    |    66|       68.6|           |

---

1a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges

[![github](https://github.com/biodiversity-aq/OmicsMetaData/workflows/R-CMD-check/badge.svg)](https://github.com/biodiversity-aq/OmicsMetaData/actions)

**GitHub Workflow Results**

|name        |conclusion |sha    |date       |
|:-----------|:----------|:------|:----------|
|R-CMD-check |success    |0d8728 |2021-10-25 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following note:

1. checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘Orcs’
    All declared Imports should be used.

R CMD check generated the following check_fails:

1. cyclocomp
2. no_description_date
3. rcmdcheck_imports_not_imported_from

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 78.91

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
prep.metadata.ENA | 145
dataQC.MIxS | 117
dataQC.DwC_general | 75
dataQC.TermsCheck | 37
dataQC.dateCheck | 35
dataQC.eventStructure | 35
dataQC.guess.env_package.from.data | 35
combine.data.frame | 30
combine.data | 26
dataQC.generate.footprintWKT | 26
dataQC.LatitudeLongitudeCheck | 25
dataQC.findNames | 23
download.sequences.INSDC | 23
dataQC.DwC | 20
sync.metadata.sequenceFiles | 16

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 1055 potential issues:

message | number of times
--- | ---
Avoid 1:length(...) expressions, use seq_len. | 16
Avoid 1:ncol(...) expressions, use seq_len. | 9
Avoid 1:nrow(...) expressions, use seq_len. | 17
Avoid using sapply, consider vapply instead, that's type safe | 32
Lines should not be more than 80 characters. | 973
Use <-, not =, for assignment. | 8

Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.0.2.16 |
|pkgcheck |0.0.2.86 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

jooolia commented 3 years ago

Hi @msweetlove, looks good to me. I see that you have added the example to commonTax.to.NCBI.TaxID.Rd but not updated the documentation. I will proceed with looking for reviewers, but it would be great if you could update your docs. I think many of the {goodpractice} lintr comments could also be incorporated (e.g. those regarding assignment, seq_len, vapply and long lines).
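To make the lintr suggestions concrete, here is a minimal sketch of the patterns involved. The vector `x` and the doubling function are made-up illustrations, not code from OmicsMetaData.

```r
# Toy data, assumed purely for illustration
x <- c(a = 1.5, b = 2.5, c = 4)

# Use <- rather than = for assignment, and seq_along()/seq_len()
# rather than 1:length(x), so that empty inputs are skipped cleanly.
total <- 0
for (i in seq_along(x)) {
  total <- total + x[[i]]
}

# vapply() declares the expected return type (one numeric per element),
# whereas sapply() silently changes its return type depending on the input.
doubled <- vapply(x, function(v) v * 2, numeric(1))

# The pitfall behind the "use seq_len" lint: 1:0 counts down to c(1, 0)
empty <- numeric(0)
n_bad <- length(1:length(empty))   # loop body would run twice
n_good <- length(seq_along(empty)) # loop body would not run at all
```

The same substitutions apply to `1:nrow(df)` and `1:ncol(df)`, which become `seq_len(nrow(df))` and `seq_len(ncol(df))`.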

Thanks, Julia

msweetlove commented 3 years ago

Thanks, I was already wondering where that commonTax.to.NCBI.TaxID example had gone to... The documentation has been updated now! Cheers, Maxime

jooolia commented 3 years ago

@ropensci-review-bot add @orchid00 to reviewers

ropensci-review-bot commented 3 years ago

@orchid00 added to the reviewers list. Review due date is 2021-12-01. Thanks @orchid00 for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 3 years ago

@orchid00: If you haven't done so, please fill this form for us to update our reviewers records.

jooolia commented 3 years ago

Hello @orchid00 ! thanks for agreeing to review. I am still looking for a second reviewer and we did agree that your review would have a later due date of December 15th (however if it is done earlier that is great).

jooolia commented 2 years ago

Dear @msweetlove, I am still looking for a second reviewer. Thanks, Julia

jooolia commented 2 years ago

@ropensci-review-bot add @ginberg to reviewers

ropensci-review-bot commented 2 years ago

@ginberg added to the reviewers list. Review due date is 2021-12-26. Thanks @ginberg for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@ginberg: If you haven't done so, please fill this form for us to update our reviewers records.

jooolia commented 2 years ago

Dear @ginberg thanks for agreeing to review!

ginberg commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

README

contains a typo: 'lisence' instead of license
styling: some (sub)titles are not showing up as subtitles since there is no space between the hashtags and the title

Check:

devtools is listed twice in Suggests section in DESCRIPTION

Vignettes:

styling: some (sub)titles are not showing up as subtitles since there is no space between the hashtags and the title

Automated tests:

The overall test coverage is good (78.91%). There are 20 warnings when running all tests, could you fix these?

Packaging guidelines:

rOpenSci recommends a package name in lowercase in their guidelines, could you change that?

jooolia commented 2 years ago

Thanks very much @ginberg for the review. (sorry for my slow response)

jooolia commented 2 years ago

Hi @orchid00, do you think it will be possible to submit your review soon? Thanks, Julia

orchid00 commented 2 years ago

I'm sorry, I was not able to do the review, so I prefer to step out. I was on leave first, then I got ill. Now I am back to too many things to accommodate.

jooolia commented 2 years ago

Hi @orchid00, Thanks for letting me know. Hope you are feeling better and wishing you the best.

I will look for another reviewer @msweetlove. Thanks!

jooolia commented 2 years ago

@ropensci-review-bot add @cpalmer718 to reviewers

jooolia commented 2 years ago

Dear @cpalmer718, thank you for agreeing to review! The due date for your review is 2022-02-19. Please refer to our reviewer guide. If you have any questions feel free to ask here or via email. Thanks, Julia

lightning-auriga commented 2 years ago

Hi @msweetlove @jooolia please see below for my review. I've added commentary at the bottom to explain some of it. I'm trying to return this early because I wasn't able to clearly evaluate parts of the package on my system, so we may need to iterate a bit.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[ ] Installation instructions: for the development version of package and any non-standard dependencies in README
[ ] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[ ] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 10

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Installation

Windows
- Based on the ropensci specs, I'm assuming this is intended to work on Windows, but I guess I don't know for certain. In any case, my primary test system happened to be Windows 10, but the package example under "Brief demonstration usage" does not work on Windows due to hard-coded use of wget. This also seems to break the testthat tests. It seems like this package was developed on OSX; was this tested on Windows or Linux? I recommend replacing "wget" with "auto" to allow R to figure it out.
- Assigning encoding as "UTF-8" without conversion fails on Windows 10 (the degree symbol is declared invalid UTF-8 and tests fail). This is very likely platform dependent. Passing the character vector through an appropriate sanitizer such as enc2utf8 fixes the problem.
- A vignette example in "Retrieving online data" fails on Windows with the error Error in download.file(ftp_url, destfile = file.path(destination.path, : scheme not supported in URL 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR298/004/SRR2980684/SRR2980684_1.fastq.gz'. Apparently the "auto" setting is causing Windows to choose a download method that doesn't support ftp. This is seemingly resolved by setting the method parameter in download.file to "libcurl" as it is available on my system. I'd recommend going through the download.file documentation and coming up with a logic chain that hopefully guarantees the download will succeed for any system (which may involve requiring non-R system dependencies), or otherwise exposing the method parameter to the user so they can override the system's choice without editing the library's source code.
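
The two Windows-specific suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's code: `choose_download_method()` and `to_utf8()` are hypothetical helper names, and preferring "libcurl" for FTP-looking URLs is an assumption based on the behaviour reported in this review.

```r
# Hypothetical helper: let the caller override the download method,
# otherwise prefer libcurl for FTP-looking URLs when R was built with it.
choose_download_method <- function(url, method = NULL) {
  if (!is.null(method)) {
    return(method)  # an explicit user choice always wins
  }
  looks_like_ftp <- grepl("^(ftps?://|ftp\\.)", url)
  if (looks_like_ftp && isTRUE(unname(capabilities("libcurl")))) {
    return("libcurl")
  }
  "auto"  # defer to download.file()'s own platform default
}

# Hypothetical helper: convert text to UTF-8 instead of only relabelling it.
to_utf8 <- function(x) {
  enc2utf8(x)
}

# A Latin-1 string containing the degree symbol (byte 0xB0): "25<degree>C".
latin1_text <- rawToChar(as.raw(c(0x32, 0x35, 0xb0, 0x43)))
Encoding(latin1_text) <- "latin1"

# Relabelling the same bytes as UTF-8 does not convert them,
# so the result is not valid UTF-8.
relabelled <- latin1_text
Encoding(relabelled) <- "UTF-8"

# Converting first yields a valid UTF-8 string.
converted <- to_utf8(latin1_text)

validUTF8(relabelled)
validUTF8(converted)

# Intended use (not run here because it needs network access):
# download.file(url, destfile,
#               method = choose_download_method(url, method = user_method))
```

Exposing `method` as an argument keeps the automatic choice as a default while still letting a user work around a platform quirk without editing the library source.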
OSX/Linux
- After having problems with Windows, I tried installing on OSX and Linux, and had different issues. I tried installing in clean conda environments and ran up against a number of system dependencies that failed. Can you please try out clean installations on those systems and document system dependencies that may be required (e.g. libudunits2, gdal, etc.)?
Documentation
README.md has various issues with markdown syntax that are interfering with legibility: code blocks formatted as R, spaces after '#' characters in headers, etc.
The "Settings" page for NCBI doesn't have a clear link once you click to their site; it might be helpful to provide clearer instructions about how to get to the right page.
The way the API key is ingested in the package seems to encourage users to paste it into scripts, which represents a kind of security vulnerability. It would be nice to instruct users how they might better secure their API key for use in R (for example, placing it in .Renviron or using a more formal handling mechanism).
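
One way to keep the key out of scripts, sketched here under assumptions: the helper name `get_api_key()` is hypothetical, and the environment variable name `ENTREZ_KEY` is chosen for illustration only; the package may expect something different.

```r
# Hypothetical helper: read an API key from an environment variable
# rather than from a string pasted into a script.
# The variable name "ENTREZ_KEY" is an assumption made for this example.
get_api_key <- function(var = "ENTREZ_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found. Add a line such as ", var, "=<your key> ",
         "to your .Renviron file and restart R.", call. = FALSE)
  }
  key
}

# In a real session the variable would come from .Renviron; it is set by
# hand here only so that the example runs.
Sys.setenv(ENTREZ_KEY = "dummy-key-for-illustration")
api_key <- get_api_key()
```

With this pattern the key lives in a user-level file that is not committed to version control, and scripts only ever contain the call to the helper.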
Vignettes
Similar to README.md, working on improving the markdown syntax and presentation could really help the legibility of the vignettes. They're hard to get through at the moment.
Please add author: "{name}" tags to YAML headers of vignette .Rmd files
Be sure to change links to other vignettes from within the vignettes to point to .html files instead of .Rmd source files. It would also be helpful to link a table of contents at the top of each vignette, or something like that, given the number of vignettes.
workingDir is defined in Metadata_standardization but then not used, and in the final statement write.MIxS function emits relative to working directory
In Retrieving online data, the example in block download_sequences emits the warning: 'the metadata will be retruned to the Console If you did assign the output of this function to an R-object (using "<-"): better abort and restart now'. Since the result is assigned to a variable in the vignette example, this is really confusing for a new user.
It seems that no vignette documents anything about DarwinCore content in the library, unless I've missed it somehow.
I think, as a general comment, it's hard to get from the vignettes what I'm really looking for as a new user, which is an end-to-end example of what the intended use case is for the package. I think this is in part due to how the vignettes are fragmented, and also that for most of the vignettes the code is not executable as-is. Though it's hard, this experience might be substantially improved by consolidating the vignettes into end-to-end walkthroughs of how you envision the user directly interacting with a dataset, with every step of data management and adjustment provided with a runnable example, and maybe some tables showing the expected data frame contents or similar. This would also possibly provide an opportunity to include DarwinCore content in the vignettes.
General repo comments:
The license status of the project is unclear. Top level README.md says GPL3 (please be sure to correct typo). However, source files and codedata.json all say CC-0. Please either:
- harmonize this license selection across all files, and for good measure add license declarations in the header of all source files; or
- explain in top-level README.md the license status of the project so it's clear
R history and data hidden files, and Mac tracker files (.DS_Store), should be removed from the repo. The appropriate file patterns should be added to .gitignore at top level to prevent their addition to the repo.
The convention for default branch naming has changed (see here). Please consider renaming the default branch from master to main or default as preferred.
R/sysdata.rda has something called "MarsLib" in it; is that expected to be there? It seems like several data structures are being loaded directly from saved data files, which are provided as jagged data frames in the package namespace. It works, but it's somewhat unwieldy to work with. It would be nice to have some of the prestock library content available within the library as YAML files or something. Regardless, it really seems like the jagged data frames should be rather lists of vectors, so that when the user accesses one of the shorter vectors, they don't accidentally end up using the many empty string entries at the end of the vector.
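
To illustrate the last point, here is a small sketch of turning a padded ("jagged") data frame into a list of vectors. The data and the helper name `as_term_list()` are made up for this example and are not taken from the package.

```r
# Made-up stand-in for a "jagged" data frame: shorter columns are padded
# with empty strings so that all columns have the same length.
jagged <- data.frame(
  core_terms  = c("lat_lon", "depth", "collection_date"),
  extra_terms = c("ph", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the padding and keep one vector per column.
as_term_list <- function(df) {
  lapply(df, function(column) {
    column <- as.character(column)
    column[!is.na(column) & nzchar(column)]
  })
}

terms <- as_term_list(jagged)

# Each element now has its own natural length, with no trailing "" entries.
lengths(terms)
```

A user who pulls out `terms$extra_terms` then gets only the real entries, without having to remember to filter out empty strings.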
Style/packaging:
The package would really benefit from passes through styler and lintr. In particular:
- The cyclomatic complexity is really high for some functions, which makes it very challenging to actually test them.
- Variable and function naming is a bit messy. The combination of camel case, dotted snake case, all-caps, and other styles, at times within the same name, makes the code very difficult to read. The rOpenSci docs recommend snake_case.
Some of the automated checks recommend avoiding sapply and some other similar things, and I think that would be good. In particular, there are some statements (e.g. unlist(unname(sapply(paste)))) that I think could be made much more straightforward by using, for example, paste's vectorized behavior. This doesn't seem to cause any bugs that I've noticed, but it makes it difficult to read and maintain, and depending on the size of your input datasets, some of the for loops might cause some issues. I don't think it really will cause performance degradation at realistic dataset sizes, but it would just make things a lot cleaner and consistent with packaging best practices.
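
A brief sketch of the kind of simplification meant here, using made-up identifiers rather than code from the package: a nested `sapply` call is compared with `paste`'s vectorised form, alongside the `seq_along` and `vapply` patterns that lintr suggests.

```r
sample_ids <- c("A1", "B2", "C3")  # made-up identifiers

# Nested pattern similar in spirit to the one flagged in the review.
nested <- unlist(unname(sapply(
  sample_ids,
  function(id) paste("sample", id, sep = "_")
)))

# paste() is already vectorised over its arguments.
vectorised <- paste("sample", sample_ids, sep = "_")

identical(nested, vectorised)

# 1:length(x) counts down from 1 to 0 when x is empty, so a loop over it
# still runs twice; seq_along(x) gives an empty sequence instead.
empty <- character(0)
length(1:length(empty))
length(seq_along(empty))

# vapply() declares the expected return type, so a surprise result is an
# error rather than a silently different structure.
n_chars <- vapply(sample_ids, nchar, integer(1), USE.NAMES = FALSE)
```

The vectorised form gives the same result as the nested one here, and the `seq_along` version avoids the classic empty-input bug in `for` loops.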
Tests:
The tests didn't catch some of the issues I had with the package on Windows, so they really could use expansion.
Some of the tests technically cover the functions but don't then, for example, check the format of the file emitted by the tested function. I strongly encourage increasing test coverage and thoroughness if you can.
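
As a sketch of what checking the emitted file could look like: the writer function below is a hypothetical stand-in (it is not one of the package's functions), and the checks are written with base R so the example is self-contained. Inside a testthat test the same checks would normally be expressed with `expect_*` calls.

```r
# Hypothetical stand-in for a package function that writes a metadata table.
write_metadata_stub <- function(metadata, path) {
  write.csv(metadata, file = path, row.names = FALSE)
  invisible(path)
}

# Hypothetical check: re-read the emitted file and verify its structure,
# rather than only confirming that the writer ran without error.
check_emitted_file <- function(path, expected_columns, expected_rows) {
  stopifnot(file.exists(path))
  emitted <- read.csv(path, stringsAsFactors = FALSE)
  stopifnot(
    identical(names(emitted), expected_columns),
    nrow(emitted) == expected_rows
  )
  invisible(TRUE)
}

metadata <- data.frame(
  sample_name = c("s1", "s2"),
  depth = c(5, 10),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")
write_metadata_stub(metadata, out_file)
file_ok <- check_emitted_file(out_file, c("sample_name", "depth"), 2)
unlink(out_file)  # clean up the temporary file
```

Writing to `tempfile()` keeps the test from leaving files in the user's working directory, which also matters for CRAN-style checks.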

Thank you for inviting me to review! In particular with the Windows system issues, I'd really enjoy taking another look once things are working on the platform.

jooolia commented 2 years ago

Dear @msweetlove, have you had time to look at the reviews? Do you know when you will have time to respond? Thanks, Julia

jooolia commented 2 years ago

Email sent to author at naturalsciences.be address.

jooolia commented 2 years ago

Email bounced. Tried an address @ vlaanderen.be

jooolia commented 2 years ago

Dear @ginberg and @cpalmer718, Thank you very much for your reviews. We appreciate the time and care you put into them. The author has switched jobs and cannot continue with the development of this package and unfortunately there is no-one at his former workplace who can carry on with the work so the package development will be stopped. This means that we will close this review now. Please let me know how many hours you each spent reviewing (you can estimate if you can't exactly remember as it was quite some time ago) and we can log this in our reviewer database.

Thanks again for your help and input, Julia

ginberg commented 2 years ago

@jooolia okay, that's too bad. I spent around 4h and @cpalmer718 seems to have spent 10h (see the review)

jooolia commented 2 years ago

Thanks @ginberg! Yes somehow I missed both of your hours. :dizzy_face: thanks for pointing out that the info was already there!

lightning-auriga commented 2 years ago

@jooolia I'm sorry to hear it, but thanks for inviting me regardless. Yes, that was about the number of hours I spent, thanks @ginberg
