[PRE REVIEW]: popsom: A Very Efficient Implementation of Self-Organizing Maps with Starburst Visualizations for R

whedon commented 3 years ago

Submitting author: @lutzhamel (Lutz Hamel) Repository: https://github.com/lutzhamel/popsom Version: v5.2 Editor: @fabian-s Reviewers: @rcannood, @HerrMo Managing EiC: Kristen Thyng

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00"><img src="https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00/status.svg)](https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00)

Author instructions

Thanks for submitting your paper to JOSS @lutzhamel. Currently, there isn't an JOSS editor assigned to your paper.

@lutzhamel if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.04 s (226.6 files/s, 67325.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                                1            211            355           1262
Markdown                         4             48              0            159
TeX                              1             13              0            118
Fortran 90                       1             33             48            105
C                                1              3              1             21
-------------------------------------------------------------------------------
SUM:                             8            308            404           1665
-------------------------------------------------------------------------------

Statistical information for the repository '2aac71164be070f563ed7735' was
gathered on 2021/07/14.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Lutz                             2            31              2           67.35
Lutz Hamel                       2             5              9           28.57
Meiger00                         1             1              1            4.08

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Lutz Hamel                   24          480.0          0.0                4.17
Meiger00                      1          100.0          4.6                0.00

whedon commented 3 years ago

Failed to discover a valid open source license.

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v087.i07 is OK

MISSING DOIs

- 10.1016/j.cageo.2018.06.006 may be a valid DOI for title: Interpreting Self-Organizing Map errors in the classification of ocean patterns
- 10.1093/mnras/279.1.293 may be a valid DOI for title: Star/galaxy classification using Kohonen self-organizing maps
- 10.1007/978-3-030-01057-7_60 may be a valid DOI for title: VSOM: Efficient, Stochastic Self-Organizing Map Training
- 10.1007/978-3-319-28518-4_4 may be a valid DOI for title: Som quality measures: An efficient statistical approach

INVALID DOIs

- None

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

Hi @fabian-s are you interested in editing this submission?

kthyng commented 3 years ago

@whedon invite @fabian-s as editor

whedon commented 3 years ago

@fabian-s has been invited to edit this submission.

fabian-s commented 3 years ago

Sure, I can manage this one.

kthyng commented 3 years ago

@whedon assign @fabian-s as editor

whedon commented 3 years ago

OK, the editor is @fabian-s

kthyng commented 3 years ago

(@fabian-s you can run this command yourself too, just fyi)

fabian-s commented 3 years ago

@lutzhamel:

Your package currently does not follow many of the coding standards JOSS aims for (see below for the full output of goodpractice::gp("popsom")) -- most importantly:

please add automated tests with decent coverage for the core functionality
please add proper dependency management ("Imports:", "Suggests:", ...) and namespace directives to the package

In addition, before we try to recruit reviewers, please:

add the missing DOIs (see whedon bot output above) to your paper
use a suitable LICENSE for the package
add a CONTRIBUTING file and/or community guidelines to the repository (see also https://joss.readthedocs.io/en/latest/review_criteria.html)

> goodpractice::gp("popsom")
[...]
It is good practice to

  ✖ write unit tests for all functions, and all package code in
    general. 0% of code lines are covered by test cases.

    R/map-utils.R:67:NA
    R/map-utils.R:68:NA
    R/map-utils.R:70:NA
    R/map-utils.R:71:NA
    R/map-utils.R:73:NA
    ... and 1048 more lines

  ✖ write short and simple functions. These functions have high
    cyclomatic complexity:compute.centroids (98).
  ✖ omit "Date" in DESCRIPTION. It is not required and it gets
    invalid quite often. A build date will be added to the package when
    you perform `R CMD build` on it.
  ✖ add a "BugReports" field to DESCRIPTION, and point it to a
    bug tracker. Many online code hosting services provide bug trackers
    for free, https://github.com, https://gitlab.com, etc.
  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/map-utils.R:18:1
    R/map-utils.R:26:1
    R/map-utils.R:27:1
    R/map-utils.R:34:1
    R/map-utils.R:241:1
    ... and 27 more lines

  ✖ omit trailing semicolons from code lines. They are not
    needed and most R coding standards forbid them

    R/map-utils.R:256:32

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/map-utils.R:73:12
    R/map-utils.R:559:22
    R/map-utils.R:595:25
    R/map-utils.R:1640:12
    R/map-utils.R:1759:22

  ✖ avoid the library() and require() functions, they change
    the global search path. If you need to use other packages, import
    them. If you need to load them explicitly, then consider
    loadNamespace() instead, or as a last resort, declare them as
    'Depends' dependencies.

    R/map-utils.R:39:1
    R/map-utils.R:40:1
    R/map-utils.R:41:1
    R/map-utils.R:42:1

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...)
    and 1:NCOL(...) expressions. They are error prone and result 1:0 if
    the expression on the right hand side is zero. Use seq_len() or
    seq_along() instead.

    R/map-utils.R:406:15
    R/map-utils.R:475:13
    R/map-utils.R:492:22
    R/map-utils.R:495:15
    R/map-utils.R:549:22
    ... and 10 more lines

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.

Looking forward to working with you on this submission, best, Fabian

lutzhamel commented 3 years ago

@fabian-s Interesting automated feedback for the package. Quick question: The R release process requires you to embed vignettes into the documentation. These vignettes are treated by the R release process as automated test cases and the release process will terminate if these cases do not succeed. Neither will the release process continue if the documentation does not include these vignettes. Now, do I have to replicate this functionality with an additional test suite?

fabian-s commented 3 years ago

@lutzhamel

I don't agree with the implied premise that vignettes, which are part of your documentation, (EDIT for precision:) ~~are equivalent to~~ "replicate the functionality" of regression or unit tests or can replace them.

I do concede that, by having vignettes that are being rebuilt whenever the package is bundled, you will frequently discover if something breaks really badly, but not in a systematic or structured way (e.g. you'll know about the first bug that breaks your vignette, but not subsequent ones. Or, if a buggy function still runs but returns a wrong result, the vignette will render just fine with nonsensical content).

Turning examples in your vignettes or help pages into unit tests can be a good starting point if they run quickly, cover the API of your package and each focus on separate functionalities -- for additional considerations on what to test and how, see for example section 7 in https://cran.r-project.org/web/packages/tinytest/vignettes/using_tinytest.pdf or Cotton: Testing R Code for more in-depth info.

lutzhamel commented 3 years ago

@fabian-s Got it. Thanks for the links, very helpful. Another quick question, is there a way for me to run the gp() function manually in order to check that my code conforms to the joss standards?

Thanks, Lutz

fabian-s commented 3 years ago

Sure, glad it was helpful.

re. gp():
not sure I really understand what you're asking -- to use it you install the goodpractice package from CRAN and then call its gp()-function with the folder path to the source R package you want to check.

Note that gp() does not define/implement JOSS standards for submitted R code, though, I didn't mean to create that impression -- it's just a quick and easy way to find some low-hanging fruit for possible improvements.

What JOSS is looking for in a paper is described at https://joss.readthedocs.io/en/latest/submitting.html and https://joss.readthedocs.io/en/latest/review_criteria.html.

lutzhamel commented 3 years ago

Thanks for getting back to me so quickly. I didn't realize that gp() was part of a CRAN package. I will follow up.

Best, Lutz

lutzhamel commented 3 years ago

@fabian-s

I believe I have addressed all the issues raised and adjusted the repository appropriately.

please add proper dependency management ("Imports:", "Suggests:", ...) and namespace directives to the package

done. The DESCRIPTION file has an "Imports:" section for the R package manager.
add the missing DOIs (see whedon bot output above) to your paper

done.
use a suitable LICENSE for the package

done -- GPL-3
add a CONTRIBUTING file and/or community guidelines to the repository

done
please add automated tests with decent coverage for the core functionality ✖ write unit tests for all functions, and all package code in general. 0% of code lines are covered by test cases.

R/map-utils.R:67:NA R/map-utils.R:68:NA R/map-utils.R:70:NA R/map-utils.R:71:NA R/map-utils.R:73:NA ... and 1048 more lines

Done. My unit tests cover about 75% of the code.

✖ write short and simple functions. These functions have high cyclomatic complexity:compute.centroids (98).

I did not follow this advice. The function mentioned has to test A LOT of boundary conditions and therefore the cyclomatic complexity is high. Breaking the function into smaller parts would render the algorithm behind the function virtually incomprehensible.

✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when you perform R CMD build on it.

done

✖ add a "BugReports" field to DESCRIPTION, and point it to a bug tracker. Many online code hosting services provide bug trackers for free, https://github.com, https://gitlab.com, etc.

done

✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters

R/map-utils.R:18:1
R/map-utils.R:26:1
R/map-utils.R:27:1
R/map-utils.R:34:1
R/map-utils.R:241:1
... and 27 more lines

done

✖ omit trailing semicolons from code lines. They are not needed and most R coding standards forbid them

R/map-utils.R:256:32

Done

✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

R/map-utils.R:73:12
R/map-utils.R:559:22
R/map-utils.R:595:25
R/map-utils.R:1640:12
R/map-utils.R:1759:22

I did not follow this advice. R is not type safe, see https://stackoverflow.com/questions/14298899/type-safety-of-the-r-language#:~:text=R%20has%20essentially%20no%20type,end%20up%20with%20meaningless%20objects for examples. However, R is dynamically typed. The advantage of dynamically typed languages is the readability of the code. sapply is the most readable and intuitive of the apply family of functions in R. I do not want to sacrifice readability for spotty type safety.

✖ avoid the library() and require() functions, they change the global search path. If you need to use other packages, import them. If you need to load them explicitly, then consider loadNamespace() instead, or as a last resort, declare them as 'Depends' dependencies.

R/map-utils.R:39:1
R/map-utils.R:40:1
R/map-utils.R:41:1
R/map-utils.R:42:1

Done

✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.

R/map-utils.R:406:15
R/map-utils.R:475:13
R/map-utils.R:492:22
R/map-utils.R:495:15
R/map-utils.R:549:22
... and 10 more lines

I did not follow this advice. 1:length() is an idiomatic expression. Any programmer reading this code immediately understands what is going on, e.g for (i in 1:length(x))... This is not true for seq_len. The name is unfortunate and it is not clear what for (i in seq_len(length(x)))... really does unless you start digging into the documentation. The safety improvements are minimal and do not outweigh the readability concerns of the code.

✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you need.

I did not follow this advice. There are no name clashes between the imported packages. Should there be in the future this will be addressed.

fabian-s commented 3 years ago

@lutzhamel thanks, sounds reasonable!

fabian-s commented 3 years ago

@whedon generate pdf

fabian-s commented 3 years ago

:wave: @rcannood & @HerrMo, would any of you be willing to review this submission for JOSS? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: https://joss.readthedocs.io/en/latest/review_criteria.html

rcannood commented 3 years ago

I would love to review this submission as the topic interests me greatly. However, I'm currently out of office and would only be back on the 2nd of august; I can probably finish reviewing by August 7th. Would that be okay?

fabian-s commented 3 years ago

@rcannood sure, that's fine! thanks for taking this on.

fabian-s commented 3 years ago

@whedon assign @rcannood as reviewer

whedon commented 3 years ago

OK, @rcannood is now a reviewer

HerrMo commented 3 years ago

@fabian-s I am happy to review the submission

fabian-s commented 3 years ago

thanks for taking this on, @HerrMo

fabian-s commented 3 years ago

@whedon add @HerrMo as reviewer

whedon commented 3 years ago

OK, @HerrMo is now a reviewer

fabian-s commented 3 years ago

@whedon start review

whedon commented 3 years ago

OK, I've started the review over in https://github.com/openjournals/joss-reviews/issues/3524.

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

openjournals / joss-reviews

[PRE REVIEW]: popsom: A Very Efficient Implementation of Self-Organizing Maps with Starburst Visualizations for R #3491

Status