[REVIEW]: popsom: A Very Efficient Implementation of Self-Organizing Maps with Starburst Visualizations for R

whedon commented 3 years ago

Submitting author: @lutzhamel (Lutz Hamel)
Repository: https://github.com/lutzhamel/popsom
Version: v5.2
Editor: @fabian-s
Reviewer: @rcannood, @HerrMo
Archive: Pending

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00"><img src="https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00/status.svg)](https://joss.theoj.org/papers/73da2a16978d49e69b6e71f74df66a00)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@rcannood & @HerrMo, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @fabian-s know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @rcannood

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@lutzhamel) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @HerrMo

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@lutzhamel) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @rcannood, @HerrMo it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which under GitHub's default behaviour means you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

fabian-s commented 3 years ago

👋🏼 @HerrMo @rcannood this is the review thread for the paper. All of our communications will happen here from now on.

Both reviewers have checklists at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#3524 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me if you have any questions/concerns.

fabian-s commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

Wordcount for paper.md is 815

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.02 s (742.3 files/s, 103958.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               10            224            384           1351
Markdown                         5             46              0            170
TeX                              1             13              0            122
Fortran 90                       1             33             48            105
C                                1              3              1             21
-------------------------------------------------------------------------------
SUM:                            18            319            433           1769
-------------------------------------------------------------------------------

Statistical information for the repository '61f2a4e99f273108875321ff' was
gathered on 2021/07/22.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Lutz                             2            31              2           67.35
Lutz Hamel                       2             5              9           28.57
Meiger00                         1             1              1            4.08

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Lutz Hamel                   24          480.0          0.0                4.17
Meiger00                      1          100.0          4.6                0.00

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.cageo.2018.06.006 is OK
- 10.1093/mnras/279.1.293 is OK
- 10.18637/jss.v087.i07 is OK
- 10.1007/978-3-030-01057-7_60 is OK
- 10.1007/978-3-319-28518-4_4 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

:wave: @HerrMo, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @rcannood, please update us on how your review is going (this is an automated reminder).

fabian-s commented 3 years ago

@rcannood @HerrMo, please comment on the state of your reviews.

If you have identified major problems that are blocking acceptance of this submission, please open issues at https://github.com/lutzhamel/popsom and link them here.

HerrMo commented 3 years ago

@fabian-s @whedon I have finished my review and I suggest a revision as follows

Summary: The package popsom provides an implementation of self-organizing maps. The software is well documented, easy to use and provides visualizations and quality metrics which make the assessment of a trained map very intuitive. The software paper is well written and gives an accurate and for the most part sufficient overview of the package and the problems it solves. However, several issues remain that require revision and/or clarification, thus blocking acceptance.

Major issues:

Minor issues:

fabian-s commented 3 years ago

@rcannood, please comment on the state of your review.

lutzhamel commented 3 years ago

@HerrMo Thank you for your insightful and detailed review. Addressing the issues you raised certainly made this a better package. I have addressed all the issues you raised above (see the repo for detailed comments). In the process I had to create a new version of popsom: v6.0. This version is now available in the repository. I have not pushed it out to CRAN yet just in case there are additional items I need to address. You can build the package locally on your machine by following the instructions here. The R check procedure builds a package locally which you can install and run in your R environment. Thanks again and please let me know if there is anything else I need to address.

lutzhamel commented 3 years ago

Sorry @fabian-s , I should have mentioned you in my response above...

lutzhamel commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lutzhamel commented 3 years ago

@fabian-s @HerrMo @rcannood I have generated a new pdf due to the fact that the paper had some major rewrites...

lutzhamel commented 3 years ago

@HerrMo @fabian-s I noticed that the only bullet point that has not been checked off in @HerrMo's review is the automated tests point. You can find information on how to execute automated tests for popsom here.

fabian-s commented 3 years ago

thanks @lutzhamel, we're still waiting for @rcannood to chime in. in the meantime, could you reduce the sizes of figures 1 and 2 so they fit on one page?

lutzhamel commented 3 years ago

Hello @fabian-s. I reduced the size and will generate a new pdf.

lutzhamel commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

rcannood commented 3 years ago

Sorry for taking a while to respond. After getting back from holidays, the last few weeks have been quite hectic and I wanted to make sure I took sufficient time to properly go through the manuscript and the code before ticking off any boxes.

I appreciate the valuable contribution @lutzhamel has made in providing an efficient implementation of SOM in R. However, I feel that a few issues need to be resolved before I can approve this manuscript for publication. A few issues are included below, whereas others are posted as separate issues in the lutzhamel/popsom repository.

Contribution and authorship

Michael Eiger (@Meiger00) also created a significant number of commits to the repository. Should Michael also be an author on the manuscript? Similarly, the package description lists even more authors. Should these also be included on the manuscript as well?

Substantial scholarly effort

The description paragraph mentions that the popsom package has a number of distinguishing features, most of which are already part of a separate publication by the author (Hamel 2018, Hamel&Brown 2011, Hamel 2016 and Tatoian&Hamel 2018). I believe this publication is valuable because it could provide a nice summary of the package functionality in an open journal (whereas some of the aforementioned publications are not). In light of this, can @lutzhamel please clarify what the substantial scientific effort of this submission is?

Major issues

lutzhamel/popsom#13
lutzhamel/popsom#14
lutzhamel/popsom#15
lutzhamel/popsom#16
lutzhamel/popsom#17
lutzhamel/popsom#18

lutzhamel commented 3 years ago

@fabian-s I don't agree with any of the above criticisms of my paper and therefore I am retracting my paper. According to the JOSS specs, JOSS papers are supposed to be short introductions to software packages; these criticisms are trying to turn this into an academic treatment. All these issues have been addressed in my publications (hence the number of self-citations), except "improve quality of writing", which I find kind of insulting, and the "split up package", which I find bizarre. This package predates roxygen and I don't see any benefit of retrofitting my package to this tool.

rcannood commented 3 years ago

Please do not interpret my comments as harsh criticisms, as I only intended to improve the readability of the package and the manuscript. If you do not agree with certain comments, I kindly invite you to simply respond to my comment, as you have done for several of my comments. For instance:

This package predates roxygen and I don't see any benefit of retrofitting my package to this tool.

I saw and understood that this tool predates roxygen, which is why I did not make a separate GitHub issue and simply noted "Please consider generating the man/* files using roxygen2."

fabian-s commented 3 years ago

@lutzhamel very sorry to hear that -- in my view, @rcannood has provided much valuable feedback, and the few remaining issues that are blocking acceptance (as opposed to being mere suggestions for possible improvements to your code or the paper) should be straightforward to address. I've added comments to the respective issues that rcannood has opened; I think we can work this out fairly easily if you're still willing.

lutzhamel commented 3 years ago

@fabian-s Thanks for taking a look at the issues raised. Let me look at your comments and I will get back to you.

lutzhamel commented 3 years ago

@fabian-s From your comments it seems that I have misinterpreted some of the issues raised by @rcannood, for example, "provide an explanation of how SOM's work" I took as a detailed explanation of SOMs which clearly would be beyond the scope of a short software paper. Your interpretation as a brief sentence or two as a way to provide context to readers not familiar with the topic seems reasonable to me. My apologies to @rcannood if I misunderstood the intention of those comments. I do appreciate the time you took to carefully review the work!

Given this, I agree with @fabian-s that the paper can be brought up to par without a lot of additional verbiage and work. A recurring theme in the reviews is performance. I redid the performance analysis from 3 years ago and at this point can no longer reproduce the 60x speedup. The highest speedup I can achieve now is 20x. There are most likely two reasons for that: (1) in the past two years or so the Unix world has switched from gcc to clang as its default compiler, which most likely applies many more aggressive LLVM-based optimizations than gcc, and this has narrowed the performance gap between Fortran and C; (2) hardware has progressed, with larger caches and faster clocks, which might have had a disproportionate impact on C code compared to Fortran code. I will adjust the claims in the paper and the software package to bring them in line with the current findings.

The remaining issues I will address in the coming days.

Regards.

lutzhamel commented 3 years ago

I will address the points raised by @rcannood outside of the popsom repository:

Contribution and authorship Michael Eiger (@Meiger00) also created a significant number of commits to the repository. Should Michael also be an author on the manuscript? Similarly, the package description lists even more authors. Should these also be included on the manuscript as well?

I am the sole author of this paper, which I consider to be distinct from the authorship of the package, which includes a number of folks. However, I am also the main contributor, architect, and maintainer of the package. I do acknowledge the contributions to the package in the paper. I suppose it is ultimately an editorial decision how precisely authorship of the paper should be represented.

Substantial scholarly effort The description paragraph mentions that the popsom package has a number of distinguishing features, most of which are already part of a separate publication by the author (Hamel 2018, Hamel&Brown 2011, Hamel 2016 and Tatoian&Hamel 2018). I believe this publication is valuable because it could provide a nice summary of the package functionality in an open journal (whereas some of the aforementioned publications are not). In light of this, can @lutzhamel please clarify what the substantial scientific effort of this submission is?

All the papers that @rcannood mentions in the above paragraph focus on various, distinct facets of our package. This paper pulls all these different strands of research together. Substantial scientific effort has been expended over the better part of the last decade developing this software to the point where it is now and this paper presents the package as a whole rather than any specific aspect of it.

rcannood commented 3 years ago

In response to your earlier message: I apologise for some of the phrasing used in my initial comments; I think I could have done a better job of communicating the constructive part of my messages without coming across as critical as I did.

In response to your last message: Thanks for clarifying the matters regarding authorship and scholarly effort. Note that I asked about this specifically because whedon asked me to look into this.

Looking forward to the updated manuscript!

fabian-s commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lutzhamel commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lutzhamel commented 3 years ago

@fabian-s @rcannood @HerrMo I have generated a new PDF and committed all my changes to the repo. Please take a look and let me know what you think. Thanks!

rcannood commented 3 years ago

Thanks! I feel that the changes have sufficiently addressed my previous concerns :+1:

fabian-s commented 3 years ago

@whedon generate pdf

fabian-s commented 3 years ago

@whedon check references

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.cageo.2018.06.006 is OK
- 10.1093/mnras/279.1.293 is OK
- 10.18637/jss.v087.i07 is OK
- 10.1007/978-3-030-01057-7_60 is OK
- 10.1007/978-3-319-28518-4_4 is OK

MISSING DOIs

- None

INVALID DOIs

- None

fabian-s commented 3 years ago

@rcannood @HerrMo thanks for your diligent and constructive reviews. I will recommend acceptance of this paper now.

fabian-s commented 3 years ago

@lutzhamel Thank you for engaging so swiftly with the reviewers' remarks.

At this point could you:

[ ] Make a tagged release of your software, and list the version tag of the archived version here.
[ ] Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
[ ] Check the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (should match the paper title) and author list (make sure the list is correct and people who only made a small fix are not on it). You may also add the authors' ORCID.
[ ] List the DOI of the archived version here.

I can then move forward with accepting the submission.

lutzhamel commented 3 years ago

@fabian-s Problems! I was doing my final due diligence on the paper, which included carefully reviewing the performance data, when I noticed that I had misunderstood a parameter setting in the kohonen package. After fixing that mistake and rerunning the benchmarks I got the following results:

Iris Data
  package        mean time          speedup
1  popsom       215.107746                1
2     som 4074.47307966667  18.941545134626
3 kohonen 312.952335666667 1.45486316269925

Epil Data
  package        mean time          speedup
1  popsom        399.20453                1
2     som      4731.625202 11.8526340420035
3 kohonen 456.656295666667 1.14391561555343

Wines Data
  package        mean time          speedup
1  popsom       622.578688                1
2     som      5463.562089 8.77569726415049
3 kohonen 651.237032333333 1.04603168223666

Synthetic Data
  package        mean time          speedup
1  popsom 15.4396670093333                1
2     som 163.425318395667 10.5847696259819
3 kohonen 16.8341376496667 1.09031740383328

which implies that the kohonen package is pretty much on par with our popsom package performance-wise.
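
For readers checking the numbers: the speedup column in the tables above appears to be each package's mean time divided by popsom's mean time for the same data set. A minimal Python sketch under that assumption reproduces the ratios. The dictionary layout and the `speedups` function name are illustrative only and are not part of popsom or the benchmark scripts, and the time unit is not stated in the thread.

```python
# Mean times copied from the benchmark tables above (time unit not stated).
mean_times = {
    "Iris": {"popsom": 215.107746, "som": 4074.47307966667, "kohonen": 312.952335666667},
    "Epil": {"popsom": 399.20453, "som": 4731.625202, "kohonen": 456.656295666667},
    "Wines": {"popsom": 622.578688, "som": 5463.562089, "kohonen": 651.237032333333},
    "Synthetic": {"popsom": 15.4396670093333, "som": 163.425318395667, "kohonen": 16.8341376496667},
}


def speedups(times, baseline="popsom"):
    """Return each package's mean time relative to the baseline package.

    Assumption: speedup = mean_time(package) / mean_time(baseline),
    so the baseline itself always maps to 1.0.
    """
    base = times[baseline]
    return {package: t / base for package, t in times.items()}


if __name__ == "__main__":
    for dataset, times in mean_times.items():
        ratios = speedups(times)
        # Round only for display; the stored ratios keep full precision.
        print(dataset, {package: round(r, 2) for package, r in ratios.items()})
```

Read this way, a value near 1 for kohonen means its mean time is close to popsom's, which is the basis for the "on par" conclusion.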

What are my options? I could withdraw the paper because one of its claims is no longer true. I could rewrite the paper with a concentration on ease of use and interpretability of results which was one of the major angles of our research besides performance.

Please let me know how you would like to proceed.

Kind Regards, Lutz

fabian-s commented 3 years ago

@lutzhamel hmm.... This is only the 2nd submission I'm handling for JOSS, so I don't feel comfortable deciding this myself. I'll discuss with the JOSS team and get back to you next week.

fabian-s commented 3 years ago

@whedon commands

whedon commented 3 years ago

Here are some things you can ask me to do:

# List all of Whedon's capabilities
@whedon commands

# Assign a GitHub user as the sole reviewer of this submission
@whedon assign @username as reviewer

# Add a GitHub user to the reviewers of this submission
@whedon add @username as reviewer

# Re-invite a reviewer (if they can't update checklists)
@whedon re-invite @username as reviewer

# Remove a GitHub user from the reviewers of this submission
@whedon remove @username as reviewer

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

# Change editorial assignment
@whedon assign @username as editor

# Set the software archive DOI at the top of the issue e.g.
@whedon set 10.0000/zenodo.00000 as archive

# Set the software version at the top of the issue e.g.
@whedon set v1.0.1 as version

# Open the review issue
@whedon start review

EDITORIAL TASKS

# All commands can be run on a non-default branch, to do this pass a custom 
# branch name by following the command with `from branch custom-branch-name`.
# For example:

# Compile the paper
@whedon generate pdf

# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name

# Remind an author or reviewer to return to a review after a
# certain period of time (supported units days and weeks)
@whedon remind @reviewer in 2 weeks

# Ask Whedon to do a dry run of accepting the paper and depositing with Crossref
@whedon recommend-accept

# Ask Whedon to check the references for missing DOIs
@whedon check references

# Ask Whedon to check repository statistics for the submitted software
@whedon check repository

EiC TASKS

# Invite an editor to edit a submission (sending them an email)
@whedon invite @editor as editor

# Reject a paper
@whedon reject

# Withdraw a paper
@whedon withdraw

# Ask Whedon to actually accept the paper and deposit with Crossref
@whedon accept deposit=true

fabian-s commented 3 years ago

@lutzhamel

Considering the substantial visualization capabilities and extensive documentation of kohonen, we think it's unlikely that such a rewrite would demonstrate a sufficiently substantial contribution to the available open source software in this area or that it is likely to be used and cited widely.

My recommendation would be to withdraw the paper at this point.

lutzhamel commented 3 years ago

Agreed.

lutzhamel commented 3 years ago

@whedon withdraw
