openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[PRE REVIEW]: AutoGMM: Automatic and hierarchical gaussian mixture modeling in Python #2921

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @tliu68 (Tingshan Liu)
Repository: https://github.com/tliu68/graspologic
Version: v0.3
Editor: Pending
Reviewer: Pending
Managing EiC: Kristen Thyng

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @tliu68. Currently, there isn't a JOSS editor assigned to your paper.

The author's suggestion for the handling editor is @arokem.

@tliu68 if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago

Failed to discover a Statement of need section in paper

whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=2.31 s (69.8 files/s, 12081.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          90           3179           5797          10619
Jupyter Notebook                23              0           4306           1180
Markdown                         9            239              0            928
reStructuredText                28            278            185            533
TeX                              2             48              0            376
YAML                             3              1              0            131
make                             2             12              4             26
INI                              2              2              0             10
HTML                             1              0              0              9
Bourne Shell                     1              2              0              7
-------------------------------------------------------------------------------
SUM:                           161           3761          10292          13819
-------------------------------------------------------------------------------

Statistical information for the repository '5b804341b13f48bbf7665023' was
gathered on 2020/12/18.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Alex Loftus                      3             4              9            0.03
Ali Saad-Eldin                  11           957            206            3.02
Anshu Trivedi                    1             4              9            0.03
Benjamin Pedigo                 29          5287            835           15.89
Bijan Varjavand                  3           972            134            2.87
Casey Weiner                     2            42             20            0.16
Daniel Borders                   1             6              0            0.02
Dwayne Pryce                     8          1926            539            6.40
Eric Bridgeford                 21          1536            217            4.55
Iain Carmichael                  1            38              1            0.10
Jaewon Chung                    32          5337           2519           20.39
Jingyan230                       1            12              3            0.04
Jinhan                           1             4              0            0.01
Kikiwink                         2            50             38            0.23
Paul Adkisson                    2           235            223            1.19
PerifanosPrometheus              1             1              1            0.01
Ronan Perry                      1           321              7            0.85
ShanQiu                          2           710              0            1.84
Shiyu Sun                        1            20              2            0.06
Thomas Athey                     1          1062              1            2.76
Vikram Chandrashekha             1            72              8            0.21
Ze Ou                            1            13              0            0.03
alyakin314                       3          1718            276            5.17
bdpedigo                        44          1469            554            5.25
bvarjavand                       1            85              0            0.22
dfrancisco1998                   1            43              3            0.12
eric bridgeford                  2           130              2            0.34
gkang7                           1           269              0            0.70
hhelm10                          1           267             59            0.85
j1c                             73          1640            755            6.21
jdey4                            1             7              2            0.02
jheiko1                          4            95             40            0.35
kareef928                        1            17              4            0.05
spencer-loggia                   1           148             35            0.47
tliu68                           2          6052           1487           19.56

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Alex Loftus                 403        10075.0          0.7                5.71
Ali Saad-Eldin             1096          114.5          3.2                5.38
Anshu Trivedi                 4          100.0          1.8                0.00
Benjamin Pedigo            3957           74.8         18.9                3.84
Bijan Varjavand             322           33.1         21.5                7.45
Casey Weiner                 42          100.0          6.8                4.76
Daniel Borders                3           50.0          2.7                0.00
Dwayne Pryce               3048          158.3          1.3                7.87
Eric Bridgeford             537           35.0         25.6               16.20
Iain Carmichael              38          100.0         15.3                0.00
Jaewon Chung               4520           84.7         22.0                6.84
Jingyan230                   10           83.3          2.7                0.00
Jinhan                        1           25.0          9.4                0.00
Kikiwink                     44           88.0         17.1                0.00
Paul Adkisson               233           99.1          1.3                0.00
Ronan Perry                 258           80.4         20.7               10.08
ShanQiu                     683           96.2          9.6                6.30
Shiyu Sun                    18           90.0         15.7                0.00
Thomas Athey                913           86.0         12.2                6.90
Vikram Chandrashekha         61           84.7         21.4                3.28
Ze Ou                        10           76.9          2.6                0.00
alyakin314                 1362           79.3          2.6               12.41
dfrancisco1998              341          793.0          0.4                7.62
gkang7                      260           96.7         11.8                2.31
hhelm10                     201           75.3         20.0                7.96
jdey4                         7          100.0         14.5                0.00
jheiko1                      82           86.3         11.7                0.00
kareef928                   441         2594.1          0.2                1.36
spencer-loggia              145           98.0          2.7                2.07
tliu68                      554            9.2          1.6               12.27
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.2307/2532201 may be a valid DOI for title: Model-Based Gaussian and Non-Gaussian Clustering
- 10.1080/01621459.1998.10474110 may be a valid DOI for title: Detecting Features in Spatial Point Processes with Clutter via Model-Based Clustering
- 10.21236/ada454825 may be a valid DOI for title: Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering
- 10.32614/rj-2016-021 may be a valid DOI for title: mclust 5: clustering, classification and density estimation using Gaussian finite mixture models
- 10.1016/s0167-9473(02)00163-9 may be a valid DOI for title: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models
- 10.1016/j.patrec.2009.09.011 may be a valid DOI for title: Data Clustering: 50 Years Beyond K-means
- 10.1093/comjnl/9.4.373 may be a valid DOI for title: A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems
- 10.1016/s0306-4379(01)00008-4 may be a valid DOI for title: CURE: An Efficient Clustering Algorithm for Large Databases
- 10.1093/oso/9780198505044.003.0017 may be a valid DOI for title: Finite mixture models
- 10.1515/revneuro-2018-0050 may be a valid DOI for title: Segmentation and clustering in brain MRI imaging
- 10.1177/109634800002400105 may be a valid DOI for title: An Explanation and Illustration of Cluster Analysis for Identifying Hospitality Market Segments
- 10.1080/01431160512331316432 may be a valid DOI for title: Satellite image classification using genetically guided fuzzy clustering with spatial information
- 10.1109/tpami.2017.2679100 may be a valid DOI for title: Clustering Millions of Faces by Identity
- 10.1016/b978-0-12-442450-0.50009-2 may be a valid DOI for title: Clustering and Structural Balance in Graphs
- 10.1016/s0047-259x(03)00096-4 may be a valid DOI for title: A well-conditioned estimator for large-dimensional covariance matrices
- 10.1007/978-1-4612-1694-0_16 may be a valid DOI for title: A new look at the statistical model identification
- 10.1093/comjnl/26.4.354 may be a valid DOI for title: A survey of recent advances in hierarchical clustering algorithms
- 10.1109/sffcs.1999.814639 may be a valid DOI for title: Learning mixtures of Gaussians
- 10.1016/j.jcss.2003.11.008 may be a valid DOI for title: A spectral algorithm for learning mixture models
- 10.1007/11503415_31 may be a valid DOI for title: On spectral learning of mixtures of distributions

INVALID DOIs

- None
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

Hi @tliu68 and thanks for your submission. I am confused about a couple of things.

  1. The repo associated with your submission has a completely different name from this submission. Was this a mistake?
  2. Your paper is far too long. Please review the paper requirements.

I am going to add the paused label to this because I am not sure how long it will take to address these initial questions.

tliu68 commented 3 years ago

Hello @kthyng! Thank you for your reply! Here are my answers to your questions:

  1. No, it was not a mistake. Our paper presents AutoGMM, a major clustering algorithm of graspologic, and its hierarchical version, HGMM (also in graspologic). The repo is a fork of graspologic and contains our paper. Do you think changing its name and/or README would make it more obvious?
  2. We are working on trimming down our paper based on the requirements and will submit soon!
tliu68 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

tliu68 commented 3 years ago

Hello @kthyng! The paper has been updated. How would you suggest we modify the repo? Thank you very much for your help!

kthyng commented 3 years ago

@tliu68 I'm sorry it's taken me a while to get back to you on this. Ok, so do you mean that graspologic is a larger package and your JOSS submission is about a subset of the larger package? If so, it is important to be extra clear in all of your materials about exactly what reviewers would be reviewing as part of this submission.

kthyng commented 3 years ago

@tliu68 Can you help summarize the part of the repository that is in this submission? As you can see from the cloc summary above, which counts the lines of code, it is showing the full repo instead of the subset you want reviewed. What is the subset of the code base that should count as in this submission?

tliu68 commented 3 years ago

@whedon commands

whedon commented 3 years ago

Here are some things you can ask me to do:

# List Whedon's capabilities
@whedon commands

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

EDITORIAL TASKS

# Compile the paper
@whedon generate pdf

# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name

# Ask Whedon to check the references for missing DOIs
@whedon check references

# Ask Whedon to check repository statistics for the submitted software
@whedon check repository
tliu68 commented 3 years ago

@whedon generate pdf

tliu68 commented 3 years ago

@whedon check references

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.32614/RJ-2016-021 is OK
- 10.1016/j.patrec.2009.09.011 is OK
- 10.1038/nature23455 is OK
- 10.1214/AOS/1176344136 is OK

MISSING DOIs

- 10.2307/2532201 may be a valid DOI for title: Model-Based Gaussian and Non-Gaussian Clustering
- 10.1080/01621459.1998.10474110 may be a valid DOI for title: Detecting Features in Spatial Point Processes with Clutter via Model-Based Clustering
- 10.21236/ada454825 may be a valid DOI for title: Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering
- 10.1016/s0167-9473(02)00163-9 may be a valid DOI for title: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models
- 10.1093/comjnl/9.4.373 may be a valid DOI for title: A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems
- 10.1016/s0306-4379(01)00008-4 may be a valid DOI for title: CURE: An Efficient Clustering Algorithm for Large Databases
- 10.1093/oso/9780198505044.003.0017 may be a valid DOI for title: Finite mixture models
- 10.1515/revneuro-2018-0050 may be a valid DOI for title: Segmentation and clustering in brain MRI imaging
- 10.1177/109634800002400105 may be a valid DOI for title: An Explanation and Illustration of Cluster Analysis for Identifying Hospitality Market Segments
- 10.1080/01431160512331316432 may be a valid DOI for title: Satellite image classification using genetically guided fuzzy clustering with spatial information
- 10.1109/tpami.2017.2679100 may be a valid DOI for title: Clustering Millions of Faces by Identity
- 10.1016/b978-0-12-442450-0.50009-2 may be a valid DOI for title: Clustering and Structural Balance in Graphs
- 10.1016/s0047-259x(03)00096-4 may be a valid DOI for title: A well-conditioned estimator for large-dimensional covariance matrices
- 10.1007/978-1-4612-1694-0_16 may be a valid DOI for title: A new look at the statistical model identification
- 10.1093/comjnl/26.4.354 may be a valid DOI for title: A survey of recent advances in hierarchical clustering algorithms
- 10.1109/sffcs.1999.814639 may be a valid DOI for title: Learning mixtures of Gaussians
- 10.1016/j.jcss.2003.11.008 may be a valid DOI for title: A spectral algorithm for learning mixture models
- 10.1007/11503415_31 may be a valid DOI for title: On spectral learning of mixtures of distributions

INVALID DOIs

- 10.2307/3085676 is INVALID
tliu68 commented 3 years ago

@whedon generate pdf

tliu68 commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.32614/RJ-2016-021 is OK
- 10.1016/j.patrec.2009.09.011 is OK
- 10.1038/nature23455 is OK
- 10.1214/AOS/1176344136 is OK

MISSING DOIs

- None

INVALID DOIs

- 10.2307/3085676 is INVALID
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

tliu68 commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1198/016214502760047131 is OK
- 10.32614/RJ-2016-021 is OK
- 10.1016/j.patrec.2009.09.011 is OK
- 10.1038/nature23455 is OK
- 10.1214/AOS/1176344136 is OK

MISSING DOIs

- None

INVALID DOIs

- None
tliu68 commented 3 years ago

@kthyng Hello! Our main code is in

graspologic/graspologic/cluster/autogmm.py
graspologic/tests/cluster/test_autogmm.py
graspologic/graspologic/cluster/divisive_cluster.py
graspologic/tests/cluster/test_divisive_cluster.py

I ran cloc for those files and got

files,language,blank,comment,code,"github.com/AlDanial/cloc v 1.88 T=0.05 s (82.5 files/s, 37701.6 lines/s)"
4,Python,338,462,1028
4,SUM,338,462,1028
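
The CSV summary above can also be read programmatically, which may help when checking a subset of a repository against a size threshold. The following is a minimal sketch using only Python's standard `csv` module. The function name `summarize_cloc_csv` is hypothetical, and the three-row layout (one header row, then one row per language plus a `SUM` row) is an assumption based on the output quoted in this comment.

```python
import csv
import io

# The cloc CSV output quoted above, split into one record per line (assumed layout).
CLOC_CSV = '''files,language,blank,comment,code,"github.com/AlDanial/cloc v 1.88 T=0.05 s (82.5 files/s, 37701.6 lines/s)"
4,Python,338,462,1028
4,SUM,338,462,1028
'''


def summarize_cloc_csv(text):
    """Return {language: {"files", "blank", "comment", "code"}} from cloc CSV text.

    Hypothetical helper: the sixth header column (the cloc version banner)
    has no value in the data rows and is ignored.
    """
    summary = {}
    for row in csv.DictReader(io.StringIO(text)):
        summary[row["language"]] = {
            key: int(row[key]) for key in ("files", "blank", "comment", "code")
        }
    return summary


if __name__ == "__main__":
    totals = summarize_cloc_csv(CLOC_CSV)["SUM"]
    print("code lines:", totals["code"])
    print("all lines:", totals["blank"] + totals["comment"] + totals["code"])
```

Reading the `SUM` row gives the code-line figure for the four files listed above, separate from the blank and comment counts.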

kthyng commented 3 years ago

@tliu68 Thank you, that is really helpful. Now that it is clear exactly what is under consideration, I am going to send this for a scope check with the editorial board, since about 1000 lines of code is borderline for size. We'll get back to you within a week or two about the outcome. Thanks!

kthyng commented 3 years ago

@whedon query scope

whedon commented 3 years ago

Submission flagged for editorial review.

danielskatz commented 3 years ago

👋 @tliu68 - I'm sorry to say that after discussion amongst the JOSS editors, we have decided that this submission does not meet the substantial scholarly effort criterion for review by JOSS. Please see https://joss.readthedocs.io/en/latest/submitting.html#other-venues-for-reviewing-and-publishing-software-packages for other suggestions for how you might receive credit for your work.

danielskatz commented 3 years ago

@whedon reject

whedon commented 3 years ago

Paper rejected.