Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
```
github.com/AlDanial/cloc v 1.88  T=0.22 s (149.4 files/s, 73588.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                              3              0             63          10499
Python                           4            351            441           1277
Markdown                         7            225              0            702
TeX                              1             13              0            200
Jupyter Notebook                 5              0           1930            179
YAML                             8             21              0            173
TOML                             1              7              0             49
CSS                              1             15              0             48
DOS Batch                        1              8              1             26
make                             1              4              7              9
reStructuredText                 1              4              4              2
-------------------------------------------------------------------------------
SUM:                            33            648           2446          13164
-------------------------------------------------------------------------------
```
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1361
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10/ggj45f is OK
- 10.1016/j.habitatint.2022.102641 is OK
- 10.1038/s41597-022-01640-8 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1158/1538-7445.AM2022-5038 is OK
- 10.1016/j.dib.2022.108335 is OK
- 10/ghh97z is OK
- 10.1007/BF02915278 is OK
- 10.1016/j.compenvurbsys.2022.101802 is OK
- 10.1115/IPC2022-87145 is OK
- 10.1016/j.jag.2022.102911 is OK
- 10.5281/zenodo.3960218 is OK
- 10.1007/s12061-022-09490-y is OK
MISSING DOIs
- 10.3390/info11040193 may be a valid DOI for title: Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
INVALID DOIs
- None
👋🏼 @martinfleis, @csadorf, @gagolews - this is the review thread for the submission. All of our communications will happen here from now on.
Please check the post at the top of the issue for instructions on how to generate your own review checklist. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I'll complete the review in 2-3 weeks.
[x] Repository: Is the source code for this software available at https://github.com/martinfleis/clustergram?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@martinfleis) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[ ] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
I have some doubts about this, see below...
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data or research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.
Please refer to my remarks below.
An interesting contribution, but with a "but".
Clustergrams are rather simple – merely a dozen or so lines of plotting code https://github.com/martinfleis/clustergram/blob/main/clustergram/clustergram.py – therefore the scope of the package is quite limited... I am unsure if it is not a "missed opportunity" for something more substantial.
I wonder if the Python community would benefit more from the current author's contribution to the seaborn package, for example?
Or adding more diagnostic tools for clustering so that its scope is not limited to a single visualisation type?
It seems that there is an implementation of clustergrams in the EcotoneFinder package for R (see https://cran.r-project.org/web/packages/EcotoneFinder/index.html) – in the form of a single function within a much larger package – which probably should be mentioned as well.
According to this journal's scope and submission requirements, I read that "Minor 'utility' packages, including 'thin' API clients, and single-function packages are not acceptable." – I am not completely convinced that this is exactly the case here...
Other remarks:
p. 1: "research often tries to..." - a personification.
p. 1: "the algorithm itself will not determine the optimal number of clusters" - internal cluster validity measures aim to do that (discuss briefly), but also see my critique in https://doi.org/10.1016/j.ins.2021.10.004.
p. 1: "silhouette analysis" – critique in the aforementioned paper.
p. 1: "the number of seed locations needs to be defied by a researcher and is usually unknown" – I am afraid it does not work precisely that way.
"The" k-means algorithm is a heuristic to find a (local) minimum of a specific objective function (the within-cluster sum of squares, WCSS). By restarting it from multiple initial solutions, we increase the likelihood of pinpointing the true (global) minimum (although no guarantee exists). This is why we usually set the number of restarts to 10–100 and choose the one that corresponds to the smallest WCSS.
I recommend that the "seed selection" problem not be discussed here.
p. 1: "In figures Figure 1 and Figure 2" (same on p. 2)
p. 2: Figure 1 - fractional number of clusters on the x-axis label looks odd.
p. 2: "mean of means" – componentwise arithmetic mean or centroids
p. 2: "that does not necessarily provide the best overview of the behavior" – ??
p. 2: "there is another option weighted the means" – ??
p. 3: Silhouette score, C-H index, DB-index – how about other metrics?
p. 4: "Since the first release of clustergram the package was used" – is such a self-promotion necessary in a research paper?
Thank you for the comments @gagolews!
I am fully aware that the size of the submission is borderline and I was assuming it would undergo a scope query prior to the actual review. I am copying below the note I made in the submission regarding this.
It is not exceeding the scope limits by a lot, but based on my experience, this should pass the possible query-scope process if you decide it needs to go through it. It is being used in papers as-is, even by people I've never heard of before, in fields far from mine, so I believe that also confirms the scientific benefit (all cited in the paper).
So I am happy if @csoneson requests a scope check from the editorial team here.
As said above, I believe this is borderline on the side of large enough, but I may be wrong.
Clustergrams are rather simple – merely a dozen or so lines of plotting code https://github.com/martinfleis/clustergram/blob/main/clustergram/clustergram.py – therefore the scope of the package is quite limited
I believe that the whole package should be considered when assessing the scope, not only the plotting part. It is not that straightforward to create the data that needs to be plotted from various clustering engines. I think that this processing part is as valuable as the plotting segments (either static or interactive).
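To give a flavour of what that processing involves, here is a rough sketch (not the package's actual code) of deriving the values to plot for a single backend:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))

# For each k, fit a clustering and reduce every cluster to a single value
# (here the componentwise mean of its centre, i.e. the "mean of means");
# the clustergram then links these values between consecutive k, weighted
# by how many observations move between the clusters.
values = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    values[k] = km.cluster_centers_.mean(axis=1)
```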
I wonder if the Python community would benefit more from the current author's contribution to the seaborn package, for example?
You'll probably agree that seaborn is not the ideal candidate. I'd be happy to contribute the code elsewhere but I haven't found a good home for it. Hence it lives alone. If someone has good suggestions, I am all ears. It would certainly be better than maintaining this as a small package on its own.
I am unsure if it is not a "missed opportunity" for something more substantial. ... Or adding more diagnostic tools for clustering so that its scope is not limited to a single visualisation type?
As much as I'd love to, the time I can spend on this is fairly limited, so if the current size of the package does not fulfil the scope limits, it will just not be published.
It seems that there is an implementation of clustergrams in the EcotoneFinder
Thanks, I wasn't aware of it. It is a bit tough to assess, though; I haven't found any documentation or repository apart from what is on CRAN.
is such a self-promotion necessary in a research paper?
One of the points in the evaluation of substantive scholarly effort is "Whether the software has already been cited in academic papers." This shows evidence for that point.
@martinfleis I'm not going to be too stubborn, the package itself is quite nice. Let's see what others say about it...
I am sorry, but I was a bit swamped the past two weeks; I plan on doing the review in the middle of next week.
👋🏻 @csadorf - just wanted to check whether you had time to take a first look at this submission. Thanks!
Ping @csadorf
👋🏻 @csadorf - just wanted to check whether you had time to take a first look at this submission. Thanks!
@csoneson Very sorry about the delay, I'll have a look at it this week-end.
@csadorf - no worries, thank you!
👋🏻 @csadorf - could you provide an update on the progress of your review? Thanks!
The paper is very well written and illustrated, and the references are almost complete. The code is professionally developed, very well documented, tested, and distributed. I think the paper reaches the level of scholarly effort required for a publication in JOSS, however – as acknowledged by the author themself – just barely, because the majority of the code consists of thin wrappers around packages that readily provide clustering algorithms (notably sklearn, scipy, and RAPIDS cuML), whereas the core function of the package is to generate plots (the clustergrams), which is achieved in a handful of functions and very few lines of code. It is the substantial amount of testing and documentation that really provides the net value of this package.
The author claims to embrace scikit-learn's API style; however, that is unfortunately not entirely achieved in my view. While the main class does provide an estimator-like interface, accepting hyperparameters as constructor arguments and offering a fit() function, other parts of the scikit-learn API guide are ignored, such as avoiding the modification of hyperparameters within the constructor, storing fitted parameters with an underscore suffix, and allowing for easy composability. Other plotting classes within the scikit-learn package typically implement "from_estimator" and "from_predictions" class methods (e.g. ConfusionMatrixDisplay, PredictionErrorDisplay, etc.). In this way it is easy to use the class with any class that provides clustering data. The current implementation of the class provides from_data() and from_centers() methods, which achieve similar, but not directly compatible, behavior.
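To illustrate the pattern I have in mind, here is a hypothetical sketch (class and attribute names are illustrative only, not the package's actual code):

```python
import numpy as np


class ClustergramDisplay:
    """Hypothetical display class following the scikit-learn API guide."""

    def __init__(self, k_range=(2, 3, 4, 5)):
        # Hyperparameters are stored exactly as passed, never modified here.
        self.k_range = k_range

    def fit(self, X):
        # Fitted state carries a trailing underscore and is set only in fit().
        self.cluster_centers_ = {k: np.zeros((k, X.shape[1])) for k in self.k_range}
        return self

    @classmethod
    def from_estimator(cls, estimator, X, **kwargs):
        # Mirrors e.g. ConfusionMatrixDisplay.from_estimator: fit the given
        # estimator (schematically elided here) and build the display from it.
        display = cls(**kwargs)
        return display.fit(X)
```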
Since the author is explicitly referencing scikit-learn's API style, I would suggest citing Buitinck et al. (arXiv:1309.0238 [cs.LG]), as recommended by scikit-learn's citation guide.
A minor point, but since plotting is the core function of this package I think it warrants mentioning: I would recommend ensuring that the ticks on the x-axis of all clustergram diagrams are natural numbers, since there are no fractional numbers of clusters.
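For the matplotlib backend, one possible way to enforce this (a suggestion only, not necessarily how the package should implement it):

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

fig, ax = plt.subplots()
# Restrict the x-axis (number of clusters) to integer tick locations only.
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
```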
There are no instructions on how to contribute to the package.
I would recommend that the author adopt some of the recommendations on making the class more modular and independent, consistently follow the scikit-learn API guidelines, and then consider pursuing a contribution to the scikit-learn package.
I recommend the paper for acceptance with minor revisions, because despite the scholarly effort being marginal, I believe it provides benefit to the community and would simplify referencing it in future publications.
The author claims to embrace scikit-learn's API style; however, that is unfortunately not entirely achieved in my view.
Thanks for the comment! I'll take a deeper dive to see if the API can follow the style more closely. I suppose the issue will be in the difference between sklearn's estimators producing clustering labels for a fixed k, while clustergram requires a range of results, but let me see what can be done here.
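For context, the difference in shape looks roughly like this (the Clustergram signature below is sketched from the package documentation; treat it as indicative):

```python
import numpy as np
from sklearn.cluster import KMeans
from clustergram import Clustergram

X = np.random.default_rng(0).normal(size=(200, 2))

# A scikit-learn estimator is fitted for one fixed k...
KMeans(n_clusters=5, n_init=10).fit(X)

# ...whereas a clustergram needs fitted solutions across a whole range of k.
cgram = Clustergram(k_range=range(1, 9), method="kmeans")
cgram.fit(X)
cgram.plot()
```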
is explicitly referencing scikit-learn's API style, I would suggest citing Buitinck et al.
Will do, thanks!
I would recommend ensuring that the ticks on the x-axis of all clustergram diagrams are natural numbers
Good point. Will fix.
There are no instructions on how to contribute to the package.
A contributing guide is available in the documentation: https://clustergram.readthedocs.io/en/stable/contributing.html I can eventually copy its content to CONTRIBUTING.md and store it at the root of the repo, but I didn't think the duplication was necessary here.
Thank you for your comments!
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I have finished a revision of the package and the paper based on the comments by @csadorf and @gagolews above. Below are responses to points that have not been discussed in the individual comments above.
From @gagolews's review (see also https://github.com/openjournals/joss-reviews/issues/5240#issuecomment-1490353259):
It seems that there is an implementation of clustergrams in the EcotoneFinder package for R (see https://cran.r-project.org/web/packages/EcotoneFinder/index.html), which probably should be mentioned as well.
EcotoneFinder is now mentioned in the paper, I wasn't previously aware of the implementation.
Other remarks: [...]
All the comments related to the paper have been resolved in https://github.com/martinfleis/clustergram/commit/52676ab7966f0d3a183e48b1a93af79d6ea2e55b
From @csadorf's review:
The author claims to embrace scikit-learn's API style; however, that is unfortunately not entirely achieved in my view. While the main class does provide an estimator-like interface, accepting hyperparameters as constructor arguments and offering a fit() function, other parts of the scikit-learn API guide are ignored, such as avoiding the modification of hyperparameters within the constructor, storing fitted parameters with an underscore suffix, and allowing for easy composability. Other plotting classes within the scikit-learn package typically implement "from_estimator" and "from_predictions" class methods (e.g. ConfusionMatrixDisplay, PredictionErrorDisplay, etc.). In this way it is easy to use the class with any class that provides clustering data. The current implementation of the class provides from_data() and from_centers() methods, which achieve similar, but not directly compatible, behavior.
The API has been adjusted in https://github.com/martinfleis/clustergram/pull/59. See the PR for the details.
Since the author is explicitly referencing scikit-learn's API style, I would suggest citing Buitinck et al. (arXiv:1309.0238 [cs.LG]), as recommended by scikit-learn's citation guide.
Included now.
A minor point, but since plotting is the core function of this package I think it warrants mentioning: I would recommend ensuring that the ticks on the x-axis of all clustergram diagrams are natural numbers, since there are no fractional numbers of clusters.
Both plotting methods (matplotlib, bokeh) are now limited to integer ticks only.
There are no instructions on how to contribute to the package.
As mentioned in https://github.com/openjournals/joss-reviews/issues/5240#issuecomment-1587065252, the contributing guide was part of the documentation. It is now also available as CONTRIBUTING.md in the root of the repo.
@csoneson I believe that all comments are now either resolved or discussed in my earlier replies.
I deem the revised version acceptable. Well done.
Thank you @gagolews!
I agree, the revisions look good to me. 👍🏻
Thank you @csadorf and @gagolews!
@csoneson I have released the updated version as v0.8.0. Its Zenodo DOI is 10.5281/zenodo.8202396.
Thank you @csadorf - could you please also tick the remaining boxes in the checklist above, for completeness?
@martinfleis - I will also have a quick look through the submission, and get back to you with the next steps.
@editorialbot generate pdf
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10/ggj45f is OK
- 10.1016/j.habitatint.2022.102641 is OK
- 10.1038/s41597-022-01640-8 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1158/1538-7445.AM2022-5038 is OK
- 10.1016/j.dib.2022.108335 is OK
- 10/ghh97z is OK
- 10.1007/BF02915278 is OK
- 10.1016/j.compenvurbsys.2022.101802 is OK
- 10.1115/IPC2022-87145 is OK
- 10.1016/j.jag.2022.102911 is OK
- 10.5281/zenodo.3960218 is OK
- 10.1007/s12061-022-09490-y is OK
MISSING DOIs
- 10.3390/info11040193 may be a valid DOI for title: Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
INVALID DOIs
- https://doi.org/10.1016/j.ins.2021.10.004 is INVALID because of 'https://doi.org/' prefix
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10/ggj45f is OK
- 10.1016/j.habitatint.2022.102641 is OK
- 10.1038/s41597-022-01640-8 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1158/1538-7445.AM2022-5038 is OK
- 10.1016/j.dib.2022.108335 is OK
- 10/ghh97z is OK
- 10.1007/BF02915278 is OK
- 10.1016/j.compenvurbsys.2022.101802 is OK
- 10.1115/IPC2022-87145 is OK
- 10.1016/j.jag.2022.102911 is OK
- 10.3390/info11040193 is OK
- 10.5281/zenodo.3960218 is OK
- 10.1007/s12061-022-09490-y is OK
- 10.1016/j.ins.2021.10.004 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋🏻 @csadorf - as I noticed you reacted with 👍🏻 to the message below; could you also tick the remaining boxes in https://github.com/openjournals/joss-reviews/issues/5240#issuecomment-1559574286, to make sure everything is documented? Of course, let us know if you need more time! Thanks 🙂
Thank you @csadorf - could you please also tick the remaining boxes in the checklist above, for completeness?
Ping @csadorf
👋🏻 @csadorf - as I noticed you reacted with 👍🏻 to the message below; could you also tick the remaining boxes in https://github.com/openjournals/joss-reviews/issues/5240#issuecomment-1559574286, to make sure everything is documented? Of course, let us know if you need more time! Thanks 🙂
@csoneson I've checked all the boxes now. All good from my side.
Thanks @csadorf!
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10/ggj45f is OK
- 10.1016/j.habitatint.2022.102641 is OK
- 10.1038/s41597-022-01640-8 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1158/1538-7445.AM2022-5038 is OK
- 10.1016/j.dib.2022.108335 is OK
- 10/ghh97z is OK
- 10.1007/BF02915278 is OK
- 10.1016/j.compenvurbsys.2022.101802 is OK
- 10.1115/IPC2022-87145 is OK
- 10.1016/j.jag.2022.102911 is OK
- 10.3390/info11040193 is OK
- 10.5281/zenodo.3960218 is OK
- 10.1007/s12061-022-09490-y is OK
- 10.1016/j.ins.2021.10.004 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot set <DOI here> as archive
@editorialbot set <version here> as version
@editorialbot generate pdf
@editorialbot check references and ask author(s) to update as needed
@editorialbot recommend-accept
@editorialbot set v0.8.0 as version
Done! version is now v0.8.0
@martinfleis I have gone through the paper and it looks good - one reference (Pedregosa et al.) is missing a DOI but I don't think it actually has one. Before I recommend acceptance, could you please update the title and author list of the Zenodo archive to match those of the paper, and also specify the license in the Zenodo archive?
@csoneson Thanks! I have updated Zenodo as requested. The DOI of the latest fixed version is 10.5281/zenodo.8202396.
one reference (Pedregosa et al) is missing a DOI but I don't think it actually has one
That was my understanding as well.
All tasks from the author checklist above (checks etc) have also been done.
@editorialbot set 10.5281/zenodo.8202396 as archive
Submitting author: @martinfleis (Martin Fleischmann)
Repository: https://github.com/martinfleis/clustergram
Branch with paper.md (empty if default branch):
Version: v0.8.0
Editor: @csoneson
Reviewers: @csadorf, @gagolews
Archive: 10.5281/zenodo.8202396
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@csadorf & @gagolews, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @csadorf
📝 Checklist for @gagolews