openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: Fast k-medoids Clustering in Rust and Python #4183

Closed whedon closed 2 years ago

whedon commented 2 years ago

Submitting author: !--author-handle-->@kno10<!--end-author-handle-- (Erich Schubert) Repository: https://github.com/kno10/python-kmedoids.git Branch with paper.md (empty if default branch): JOSS-draft Version: v0.3.3 Editor: !--editor-->@mikldk<!--end-editor-- Reviewers: @timClicks, @TahiriNadia Archive: 10.5281/zenodo.6802320

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/4d6eb4cebdad66a96ebf153f986376f3"><img src="https://joss.theoj.org/papers/4d6eb4cebdad66a96ebf153f986376f3/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/4d6eb4cebdad66a96ebf153f986376f3/status.svg)](https://joss.theoj.org/papers/4d6eb4cebdad66a96ebf153f986376f3)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@timClicks & @TahiriNadia, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mikldk know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @timClicks

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @TahiriNadia

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 2 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @timClicks, @TahiriNadia it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 2 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.06 s (186.4 files/s, 18247.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           2             85            206            211
Rust                             1              9             77            174
Markdown                         2             29              0             64
reStructuredText                 1             48             34             57
TOML                             2              4              0             40
YAML                             2              4              1             29
make                             1              1              0              4
-------------------------------------------------------------------------------
SUM:                            11            180            318            579
-------------------------------------------------------------------------------

Statistical information for the repository 'c55cad42f8224e43fd62b6df' was
gathered on 2022/02/19.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Erich Schubert                   8           596             94          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Erich Schubert              502           84.2          1.4                8.57
whedon commented 2 years ago

PDF failed to compile for issue #4183 with the following error:

 Can't find any papers to compile :-(
mikldk commented 2 years ago

@whedon generate pdf from branch JOSS-draft

whedon commented 2 years ago
Attempting PDF compilation from custom branch JOSS-draft. Reticulating splines etc...
whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

mikldk commented 2 years ago

@whedon check references from branch JOSS-draft

whedon commented 2 years ago
Attempting to check references... from custom branch JOSS-draft
whedon commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/9780470316801.ch2 is OK
- 10.1137/0137041 is OK
- 10.1007/978-3-030-32047-8_16 is OK
- 10.1016/j.is.2021.101804 is OK
- 10.21105/joss.01230 is OK
- 10.1287/opre.16.5.955 is OK
- 10.1111/j.1538-4632.1979.tb00674.x is OK
- 10.1007/s10852-005-9022-1 is OK
- 10.1147/sj.22.0129 is OK
- 10.1016/j.eswa.2008.01.039 is OK
- 10.1093/bioinformatics/btp163 is OK

MISSING DOIs

- None

INVALID DOIs

- None
mikldk commented 2 years ago

@timClicks, @TahiriNadia: Thanks for agreeing to review.

Please note that both the repos should be reviewed (https://github.com/openjournals/joss-reviews/issues/4096#issuecomment-1020333611):

Please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If possible create issues (and cross-reference) in the submission's repository to avoid too specific discussions in this review thread.

If you have any questions or concerns please let me know.

@kno10: As you see both repos will be reviewed. This also means that when the repos are ready, you will have to prepare a combined archive (e.g. zenodo.org) of both repositories.

mikldk commented 2 years ago

@timClicks, @TahiriNadia, can you please give a brief status of your review? This is not to rush you, merely to give me an impression of the progress and time-frame.

mikldk commented 2 years ago

@timClicks, @TahiriNadia, can you please give a brief status of your review? This is not to rush you, merely to give me an impression of the progress and time-frame.

mikldk commented 2 years ago

@timClicks, @TahiriNadia, can you please give a brief status of your review? This is not to rush you, merely to give me an impression of the progress and time-frame.

mikldk commented 2 years ago

@timClicks, @TahiriNadia, can you please give a brief status of your review? This is not to rush you, merely to give me an impression of the progress and time-frame.

timClicks commented 2 years ago

I will try to have my review completed this week. Sorry for the lack of communication!

TahiriNadia commented 2 years ago

I will definitely give a brief overview of my review. Sorry too, this is my first time writing a review on JOSS and I don't know how it works.

mikldk commented 2 years ago

I will definitely give a brief overview of my review. Sorry too, this is my first time writing a review on JOSS and I don't know how it works.

Thanks. No worries. Let me know if there is anything I can help you with.

TahiriNadia commented 2 years ago

@whedon generate pdf

editorialbot commented 2 years ago

My name is now @editorialbot

TahiriNadia commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf. Paper file not found.

TahiriNadia commented 2 years ago

@editorialbot check references from branch JOSS-draft

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/9780470316801.ch2 is OK
- 10.1137/0137041 is OK
- 10.1007/978-3-030-32047-8_16 is OK
- 10.1016/j.is.2021.101804 is OK
- 10.21105/joss.01230 is OK
- 10.1287/opre.16.5.955 is OK
- 10.1111/j.1538-4632.1979.tb00674.x is OK
- 10.1007/s10852-005-9022-1 is OK
- 10.1147/sj.22.0129 is OK
- 10.1016/j.eswa.2008.01.039 is OK
- 10.1093/bioinformatics/btp163 is OK

MISSING DOIs

- None

INVALID DOIs

- None
mikldk commented 2 years ago

@editorialbot set JOSS-draft as branch

editorialbot commented 2 years ago

Done! branch is now JOSS-draft

mikldk commented 2 years ago

@editorialbot check references

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/9780470316801.ch2 is OK
- 10.1137/0137041 is OK
- 10.1007/978-3-030-32047-8_16 is OK
- 10.1016/j.is.2021.101804 is OK
- 10.21105/joss.01230 is OK
- 10.1287/opre.16.5.955 is OK
- 10.1111/j.1538-4632.1979.tb00674.x is OK
- 10.1007/s10852-005-9022-1 is OK
- 10.1147/sj.22.0129 is OK
- 10.1016/j.eswa.2008.01.039 is OK
- 10.1093/bioinformatics/btp163 is OK

MISSING DOIs

- Errored finding suggestions for Clustering by means of medoids, please try later
- Errored finding suggestions for BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits, please try later

INVALID DOIs

- None
mikldk commented 2 years ago

@timClicks, @TahiriNadia: Can you please give a brief status of your review? This is not to rush you, merely to give me an impression of the progress and time-frame.

kno10 commented 2 years ago

Hi, no updates in a month - any progress? @timClicks @TahiriNadia @mikldk

timClicks commented 2 years ago

Sorry about the lack of contact from me! I will finish the review within 24h

timClicks commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

timClicks commented 2 years ago

Re performance claims - the paper only claims to be faster than the upstream Java implementation. I don't think that the paper should be refused because it hasn't been benchmarked against all k-medoids/PAM implementations. Accordingly, I have marked the item as satisfied in my review.

TahiriNadia commented 2 years ago

I put my comment on this document via PR. My major comment is about the package installation, it gives me a lot of errors. I asked the authors to correct this point by managing a requirements file. However, the idea of the paper is original and very relevant since it compares the best-known classification algorithms (i.e. kmedoids) in different programming languages (i.e. java, python, Rust, C++) and in classical python libraries (i.e. biopython, sklearn_extra, pyclustering). This paper will be of great interest to the scientific community, as the task of classification is ubiquitous in various spheres of society (imaging, marketing, linguistics and others). Speed is often a critical element, especially when performing simulations of large-scale projects. Along with better documentation of how to install, a user guide and finally a requirements file, I highly recommend this article.

timClicks commented 2 years ago

Re "a statement of need" The authors describe high performance + ease of use as their motivations. This satisfies the requirements within the Documentation section of the review, but unfortunately not the Software Paper section. It requires a paragraph entitled "Statement of Need". I have submitted an issue in the Python repository.

Re Community Guidelines I could not find contributor documentation when I looked. I have submitted an issue in the Python repository along with some example wording.

timClicks commented 2 years ago

My major comment is about the package installation, it gives me a lot of errors.

Installation worked flawlessly for me today.

TahiriNadia commented 2 years ago

My major comment is about the package installation, it gives me a lot of errors.

Installation worked flawlessly for me today.

I'm trying it now and I confirm that it's OK for me too.

TahiriNadia commented 2 years ago

I put my comment on this document via PR. My major comment is about the package installation, it gives me a lot of errors. I asked the authors to correct this point by managing a requirements file. However, the idea of the paper is original and very relevant since it compares the best-known classification algorithms (i.e. kmedoids) in different programming languages (i.e. java, python, Rust, C++) and in classical python libraries (i.e. biopython, sklearn_extra, by clustering). This paper will be of great interest to the scientific community, as the task of classification is ubiquitous in various spheres of society (imaging, marketing, linguistics and others). Speed is often a critical element, especially when performing simulations of large-scale projects. Along with better documentation of how to install, a user guide, and finally a requirements file, I highly recommend this article. => FIX by @kno10

mikldk commented 2 years ago

@TahiriNadia: Can you confirm that you have finished the review and recommend that this paper is now published?

@timClicks: Thanks for specifying the remaining items for @kno10

larslenssen commented 2 years ago

Hi @timClicks, we benchmarked the performance against different implementations of kmedoids in different languages:

Screenshot 2022-06-02 134622
TahiriNadia commented 2 years ago

@TahiriNadia: Can you confirm that you have finished the review and recommend that this paper is now published?

@mikldk: Yes, I confirm that everything is finished on my side and I recommend that this paper be published.

larslenssen commented 2 years ago

@mikldk and @timClicks: We have finished the remaining items

kno10 commented 2 years ago

to see the added statement of need: @editorialbot generate pdf

mikldk commented 2 years ago

@kno10 @lale1009: Thanks. I'll wait on @timClicks have the time to check the items.

timClicks commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

timClicks commented 2 years ago

@mikldk I recommend this paper for publication.

@kno10 @lale1009 Please accept my apologies for the repeated delays.

mikldk commented 2 years ago

@timClicks, @TahiriNadia: Thanks for your reviews.

@kno10:

kno10 commented 2 years ago

@mikldk: I have archived a snapshot of the Rust and Python parts version 0.3.3 at Zenodo as: Schubert, Erich, & Lenssen, Lars. (2022). Fast k-medoids Clustering in Rust and Python (0.3.3). Zenodo. https://doi.org/10.5281/zenodo.6802320 The version number is 0.3.3 (some small bug fixes were integrated, and the package is now automatically built for more architectures to make it easier to install). I do not have further changes to the paper itself.

mikldk commented 2 years ago

@editorialbot set v0.3.3 as version