Closed editorialbot closed 1 year ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Checking the BibTeX entries failed with the following error:
Failed to parse BibTeX entry: cite-key missing
Software report:
github.com/AlDanial/cloc v 1.88 T=0.13 s (231.1 files/s, 59013.8 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Python 14 1060 1799 3390
CSS 1 83 67 370
Markdown 3 105 0 338
TeX 1 37 0 332
YAML 4 21 21 118
reStructuredText 3 26 36 34
DOS Batch 1 8 1 26
JSON 1 1 0 16
make 1 4 7 9
TOML 1 0 0 6
INI 1 0 0 2
-------------------------------------------------------------------------------
SUM: 31 1345 1931 4641
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md
is 1727
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Dear @AP6YC and @jjerphan
This is the review thread. Firstly, type
@editorialbot generate my checklist
to generate your own checklist. In that checklist, there are 23 check items. Whenever you complete the corresponding task, you can check off them.
Please write your comments as separate posts and do not modify your checklist descriptions.
The review process is interactive so you can always interact with the authors, reviewers, and the editor. You can also create issues and pull requests in the target repo. Please do mention this thread's URL in the issues so we can keep tracking what is going on out of our world.
Please do not hesitate to ask me about anything, anytime.
Thank you in advance!
@jjerphan - Thank you for creating your checklist.
@AP6YC - Could you please create your checklist as I described above?
@editorialbot check references
Checking the BibTeX entries failed with the following error:
Failed to parse BibTeX entry: cite-key missing
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I started reviewing the HiPart in the last few days.
I am drafting https://github.com/panagiotisanagnostou/HiPart/pull/19 to provide explicit suggestions for the repository.
Dear reviewers @AP6YC and @jjerphan
Could you please update us on how is your review going?
Thank you in advance
Hi @jbytecode, I did a first review and need to find some time to get in the details for the last items which are not checked yet (e.g. reproducing results) and provide comprehensive feedback to authors. To me, I am on track.
@jjerphan - I am appreciated of your great effort and spending your valuable time for JOSS. Thank you for your quick response.
You are welcome. Ideally I will finish my review this weekend.
@jjerphan Sorry for the delay; I am prioritizing my review and suggestions to the authors to be complete this week.
I am one of the maintainers of scikit-learn which is providing reference implementation of clustering algorithms. I do not have any conflict of interest in the favor of the acceptance of the rejection of this paper: scikit-learn contributors and maintainers welcome qualitative third party packages extending scikit-learn in the ecosystem.
As of https://github.com/panagiotisanagnostou/HiPart/tree/f6c0d3785741f752a5a2cecafcef1d1eb03347bb, I have not checked the following 7 items of the 21 items mainly because I have questions, remarks or suggestions to improve of them:
Bellow are my questions, remarks or suggestions for each of them.
- Contribution and authorship: Has the submitting author (@panagiotisanagnostou) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
It is clear that @panagiotisanagnostou is the main major contributor to HiPart. @stevestavropoulos also has contributed notable to HiPart.
To validate this item I would like to know the rationale for other authors' presence given that authors are indicated as having contributed equally.
- Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
I think HiPart is rather small as a package with perfectible aspects. To me, HiPart currently do not meet all the criteria for being eligible to be published in JOSS. Yet, I think it is on track and that this submission is appreciable and legitimate.
Perfectible aspects of HiPart are:
bicecting_kmeans
must be renamed BisectingKmeans
Functionality: Have the functional claims of the software been confirmed? Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
- Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
I unfortunately can't reproduce results (benchmarks and figures) using script in paper/paper_scripts
. Some import paths are invalid and data is missing.
To validate this item, please do check that those scripts work (I think using relative paths might help prevent issues).
Moreover, from my reading of the implementations (which are mainly backed by NumPy), I hardly think implementations can be scalable or efficient for the targeted problems.
I can't say if HiPart is completely functional, is performant, and if claims are backed by reproducible results.
- Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Guidelines for 2) and 3) are clear to me but guidelines for 1) are missing. Adding a simple contributing guide as CONTRIBUTING.md
would suffice to me.
- Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
See https://github.com/panagiotisanagnostou/HiPart/pull/23 for remarks for this items.
@jjerphan Sorry for the delay; I am prioritizing my review and suggestions to the authors to be complete this week.
@AP6YC: I am not sure if you instead intended to mention @jbytecode. To me, it's fine: you still have at least two weeks left. :slightly_smiling_face:
@panagiotisanagnostou - could you please update your status and inform us on how is your edits going?
@jbytecode - Tomorrow I will update you on my edits, they are almost ready. We tried to finilize each one of them before responding, apologies for the delay!
In order to address the comments made by @jjerphan, I will respond by quoting his questions and providing my answers in the section below.
- Contribution and authorship: Has the submitting author (@panagiotisanagnostou) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
It is clear that @panagiotisanagnostou is the main major contributor to HiPart. @stevestavropoulos also has contributed notable to HiPart.
To validate this item I would like to know the rationale for other authors' presence given that authors are indicated as having contributed equally.
The list of authors came after careful consideration of their contribution to the quality of the work done. Apparently the submitting author (@panagiotisanagnostou) is maintaining the GitHub repository but there are a lot of work done prior to uploading the source code. Prof. Sotiris Tasoulis was the original developer, prototyping the divisive clustering concept, so his insights into developing the basic design of the package were invaluable. Dr. Dimitris Tasoulis provided invaluable insight in the debugging process as a developer with industry experience, while also contributing in the preparation of the submitted manuscript. Finally, Prof. Vassilis Plagianakos contributed mainly to the visualization tools of the HiPart package with solutions that allow great expandability. We also highlight that all authors have published articles regarding divisive clustering, and thus contributed through their deep understanding and expertise of the domain.
- Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
I think HiPart is rather small as a package with perfectible aspects. To me, HiPart currently do not meet all the criteria for being eligible to be published in JOSS. Yet, I think it is on track and that this submission is appreciable and legitimate.
Perfectible aspects of HiPart are:
its documentation: for now, there is a small number of undocumented examples. Notebook styles examples for all methods might improve HiPart approachability and adoption.
names containing typos and not PEP8-compliant: as an example, I think
bicecting_kmeans
must be renamedBisectingKmeans
lack comparisons with other libraries in the Scientific Python Ecosystem, in particular proof of functionality for the targeted problems (scalable methods for high dimensional data clustering) with respect to alternatives.
code quality: there's currently a lot of duplication on some files (see the ones graded as F). Due to the size of the library, some more effort can bring it to a A grade on Codacy.
Thank you for legitimizing our submission to the JOSS. Our goal for the following weeks is to add more methods and expand the documentation significantly. In the meantime, according to your suggestion, we have included much more well-presented examples in the documentation.
As far as your concerns on the PEP8 compliance and the quality of code, as you can see in the recent commits, we have addressed most issues.
The algorithm members of the HiPart package are binary space partition tree-based algorithms. Specifically, each split the algorithms conduct can be represented by a node in an unbalanced binary tree. The final clustering of the data in all the five algorithms are the samples contained as data in the leaves of the binary tree.
Moreover, as mentioned in the CONTRIBUTING.md the algorithms have the same three member methods in their implementation. Those method are, one to split the node into two children, one for selecting the nest node to split and one to calculate the data each node will contain. This fact has the unfortunate effect of creating duplication in the code, especially if there is a need to keep the same basic structure in a source code. Nevertheless, we tried to reduce the duplication errors as much as we can.
After extensive research in the Scientific Python Ecosystem, to the best of our research efforts, divisive clustering algorithmic implementations are missing. This fact limits our options for comparisons; Still, we now decided to include further comparisons with more generic clustering toolboxes.
Functionality: Have the functional claims of the software been confirmed? Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
- Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
I unfortunately can't reproduce results (benchmarks and figures) using script in
paper/paper_scripts
. Some import paths are invalid and data is missing.To validate this item, please do check that those scripts work (I think using relative paths might help prevent issues).
Moreover, from my reading of the implementations (which are mainly backed by NumPy), I hardly think implementations can be scalable or efficient for the targeted problems.
I can't say if HiPart is completely functional, is performant, and if claims are backed by reproducible results.
Thank you for highlighting this problem. We have now included all files needed for reproduction with the paper_results_generation.py and the generate_paper_fig.py scripts. As far as the paper_scripts we uploaded the data in the repository in a compressed form.
We have now clarified the scalability of the included implementations for the target problems. We now clearly mention that the package focuses on Large Scale data that fit in memory and that the key aspect which makes it possible is the pass-by-reference nature of the Python programming language. In addition, we have also indicated that HiPart provide algorithmic options that differ significantly in complexity.
- Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Guidelines for 2) and 3) are clear to me but guidelines for 1) are missing. Adding a simple contributing guide as
CONTRIBUTING.md
would suffice to me.
Thank you for the suggestion we created a CONTRIBUTING.md file. We believe that it contains all the necessary information for someone that wants to contribute to the repository to do so. We hope it is up to community standards.
- Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
See panagiotisanagnostou/HiPart#23 for remarks for this items.
I have added a response in the pull request @jjerphan created.
@AP6YC and @jjerphan - It would be nice to hear from you. Thank you in advance.
I will have a look tomorrow.
As of panagiotisanagnostou/HiPart@efb7859, I have now checked:
I have not checked the following 4 items of the 21 items mainly because I still either have remark or need more time to assess it:
Contribution and authorship
Authorships is now clear. I guess part of this paragraph can be included in the project's README.md
.
Substantial scholarly effort
Thank you for legitimizing our submission to the JOSS. Our goal for the following weeks is to add more methods and expand the documentation significantly. In the meantime, according to your suggestion, we have included much more well-presented examples in the documentation.
As far as your concerns on the PEP8 compliance and the quality of code, as you can see in the recent commits, we have addressed most issues.
Thank you, for mentioning this recent work.
The algorithm members of the HiPart package are binary space partition tree-based algorithms. Specifically, each split the algorithms conduct can be represented by a node in an unbalanced binary tree. The final clustering of the data in all the five algorithms are the samples contained as data in the leaves of the binary tree.
Moreover, as mentioned in the CONTRIBUTING.md the algorithms have the same three member methods in their implementation. Those method are, one to split the node into two children, one for selecting the nest node to split and one to calculate the data each node will contain. This fact has the unfortunate effect of creating duplication in the code, especially if there is a need to keep the same basic structure in a source code. Nevertheless, we tried to reduce the duplication errors as much as we can.
Thank you, I need more time to validate those aspects.
After extensive research in the Scientific Python Ecosystem, to the best of our research efforts, divisive clustering algorithmic implementations are missing. This fact limits our options for comparisons; Still, we now decided to include further comparisons with more generic clustering toolboxes.
Thank you, I need more time to validate this.
Functionality, Performance and Reproducibility
I still can't run the scripts:
→ python paper_scripts/generate_paper_fig.py
Traceback (most recent call last):
File "/home/jjerphan/dev/HiPart/paper_scripts/generate_paper_fig.py", line 29, in <module>
import __utilities as util
ModuleNotFoundError: No module named '__utilities'
~/dev/HiPart HiPart ± paper
→ python paper_scripts/generate_paper_fig.py
~/dev/HiPart HiPart ± paper
→ python paper_scripts/paper_results_generation.py
Traceback (most recent call last):
File "/home/jjerphan/dev/HiPart/paper_scripts/paper_results_generation.py", line 26, in <module>
import __utilities as util
ModuleNotFoundError: No module named '__utilities'
You need to fix the imports.
Community guidelines
Thank you for the suggestion we created a CONTRIBUTING.md file. We believe that it contains all the necessary information for someone that wants to contribute to the repository to do so. We hope it is up to community standards.
Thank you, this now qualifies to me.
Quality of writing
I have added a response in the pull request @jjerphan created.
Thank you, this now qualifies to me.
Edit:
@panagiotisanagnostou, I think providing working scripts would allow me to better access the remaining items. Do you think you can provide such scripts?
@jbytecode: are there any strict deadlines?
Functionality, Performance and Reproducibility
I still can't run the scripts:
→ python paper_scripts/generate_paper_fig.py Traceback (most recent call last): File "/home/jjerphan/dev/HiPart/paper_scripts/generate_paper_fig.py", line 29, in <module> import __utilities as util ModuleNotFoundError: No module named '__utilities' ~/dev/HiPart HiPart ± paper → python paper_scripts/generate_paper_fig.py ~/dev/HiPart HiPart ± paper → python paper_scripts/paper_results_generation.py Traceback (most recent call last): File "/home/jjerphan/dev/HiPart/paper_scripts/paper_results_generation.py", line 26, in <module> import __utilities as util ModuleNotFoundError: No module named '__utilities'
You need to fix the imports.
Thanks for bringing this up. It seems that, the problem you encountered with the scripts is that the working directory for the Python interpreter during execution should be the location of the script. We have updated the code, and this is no longer the case. The scripts are now executable from every directory in your system. In addition, the two required packages are installed automatically upon execution.
The results of the execution can be found in the folder that contains the scripts under the names “dendrogram.pdf” and “paper_results.csv”.
Thank you so much for your time.
@jjerphan - In page (https://joss.readthedocs.io/en/latest/editing.html#overview-of-editorial-process) it is stressed that
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
of course, any exceptions and excuses are welcome.
Thanks for bringing this up. It seems that, the problem you encountered with the scripts is that the working directory for the Python interpreter during execution should be the location of the script. We have updated the code, and this is no longer the case. The scripts are now executable from every directory in your system. In addition, the two required packages are installed automatically upon execution.
@panagiotisanagnostou: I am afraid, but I am still meeting the problem using the latest revision of the main
branch (i.e. 0af766c2896070cce7fa54059a0401e308160072) and the latest revision of paper
branch (i.e. 85e1426c9e14f6dc084d3b2ec75dc37dc71472da). Can you provide the commit of the resolution?
@jjerphan, The commit of the resolution can be located at the last commit of the Enhancements (i.e. f26080243e75e8aff0a6e88817a8f8bf180afacb). In this commit, you will see changes in three files. The "__utilities.py", the "generate_paper_fig.py" and the "paper_results_generation.py". The changes are (i) the change in the interpreter working directory, (ii) the try-except block for automatic installation of the "fuzzy-c-means" package, and the "h5py" package.
Please provide me with the error report in order to try and find a suitable solution. I really need help seeing what the problem might be.
OK, the problem was that I ran python in isolated mode. Disabling this mode allows importing from __utilities.py
. Generally, I think it is better to use relative import à la:
from .local_submodule import an_interface
as it allows supporting this mode.
What do you think?
The main problem in this mode is the following, as stated in isolated mode:
In isolated mode sys.path contains neither the script's directory nor the user's site-packages directory. All PYTHON* environment variables are ignored, too.
This resulted in the sys.path variable to contain the following paths:
Unfortunately, according to the above paths, there is no way to access the __utilities.py
script in a relative way. The solution I used for this issue is to manually add the path to the script I execute to the sys.path
variable, with the help of the __file__
dunder variable, and this solves the problem of the imports.
I hope that this is a satisfactory solution.
@jjerphan - Does that modification solve the problem encountered here?
There's no problem if the script is executed without the isolated
mode.
Hi, I need some time to assess the "Substantial scholarly effort" item.
To me, the given elements for substantial scholarly effort are met:
Hence I am validating the last "Substantial scholarly effort" item. I have not time to provide room of improvements but think previous indications have helped already.
Thank you all for your patience.
@jjerphan - Thank you for finalizing your review.
@AP6YC - Could you please update your status? Thank you in advance.
@jjerphan We appreciate the time and effort you put into completing this review. Thank you!
@AP6YC - Could you please update your status and inform us on how is your review going? Thank you in advance.
@AP6YC - It would be great to hear from you. Could you please update your status and send us a life signal? Thank you in advance. I am so sorry if I am bothering.
@jbytecode I would like to include some new features in the package that are not documented in the paper I submitted. Can I do it while the package is under review? Thank you in advance for your time.
@panagiotisanagnostou - We have almost finished the review. It would be better to save your changes in a seperate branch and merge them after we have all of the stuff done. Thank you in advance.
@panagiotisanagnostou Sorry for the significant delay and oversight on my end; I am finishing up my review now after @jbytecode reached out to me directly.
@jbytecode and @AP6YC Thank you very much! I already have a branch with some changes!
@editorialbot set <DOI here> as archive
@editorialbot set <version here> as version
@editorialbot generate pdf
@editorialbot check references
and ask author(s) to update as needed@editorialbot recommend-accept
@editorialbot check references
Checking the BibTeX entries failed with the following error:
Failed to parse BibTeX entry: cite-key missing
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Submitting author: !--author-handle-->@panagiotisanagnostou<!--end-author-handle-- (Panagiotis Anagnostou) Repository: https://github.com/panagiotisanagnostou/HiPart Branch with paper.md (empty if default branch): paper Version: v0.3.1 Editor: !--editor-->@jbytecode<!--end-editor-- Reviewers: @AP6YC, @jjerphan Archive: 10.5281/zenodo.7814113
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@AP6YC & @jjerphan, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @jjerphan
📝 Checklist for @AP6YC