Open rmj3197 opened 6 months ago
Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.
Please check our Python packaging guide for more information on the elements below.
import package.
The package installation does not install the dependencies.
README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
CONTRIBUTING.md file that details how to install and contribute to the package.
CODE_OF_CONDUCT.md file.
YAML header of the issue (located at the top of the issue template).
Nice submission, I'll get started on finding the perfect editor for QuadratiK!
Hey @rmj3197, I am super excited to introduce @isabelizimm as the editor for this submission! Isabel will be your privileged point of contact from now on, though you are welcome to ask me anything during the process. Please note that she will not get started until the week after June 7th.
Happy reviewing!
Hello there! Happy to be ushering this package through 👋 I'm going to go ahead and start looking for reviewers; I'll plan to touch base when I have reviewers lined up OR in 2 weeks (say, June 24), whichever comes first.
Hello @isabelizimm,
Thank you so much for the update and for taking the time to review our package. I look forward to hearing from you soon.
Checking in! I have one reviewer ready (yay!) and have reached out to some possibilities for a second. I'll keep you updated when I know more 👍
Hello @isabelizimm , thank you very much for the update!
Welcome welcome to our fearless reviewers: @acolum and @ab93 👋 Thank you SO MUCH for volunteering to review for pyOpenSci! You are two people with awesome math-y, stats-y, ML-y, Python-y backgrounds, which is perfect for this package, and I am looking forward to learning from you through this review process 🌻
Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.
The following resources will help you complete your review:
Reviewers: @acolum and @ab93 Due date [NOTE: deadline extended]: August 2
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml file or elsewhere.
Readme file requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md matching JOSS's requirements with:
Estimated hours spent reviewing: 2.5
Overall, this submission was well done and followed most Python package development and documentation best practices. I found no major issues with the package's documentation, usability, and functionality, but I've outlined a few minor issues below.
Potential issues that could be fixed:
Minor issues that need fixing:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml file or elsewhere.
Readme file requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md matching JOSS's requirements with:
Estimated hours spent reviewing: 3 hours
Great submission overall. Documentation is good, and I like the user guide as well. There are a few tweaks, suggestions and adjustments that I can add here.
Packaging and CI
poetry install
python = "^3.9, !=3.9.7". This can cause some downstream applications using this package to break if any of its dependencies do not support a new Python version. So my recommendation will be to only allow Python versions that the package supports, i.e. something like python = ">=3.9, <3.13"
Black format badge, but I don't see the CI having the black format check. Adding that would be great
Code Practices
which again can be identified using a linter like Ruff
Exception class is not the best practice, a more fine-grained exception raising would be great
__init__() function, e.g. in the PKBC class, self.dat only gets initialized in fit()
__slots__ would be nice, as it reduces memory footprint
Thank you so much to our reviewers @acolum and @ab93 for your thoughts on QuadratiK! 🌷 The next step here is for the author to implement the changes suggested by reviewers. This piece can involve a bit of back and forth, @rmj3197, please let us know in this thread if you have questions about the review. Otherwise, post here when the reviews have been addressed and the reviewers will look over the updates and give their final approval!
.rst file extension instead of a .md file extension.
This is okay! As long as there is a README file there, we are good to go 😄
Thank you @acolum and @ab93 for your valuable suggestions and comments. Thank you @isabelizimm for your help and communication. I will address the changes and update you once they are completed. Thank you all for your time.
Dear @acolum, @ab93, and @isabelizimm,
Thank you very much for your insightful review. We apologize for the delay in our response. We have tried to address the points raised in the review below:
We have added a list of relevant packages in R and Python in the README file.
We have added the repo status badge and organized the various other badges according to the categories specified in the example package.
We have now linked all vignettes in the README file.
We have added a CITATION.cff file in the repository. Additionally, we have also included the BibTeX entry in the README.
This was clarified by @isabelizimm that a .rst file is fine.
We have now updated the development guide with commands and instructions on using Poetry. Please see the README file for the updated guide.
This has been updated in the pyproject.toml file with [python = ">=3.9, !=3.9.7, <3.13"].
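As a rough illustration of what the old and new constraints admit, the sketch below hand-codes both as tuple comparisons. It assumes Poetry's documented caret semantics, under which ^3.9 is read as >=3.9, <4.0. The function names are hypothetical, and this is not how Poetry itself resolves versions.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_constraint(version: Version) -> bool:
    """Original constraint: python = "^3.9, !=3.9.7", read as >=3.9, <4.0, !=3.9.7."""
    return (3, 9, 0) <= version < (4, 0, 0) and version != (3, 9, 7)


def bounded_constraint(version: Version) -> bool:
    """Updated constraint: python = ">=3.9, !=3.9.7, <3.13"."""
    return (3, 9, 0) <= version < (3, 13, 0) and version != (3, 9, 7)


if __name__ == "__main__":
    # Print, for a few interpreter versions, whether each constraint admits them.
    for candidate in [(3, 8, 18), (3, 9, 7), (3, 12, 4), (3, 13, 0)]:
        print(candidate, caret_constraint(candidate), bounded_constraint(candidate))
```

The only difference between the two is the upper bound: the caret form accepts 3.13 and later 3.x releases, while the bounded form stops before 3.13.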
The Black format CI check is now included. The GitHub Actions workflow can be found at https://github.com/rmj3197/QuadratiK/actions/workflows/black_check.yml.
We have now implemented Ruff linting. The GitHub Actions workflow can be found at https://github.com/rmj3197/QuadratiK/actions/workflows/ruff_linting.yml.
We have now included Python typing. The hints are now being shown in our updated documentation (https://quadratik.readthedocs.io/en/latest/api_reference/index.html).
All instance variables are now first defined in the __init__() method.
__slots__ would be nice, as it reduces memory footprint.
__slots__ are now included. An example is https://github.com/rmj3197/QuadratiK/blob/master/QuadratiK/kernel_test/_kernel_test.py
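For readers unfamiliar with the two points above, here is a minimal sketch of the pattern. The class, its attribute names, and the placeholder fit logic are hypothetical and are not taken from the QuadratiK source.

```python
from typing import List, Optional, Sequence


class SphericalClusterer:
    """Hypothetical estimator used only to illustrate the pattern."""

    # Declaring __slots__ removes the per-instance __dict__, which saves memory
    # and makes assignment to an undeclared attribute raise AttributeError.
    __slots__ = ("num_clust", "dat", "labels_")

    def __init__(self, num_clust: int) -> None:
        self.num_clust = num_clust
        # Attributes filled in by fit() are still declared here, set to None,
        # so every instance has the same attributes from construction onward.
        self.dat: Optional[Sequence[Sequence[float]]] = None
        self.labels_: Optional[List[int]] = None

    def fit(self, dat: Sequence[Sequence[float]]) -> "SphericalClusterer":
        self.dat = dat
        # Placeholder logic: assign every point to cluster 0.
        self.labels_ = [0 for _ in dat]
        return self
```

With this layout, reading model.dat before fit() returns None instead of raising AttributeError, and a typo such as model.lables_ = ... fails immediately.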
The base Exception class is not raised anymore. We now raise relevant errors.
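A small sketch of what finer-grained exception raising can look like, using only built-in exception types. The function and parameter names are hypothetical and do not reflect the package's actual validation code.

```python
def validate_num_clust(num_clust, n_samples):
    """Hypothetical validator that raises specific built-in exceptions."""
    # Wrong type: TypeError (bool is rejected explicitly because it subclasses int).
    if isinstance(num_clust, bool) or not isinstance(num_clust, int):
        raise TypeError(
            f"num_clust must be an int, got {type(num_clust).__name__}"
        )
    # Right type but out of range: ValueError.
    if not 1 <= num_clust <= n_samples:
        raise ValueError(
            f"num_clust must be between 1 and {n_samples}, got {num_clust}"
        )
    return num_clust
```

Callers can then catch TypeError and ValueError separately instead of a blanket except Exception, which would also swallow unrelated errors.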
Thank you for the changes, @rmj3197. It looks good now. I have approved it!
Hi all--apologies for the late response! @acolum, are you able to check if the changes made addressed the comments in your review? If so, please check the box in your review that states "The author has responded to my review and made changes to my satisfaction. I recommend approving this package." ✔️ and let us know you have approved it!
Thanks for the reminder, @isabelizimm! I've checked the box and approved the package.
Thank you @ab93 and @acolum for your feedback in improving the package and approving the changes.
@isabelizimm, thanks for facilitating the process. Please let us know what the next steps are! Thank you very much for your time.
Submitting Author: Raktim Mukhopadhyay (@rmj3197)
All current maintainers: @giovsaraceno
Package Name: QuadratiK
One-Line Description of Package: QuadratiK includes test for multivariate normality, test for uniformity on the sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density and clustering algorithm for spherical data.
Repository Link: https://github.com/rmj3197/QuadratiK
Version submitted: 1.1.0
EIC: @Batalex
Editor: @isabelizimm
Reviewer 1: @acolum
Reviewer 2: @ab93
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD
Code of Conduct & Commitment to Maintain Package
Description
We introduce the QuadratiK package that incorporates innovative data analysis methodologies. The presented software, implemented in both R and Python, offers a comprehensive set of novel goodness-of-fit tests and clustering techniques using kernel-based quadratic distances. Our software implements one, two and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. Expanded capabilities of our software include supporting tests for uniformity on the $d$-dimensional sphere based on Poisson kernel densities, and algorithms for generating random samples from Poisson kernel densities. Particularly noteworthy is the incorporation of a unique clustering algorithm specifically tailored for spherical data that leverages a mixture of Poisson kernel-based densities on the sphere. Alongside this, our software includes additional graphical functions, aiding the users in validating, as well as visualizing and representing clustering results. This enhances interpretability and usability of the analysis. In summary, our R and Python packages serve as a powerful suite of tools, offering researchers and practitioners the means to delve deeper into their data, draw robust inference, and conduct potentially impactful analyses and inference across a wide array of disciplines.
Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific
Community Partnerships
If your package is associated with an existing community please check below:
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Who is the target audience and what are scientific applications of this package?
The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions.
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
SciPy and hyppo also have collections of goodness-of-fit test functionalities. Our interest focuses on tests that are based on the family of kernel-based quadratic distances. The kernels we use are diffusion kernels, that is, probability distributions that depend on a tuning parameter and satisfy the convolution property. We also implement the Poisson kernel-based tests for uniformity on the d-dimensional sphere.
We are aware of only a limited number of Python libraries that offer spherical clustering capabilities, such as spherecluster (last updated in November 2018) and soyclustering (last updated in May 2020). spherecluster implements Spherical K-Means and clustering using von Mises Fisher distributions as proposed in Banerjee, Arindam, et al. "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions." Journal of Machine Learning Research 6.9 (2005). soyclustering implements spherical k-means for document clustering which has been proposed in Kim, Hyunjoong, Han Kyul Kim, and Sungzoon Cho. "Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling." Expert Systems with Applications 150 (2020): 113288.
In summary, there are fundamental differences between QuadratiK and existing packages, as follows:
The GOF tests are U-statistics based on centered kernels. The concept and methodology of centering is unique to our methods and is not part of the methods appearing in existing packages.
An algorithm for connecting the tuning parameter with the statistical properties of the test, namely power and degrees of freedom (DOF) is provided. This feature differentiates our novel methods from methods in other packages.
A new clustering algorithm for data that reside on the sphere using the Poisson kernel-based densities is offered. This aspect is not a feature of the existing packages.
We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.
We also implement a GUI to enable interaction with the software in a non-programmatic manner using the streamlit library. We have not found any Python package that implements a GUI for the above described tasks.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted: Please see our pre-submission enquiry for this submission at https://github.com/pyOpenSci/software-submission/issues/168
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:
*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.