pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
89 stars 33 forks source link

SLEPLET: Slepian Scale-Discretised Wavelets in Python (#148) #149

Closed paddyroddy closed 1 week ago

paddyroddy commented 7 months ago

Submitting Author: Patrick J. Roddy (@paddyroddy) All current maintainers: (@paddyroddy) Package Name: SLEPLET One-Line Description of Package: Slepian Scale-Discretised Wavelets in Python Repository Link: https://github.com/astro-informatics/sleplet Version submitted: v1.4.4 Editor: Szymon Moliński (@SimonMolinsky ) Reviewer 1: Shannon Quinn (@magsol ) Reviewer 2: Jakub Tomasz Gnyp (@gnypit ) Archive: 10.5281/zenodo.7268074 JOSS DOI: 10.21105/joss.05221 Version accepted: 1.4.7 Date accepted (month/day/year): 2024/05/21


Code of Conduct & Commitment to Maintain Package

Description

SLEPLET is a Python package for the construction of Slepian wavelets in the spherical and manifold (via meshes) settings. SLEPLET handles any spherical region as well as the general manifold setting. The API is documented and easily extendible, designed in an object-orientated manner. Upon installation, SLEPLET comes with two command line interfaces - sphere and mesh - that allow one to easily generate plots on the sphere and a set of meshes using plotly. Whilst these scripts are the primary intended use, SLEPLET may be used directly to generate the Slepian coefficients in the spherical/manifold setting and use methods to convert these into real space for visualisation or other intended purposes. The construction of the sifting convolution was required to create Slepian wavelets. As a result, there are also many examples of functions on the sphere in harmonic space (rather than Slepian) that were used to demonstrate its effectiveness. SLEPLET has been used in the development of various papers.

Scope

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

Many fields in science and engineering measure data that inherently live on non-Euclidean geometries, such as the sphere. Techniques developed in the Euclidean setting must be extended to other geometries. Due to recent interest in geometric deep learning, analogues of Euclidean techniques must also handle general manifolds or graphs. Often, data are only observed over partial regions of manifolds, and thus standard whole-manifold techniques may not yield accurate predictions. Slepian wavelets are designed for datasets like these. Slepian wavelets are built upon the eigenfunctions of the Slepian concentration problem of the manifold: a set of bandlimited functions that are maximally concentrated within a given region. Wavelets are constructed through a tiling of the Slepian harmonic line by leveraging the existing scale-discretised framework. Whilst these wavelets were inspired by spherical datasets, like in cosmology, the wavelet construction may be utilised for manifold or graph data.

To the author's knowledge, there is no public software that allows one to compute Slepian wavelets (or a similar approach) on the sphere or general manifolds/meshes. SHTools is a Python code used for spherical harmonic transforms, which allows one to compute the Slepian functions of the spherical polar cap. A series of MATLAB scripts exist in slepian_alpha, which permits the calculation of the Slepian functions on the sphere. However, these scripts are very specialised and hard to generalise.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

NickleDave commented 6 months ago

Hi @paddyroddy I'm adding the initial Editor-in-Chief checks.

The tl;dr is that we are about ready to start but I need you to do two things first:

I will add one more comment below but it is not required that you address it before we start the review

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.



Editor comments

NickleDave commented 6 months ago

@paddyroddy related to that to-do item for the docs: my understanding is that the goal for SLEPLET is to provide general access to the wavelet basis/transform you have developed for incomplete data on manifolds (feel free to correct me if I'm wrong)

Part of what we can do with this review is help you reach a broader audience. But that means we need to know where to look for editors and reviewers.

Can you please help me understand the main domains where you have applied and can apply your methods? I see you are an RSE in an astro group and that you mention cosmology in your thesis; in the papers you apply your methods to topographic data from Earth. Are these methods mainly used by people in cosmology, geophysics, computer imaging, all of the above? Could this work relate to geodesy in earth science?

As examples of places where we might look for editors/reviewers: we work with astropy and software underground. I can also imagine asking people that work on manifolds in general and geometric deep learning, and another place we might look would be users/developers of packages you rely on or mention, e.g. pyssht/py2let. But my sense is that we will want domain experts that can help you translate the methods you've developed for a broader audience.

Any more concrete detail you can give me about domains to look at for editors + reviewers would be helpful. We are looking now, but I want to get your input to make sure this review helps you.

paddyroddy commented 6 months ago

The tl;dr is that we are about ready to start but I need you to do two things first:

* [ ]  Add a code of conduct

* [ ]  Add a "layperson" summary of the package to the docs page

  * The first thing someone should read on the docs is a statement of what the package does--I wouldn't assume someone will bother to click again to go to PyPI. Provide enough background that someone outside your domain will have some understanding of how the package fits into a bigger picture, kind of like the first couple of sentences of the abstract of your thesis.

I've added these. Mine isn't particularly lay, but they are a fairly specialised application, so hopefully from the description it's obvious who the target audience are!

paddyroddy commented 6 months ago

Editor comments

* when I first tested that I could import the package, it downloaded a lot of data. Many people might find this surprising. I would suggest adding data directly to the package if possible--if the files are small enough--and/or deferring download of large files

Annoyingly, I hadn't spotted this, as I cache after the first import. I've had a look and the following are created:

-rw-------    1 paddy staff  74M Nov 24 18:37 EGM2008_Topography_flms_L2190.mat
-rw-------    1 paddy staff 5.6K Nov 24 18:37 meshes_laplacians_basis_functions_bird_b697_eigenvalues.npy
-rw-------    1 paddy staff  15M Nov 24 18:37 meshes_laplacians_basis_functions_bird_b697_eigenvectors.npy
-rw-------    1 paddy staff 5.6K Nov 24 18:37 meshes_laplacians_slepian_functions_bird_b697_N194_eigenvalues.npy
-rw-------    1 paddy staff 1.1M Nov 24 18:37 meshes_laplacians_slepian_functions_bird_b697_N194_eigenvectors.npy
-rw-------    1 paddy staff  132 Nov 24 18:37 slepian_masks_africa_L1.npy
-rw-------    1 paddy staff  132 Nov 24 18:37 slepian_masks_south_america_L1.npy

The problem (in particular) is PyPI doesn't allow over 50MB. I need to work out a way to delay the downloading of EGM2008_Topography_flms_L2190.mat.

paddyroddy commented 6 months ago

The problem (in particular) is PyPI doesn't allow over 50MB. I need to work out a way to delay the downloading of EGM2008_Topography_flms_L2190.mat.

Fixed this in astro-informatics/sleplet#336 (v1.4.5), turns out I had introduced the bug using pydantic.Field(default= values. The data will still be downloaded as and when - but not on initial import. And, as before, it is cached so will only happen once.

paddyroddy commented 6 months ago

@paddyroddy related to that to-do item for the docs: my understanding is that the goal for SLEPLET is to provide general access to the wavelet basis/transform you have developed for incomplete data on manifolds (feel free to correct me if I'm wrong)

Yep that's right!

Can you please help me understand the main domains where you have applied and can apply your methods? I see you are an RSE in an astro group and that you mention cosmology in your thesis; in the papers you apply your methods to topographic data from Earth. Are these methods mainly used by people in cosmology, geophysics, computer imaging, all of the above? Could this work relate to geodesy in earth science?

So I'm actually a general RSE (not strictly working on this), but was formerly a PhD student in an astrophysics group, specialising in cosmology. In reality, these methods could be used by any field in which data is measured only in certain parts of a manifold. Be that: oceanography, geophysics, solar physics, astrophysics, cosmology, computer graphics, machine learning etc. In reality, there are many fields which could use methods such as these.

As examples of places where we might look for editors/reviewers: we work with astropy and software underground. I can also imagine asking people that work on manifolds in general and geometric deep learning, and another place we might look would be users/developers of packages you rely on or mention, e.g. pyssht/py2let. But my sense is that we will want domain experts that can help you translate the methods you've developed for a broader audience.

Those all sound like reasonable places. Would be very interested to have others people's insights. Sorry, I'm not sure how else to help on that front!

NickleDave commented 6 months ago

The problem (in particular) is PyPI doesn't allow over 50MB. I need to work out a way to delay the downloading of EGM2008_Topography_flms_L2190.mat.

Sounds like you fixed the issue overall but jic you're still figuring this out: are you familiar with pooch? A pattern a lot of core scientific Python packages seem to be converging on is "put the data in the package and use importlib-resources if it fits on PyPI; for everything else use pooch". See for example scikit-image: https://github.com/scikit-image/scikit-image/issues/5146 or librosa: https://github.com/librosa/librosa/tree/main/librosa/util/example_data

NickleDave commented 6 months ago

Thank you @paddyroddy for your replies about the domain, very helpful.
I am very sympathetic to developing methods that are useful across domains (FFT anyone?) but just want to make sure we're helping you reach that broader audience.

We are looking for an editor + reviewers now.

paddyroddy commented 6 months ago

Sounds like you fixed the issue overall but jic you're still figuring this out: are you familiar with pooch?

Yes, I'm using pooch...

NickleDave commented 6 months ago

Whoops, guess I could've just looked 😝 😇 We can say I'm trying to be helpful

NickleDave commented 5 months ago

Hi @paddyroddy, happy new year's 🙂 I'm happy to report that @SimonMolinsky will be the editor for this review, I'll let him take it from here. We are now actively seeking reviewers.

SimonMolinsky commented 5 months ago

Hi @paddyroddy !

From now on, I'll be responsible for the review process. :) I've looked into your package, the JOSS publication, and related publications, and everything is outstanding. I agree with @NickleDave that your package needs more usability review from potential users than a technical review. I will look for reviewers in specific communities interested in applying your algorithms in their solution.

paddyroddy commented 5 months ago

Thank you, @SimonMolinsky! Looking forward to the next steps

SimonMolinsky commented 4 months ago

@paddyroddy I'm still looking for reviewers. Hopefully, we will start the review at the beginning of next week! (We have one good soul on board, waiting for the response from other people). Thank you for your patience!

SimonMolinsky commented 4 months ago

Hi @paddyroddy , we have our reviewers. We can start a review process.

SimonMolinsky commented 4 months ago

Editor response to review:


Editor comments

:wave: Hi @magsol and @gnypit ! Thank you for volunteering to review for pyOpenSci! I'm here to help you with the review, so if you need any support, then let me know immediately! But first, look at the steps below :point_down: @magsol you may start as soon as you want; @gnypit will join us after 22nd of February!

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: 2024-03-17

Reviewers: Shannon Quinn, Jakub Tomasz Gnyp Due date: 2024-03-17

SimonMolinsky commented 3 months ago

Hi @magsol , @gnypit !

Please let us know how it's going :)

magsol commented 3 months ago

@SimonMolinsky Just finished the reviewer survey, and am starting to work my way through the Reviewer Guide. Hopefully will have some more detailed updates soon :) (sorry for the delay in starting-- tl;dr it's been a very chaotic past month)

magsol commented 3 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:

Somewhere in the 2-3 hours range.


Review Comments

magsol commented 2 months ago

Sorry, it was a very busy past week + weekend. I plan on finishing this review tomorrow. Apologies for the delay!

magsol commented 2 months ago

Ok, I've finished my initial review. I have to admit up front: this is largely outside my area of expertise. While I understand the high-level use case (my research involves a lot of manifold learning and walking, and may occasionally reside in non-Euclidean space), the specifics of Slepian wavelets are not familiar to me.

That said, the package itself seems extremely well put together and maintained. I was able to run the examples and tests without any* problems, and the documentation--while clearly provided for people who understand Slepian wavelets--is well organized.

The only checkbox I couldn't verify was the presence of a repostatus badge, though given that the package itself is already two minor versions more recent than this review version, it's pretty safe to say this package is actively maintained.

* The only caveat here was the CLI example in the README--one of the Zenodo downloads failed mid-stream and crashed the whole example. However, there is already an issue open about this, and it seems like a Zenodo problem, so other than investigating other hosting options for the needed example files, it's easy to say this isn't the fault of the package itself.

paddyroddy commented 2 months ago

Thanks, @magsol! I wasn't aware of repostatus.org, I have added one now in astro-informatics/sleplet#360.

magsol commented 2 months ago

I'm pretty happy with it! I recommend approval :) Thanks @paddyroddy! Great work!

SimonMolinsky commented 2 months ago

@magsol Thank you for your review! It's great to hear that SLEPLET is a well-written package :)

@paddyroddy I heard from @gnypit, and he will review the package soon.

gnypit commented 2 months ago

Dear All,

First and foremost I'd like to apologize for such a late answer. Though unprofessional and thus with no excuse, it has an explanation. I started the review in February with papers at @paddyroddy's Research Gate profile. I've been distracted from March onwards due to a combination of everything that could happen at the same time. I had to change my job, reorganise my wedding party and budget for my moving to London for the next academic year, as part of my family that was supposed to sponsor my postgraduate degree (and therefore paid a very symbolic alimony for multiple years, in spite of come court orders) started backing out on religious grounds, pressuring me to cancel my wedding altogether... It's been a very crazy couple of weeks for me, personally and professionally, taking a toll on my health as well.

Additionally, I wanted to go back to the sleplet package, as the topic itself got me interested greatly. Simon informed me that the functionalities have already been verified, which is great news! Therefore, for me as a mathematician and physicist from different niches, "user experience" and clarity of code were priorities. The overall impression is very good: the code is mainly OOP, with clear organisation of files, their names, and the names of classes, methods and functions. Though I'm no expert in this particular field, I was able to find myself in all the folders and have at least a general idea of what a given file is supposed to be used for. However, I do have some minor remarks when it comes to documentation and comments in the code. I might be wrong, as this repository is dedicated to specialists in a given field - however, from my personal experience many skilled mathematicians, physicists, and others, have very basic programming skills. Therefore guides, documentation https://github.com/astro-informatics/sleplet/blob/main/documentation/DOCUMENTATION.mdand step-by-step examples are in my view an absolute necessity and need expanding. I'm not talking about such elaborate ones, like in the case of Tensorflow, scikit-learn or Transformers, but rather a simple website, such as for PyGAD https://pygad.readthedocs.io/en/latest/ or scipy.stats. Examples could be, at this stage, in the form of Jupyter notebooks, instead of scripts with defined functions which I have found so far. It's my recommendation to create such demonstration notebooks as soon as possible. Moreover, multiple files require additional comments - even snippets in the current documentation practically don't have any. I find it a very common practice, that there are few to no comments in the majority of code and I always suggest adding more. It minimises time required to understand a given file, especially during debugging, when one wants to understand a given part of code without reading the whole file, its dependencies, and so on.

In general, global and local variables are rarely explained - "SNR" in SlepianNoiseAfrica in slepian_noise_africa.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/slepian_noise_africa.py is one of few variables (and acronyms!) that was commented on and defined for a layman, such as myself 😅 Also, _vars.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/_vars.py in slepian folder is a very short code, but it definitely needs some explanation. Even if there is an accepted practice to use "phi" symbol for a given angle, not everybody has to follow it necessarily. Or, if someone is relatively new to the topic, it makes understanding the code very cumbersome, due to time and attention required to find some context in which this or that variable is used, how it's used and what precisely may it represent. Of course these variables were explained in the current documentation, but it would be more convenient to copy-paste these into the proper script or create a link/reference. Even a comment "Explained in the documentation: >link<" would suffice, come to think of that.

In "tests" codes usually have one short comment for a whole function. Same goes for files like flm.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/flm.py, fp.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/fp.py and so on - in these cases even the files' names are quite secretive. At the same time files like noise.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/noise.py have very comprehensive comments in some places - generally I've noticed that files with "methods", "coefficients" and regarding plotting are the most clearly written and I'd suggest using them as a standard for the rest of the library. Including LaTeX formulas (e.g. in squashed_gaussian.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/squashed_gaussian.py, harmonic_gaussian.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/harmonic_gaussian.py, etc.) was a great idea and it'd be great to expand on this while creating documentation - or at least giving some references, perhaps 🤔 Of course, some files really are very clear with just two comments, like _bool_methods.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/_bool_methods.py, but they constitute a minority of the whole package. Some files have messages in them, such as slepian.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/functions/slepian.py, which increases clarity, but specific comments explaining methods like _validate_rank in Slepian class or _compute_angles in SlepianDiracDelta class are in my view preferable.

Finally, some concepts in programming might be very new and difficult to get used to even for semi-advanced programmers in Python, like multiprocessing. I've noticed the _parallel_methods.py https://github.com/astro-informatics/sleplet/blob/main/src/sleplet/_parallel_methods.py using a fork of the multiprocessing library. As there are other libraries and methods to make parallel operations in Python, also depending on the hardware one uses (e.g. CUDA), it might be a good idea to significantly expand examples and explain this part of the library in depth so that if someone uses this part of the library, it is done quite purposefully.

Overall, the biggest challenge before contributors is to create a friendly documentation and embark on a boring and tiresome adventure to add comments. There already are examples of in-depth comments, so there is a clear and present standard to strive for. Apart from this, the library is well-organised, written beautifully in OOP in Python, with some functional parts. I find it intuitive and modern. Also, introducing multiprocessing shows the author's consideration for computing cost, which is a growing necessity in today's scientific world - especially for those who have the skills but cannot afford proper hardware. Should I ever meet someone looking for tools to work with wavelets

I'll make sure with Simon to put it in a proper place as my input to the review.

With kind regards, Jakub T. Gnyp

śr., 20 mar 2024 o 20:29 Simon @.***> napisał(a):

@magsol https://github.com/magsol Thank you for your review! It's great to hear that SLEPLET is a well-written package :)

@paddyroddy https://github.com/paddyroddy I heard from @gnypit https://github.com/gnypit, and he will review the package soon.

— Reply to this email directly, view it on GitHub https://github.com/pyOpenSci/software-submission/issues/149#issuecomment-2010457477, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO4C3VRJ6FLI7UB4C5G2IDTYZHPQRAVCNFSM6AAAAAA7NMCAMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJQGQ2TONBXG4 . You are receiving this because you were mentioned.Message ID: @.***>

paddyroddy commented 2 months ago

@gnypit I'm sorry to hear about the recent events in your life. I hope things are getting back in control.

Thank you for constructive feedback regarding comments, documentation and demos.

I have tried to automate the documentation construction as much as possible, and hence it is a bit light in some places. However, just for your information, the parts in which it is documentation-light (as you say) are the sections that are private (i.e. not exposed to the users) indicated by the leading _. This does beg the question of who @pyOpenSci is aiming to, is it users, developers or both? To me, as a user of a package, it is not often I'd be digging around the internals. As a developer, it is something I might do if the situation requires it. I believe all functions (public or private) are named fairly sensibly, and have docstrings.

Regarding comments, I feel this is personal style - I tend to put them when the step is non-intuitive (for me), but I feel it can clutter the code somewhat if it is everywhere.

As for demos, I have various demos in https://github.com/astro-informatics/sleplet/tree/main/examples. Would surrounding these with explanatory prose suffice your request? I'm not a big fan of notebooks personally, but could convert the files in the examples folder to notebooks.

SimonMolinsky commented 2 months ago

Hi @gnypit , sorry to hear you have experienced such things; hopefully, your situation improves!

@gnypit Thank you for your review; just one more thing: could you copy & paste the review template into a comment in this thread and select boxes based on your review: https://www.pyopensci.org/software-peer-review/how-to/reviewer-guide.html#peer-review-template (You should be logged in to GitHub, it won't be possible from email).

@gnypit , @paddyroddy :

However, just for your information, the parts that are documentation-light (as you say) are the sections that are private (i.e., not exposed to the users) indicated by the leading _. This does beg the question of who https://github.com/pyOpenSci is aiming at: users, developers, or both?

Answering this question: docs in private functions are not required. @pyOpenSci focuses on both users and developers - as potential maintainers. Documenting private functions is nice to have but totally optional 😊

gnypit commented 1 month ago

Okay @SimonMolinsky & @paddyroddy - just in case, I'm still dabbling in the functionalities on a Linux VM, but:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 30h by @gnypit + time spent by @magsol


Review Comments

The overall impression is very good: the code is mainly OOP, with clear organisation of files, their names, and the names of classes, methods and functions. The documentation and examples could use a little more prose around them or comments within them, as already discussed. However, all user-faced parts of code are very thoroughly written and the private functiones, methods, etc. are clearly indicated. Nevertheless, expanding on comments and guides/tutorials might make it easier for both users and contributors to faster and better orient themselves in the library. It doesn't change the fact, that for specialists in the field who are semi-advanced in Python the clarity of the code, intrinsic messages and equations put in LaTeX, together with the required documentation, seem sufficient.

The one concept in programming might be very new and difficult to get used to even for semi-advanced programmers in Python, like multiprocessing. It concernes the private parts of the code, as the _parallel_methods.py is using a fork of the multiprocessing library. As there are other libraries and methods to make parallel operations in Python, also depending on the hardware one uses (e.g. CUDA), it might be a good idea to significantly expand examples and explain this part of the library in depth so that if someone uses this part of the library, it is done quite purposefully. It is, of course, merely a suggestion on what could be done as a part of future development of the library.

Overall, the biggest challenge before contributors is to make the documentation a little more friendly and embark on a boring and tiresome adventure to add comments or alternatives, as already discussed with author during the review. There already are examples of in-depth comments, so there is a clear and present standard to strive for. Apart from this, the library is very well-organised, written beautifully in OOP in Python, with some functional parts. It is intuitive and modern. Also, introducing multiprocessing shows the author's consideration for computing cost, which is a growing necessity in today's scientific world - especially for those who have the skills but cannot afford proper hardware.

Finally, as a personal comment, though I'm still playing around with the functionalities, as I misunderstood the scope of JOSS review, I can already recommend approving the package and will simply let yopu know here in the comments, whenever and whatever else comes up.

paddyroddy commented 1 month ago

The one concept in programming might be very new and difficult to get used to even for semi-advanced programmers in Python, like multiprocessing. It concernes the private parts of the code, as the _parallel_methods.py is using a fork of the multiprocessing library. As there are other libraries and methods to make parallel operations in Python, also depending on the hardware one uses (e.g. CUDA), it might be a good idea to significantly expand examples and explain this part of the library in depth so that if someone uses this part of the library, it is done quite purposefully. It is, of course, merely a suggestion on what could be done as a part of future development of the library.

multiprocess is a fork of multiprocessing as you say. The reason for using it is that it uses dill rather than pickle, which makes it easier to serialise data - and much easier to work with IMO. As mentioned previously, this is in the non-public facing part of the code, so shouldn't be too important for users. However, there is definite scope for speeding up further with CUDA etc., depending on what hardware one has available.

SimonMolinsky commented 1 month ago

@gnypit thank you for your review! As I understand, we're waiting for the performance checks from your side. If you need any support with the point Packaging guidelines: The package conforms to the pyOpenSci [packaging guidelines](https://www.pyopensci.org/python-package-guide). then let me know!

gnypit commented 3 weeks ago

Hello all! As I have recommended package approval already in April, I thought that you're free to go on. I apologise for the confusion. I checked all the boxes in the review template. I triple-checked some minor performance issues that I had; they're caused by problems with my virtual machine and not by the code itself.

@paddyroddy coming back to the multiprocessing, my starting the subject was motivated only by giving some suggestions. Of course it's a private part of the code, so it's a secondary matter for now.

Is there anything else that's required of me @SimonMolinsky ?

SimonMolinsky commented 3 weeks ago

@gnypit Thank you!

@paddyroddy As I see, there are no other changes required. Thus, I'm going to approve the package!

SimonMolinsky commented 3 weeks ago

@paddyroddy Congratulations, SLEPLET has become officially a part of the pyOpenSci family! Thank you for submitting your package here. I will let you know when it will be updated on the pyOS website as soon as it happens :)

@gnypit & @magsol, thank you for your support and time!

paddyroddy commented 3 weeks ago

@paddyroddy coming back to the multiprocessing, my starting the subject was motivated only by giving some suggestions. Of course it's a private part of the code, so it's a secondary matter for now.

Thank you for your comments and review @gnypit!

paddyroddy commented 3 weeks ago

@paddyroddy Congratulations, SLEPLET has become officially a part of the pyOpenSci family! Thank you for submitting your package here. I will let you know when it will be updated on the pyOS website as soon as it happens :)

Great, thank you. Excited to have this recognition!

SimonMolinsky commented 1 week ago

@paddyroddy Thank you again. We are wrapping things up to close the review officially. I've seen on PyPI that you added the pyOS badge to your package. Please fill out the the post-review survey in your spare time :)

And again, you might be interested in writing a blog post about your package. Here are examples and the template:

pandera movingpandas

TEMPLATE: here is a markdown example that you could use as a guide when creating your post.

Last but not least, we are growing fast and looking for editors, reviewers, and experts in their fields. Your knowledge could be invaluable to the organization. So let me know if you would like to join the pyOpenSci community! :slightly_smiling_face:

And something that does not require your input: I've opened a thread about your package on the pyOpenSci Discourse. Your package will soon be visible on the pyOpenSci website, Pull Request with metadata is pending :hourglass:


🎉 SLEPLET has been approved by pyOpenSci! Thank you @paddyroddy for submitting SLEPLET, and many thanks to @magsol and @gnypit for reviewing this package! 😸

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:


If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

paddyroddy commented 1 week ago

Thanks, @SimonMolinsky! I've filled in the survey, and will look into the blog post.

paddyroddy commented 1 week ago

@SimonMolinsky I've made a PR for the blog post https://github.com/pyOpenSci/pyopensci.github.io/pull/412

paddyroddy commented 1 week ago

@SimonMolinsky looks like it's on the website now so this issue can be closed?

SimonMolinsky commented 1 week ago

@paddyroddy Yes, you are right! Thanks! I'm closing the issue :)