[REVIEW]: Generating synthetic star catalogs from simulated data for next-gen observatories with py-ananke

openjournals / joss-reviews

Reviews for the Journal of Open Source Software

Creative Commons Zero v1.0 Universal

707 stars 37 forks source link

[REVIEW]: Generating synthetic star catalogs from simulated data for next-gen observatories with py-ananke #6234

Open editorialbot opened 8 months ago

editorialbot commented 8 months ago

Submitting author: !--author-handle-->@athob@warrickball<!--end-editor-- Reviewers: @rrjbca, @lheckmann Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/357c0445d891fc10e1b0ca4dba1e3cc0"><img src="https://joss.theoj.org/papers/357c0445d891fc10e1b0ca4dba1e3cc0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/357c0445d891fc10e1b0ca4dba1e3cc0/status.svg)](https://joss.theoj.org/papers/357c0445d891fc10e1b0ca4dba1e3cc0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@rrjbca & @lheckmann, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @warrickball know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @rrjbca

📝 Checklist for @lheckmann

editorialbot commented 8 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 8 months ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.05 s (443.4 files/s, 99716.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TeX                              1            111              0           1635
Python                          14            219            387            742
Markdown                         2             95              0            198
YAML                             2              5             11             74
Jupyter Notebook                 1              0            990             31
-------------------------------------------------------------------------------
SUM:                            20            430           1388           2680
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 8 months ago

Wordcount for paper.md is 2923

editorialbot commented 8 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.3847/1538-4365/ab5b9d is OK
- 10.1007/s11214-006-8315-7 is OK
- 10.3847/2041-8213/ac947c is OK
- 10.1093/mnras/stac3347 is OK
- 10.3847/2041-8213/acade4 is OK
- 10.3847/2041-8213/acad00 is OK
- 10.3847/2041-8213/acad01 is OK
- 10.3847/2041-8213/acb3a5 is OK
- 10.3847/1538-4365/acaaa9 is OK
- 10.3847/2041-8213/acc94b is OK
- 10.1093/mnras/stad1629 is OK
- 10.3847/1538-4357/acc2bc is OK
- 10.3847/1538-4357/acec76 is OK
- 10.48550/arXiv.2306.02465 is OK
- 10.48550/arXiv.1110.3193 is OK
- 10.1051/0004-6361/202141938 is OK
- 10.3847/1538-4357/ab042c is OK
- 10.48550/arXiv.1902.05569 is OK
- 10.48550/arXiv.2306.12302 is OK
- 10.48550/arXiv.2306.11784 is OK
- 10.1111/j.1365-2966.2010.17187.x is OK
- 10.1093/mnras/sts028 is OK
- 10.1093/mnras/stu1023 is OK
- 10.1093/mnras/stu1227 is OK
- 10.1093/mnras/stu1536 is OK
- 10.1093/mnras/stu1738 is OK
- 10.1093/mnras/stu2058 is OK
- 10.1093/mnras/stv725 is OK
- 10.1093/mnras/stv627 is OK
- 10.1093/mnras/stv1937 is OK
- 10.1093/mnras/stv1190 is OK
- 10.1093/mnras/stw2035 is OK
- 10.1093/mnras/stv2484 is OK
- 10.1093/mnras/stw1862 is OK
- 10.1093/mnras/stx1160 is OK
- 10.1093/mnras/stx3040 is OK
- 10.1093/mnras/stx3112 is OK
- 10.1093/mnras/stx3304 is OK
- 10.1093/mnras/sty1780 is OK
- 10.1093/mnras/sty1690 is OK
- 10.1093/mnras/stx3124 is OK
- 10.1093/mnras/stz968 is OK
- 10.3847/1538-4357/ab0654 is OK
- 10.1093/mnras/stz937 is OK
- 10.1093/mnras/staa2453 is OK
- 10.3847/1538-4357/abcafa is OK
- 10.1093/mnras/stab322 is OK
- 10.1051/0004-6361/202039429 is OK
- 10.1093/mnras/stac3489 is OK
- 10.1093/mnras/stad513 is OK
- 10.1093/mnras/stad1205 is OK
- 10.1093/mnras/stad2419 is OK
- 10.1093/mnras/stac3620 is OK
- 10.3847/2041-8205/827/2/L23 is OK
- 10.1051/0004-6361/201833051 is OK
- 10.1046/j.1365-8711.2001.04022.x is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.5281/zenodo.8364959 is OK
- 10.1051/0004-6361/201322068 is OK
- 10.3847/1538-3881/aabc4f is OK
- 10.3847/1538-4357/ac7c74 is OK
- 10.1051/0004-6361/201732493 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1007/978-1-4842-2677-3_5 is OK
- 10.1111/j.1365-2966.2006.11043.x is OK
- 10.1088/0004-637X/730/1/3 is OK
- 10.1111/j.1365-2966.2012.21948.x is OK
- 10.1093/mnras/stu2029 is OK
- 10.1093/mnras/stu1605 is OK
- 10.1093/mnras/stv1281 is OK
- 10.1093/mnras/stt1034 is OK
- 10.3847/0004-637X/822/2/73 is OK
- 10.3847/1538-4357/835/1/77 is OK
- 10.1051/0004-6361:20020612 is OK
- 10.1086/588526 is OK
- 10.1051/0004-6361:20078467 is OK
- 10.1051/0004-6361:20054163 is OK
- 10.1093/mnras/sty2745 is OK
- 10.1093/mnras/staa3356 is OK
- 10.1038/s41550-020-1131-2 is OK
- 10.1051/0004-6361/201936866 is OK
- 10.1103/PhysRevD.104.043530 is OK
- 10.1016/j.ascom.2022.100667 is OK
- 10.48550/arXiv.2202.06797 is OK
- 10.3847/1538-4357/acc582 is OK
- 10.1088/0004-6256/150/5/150 is OK
- 10.3847/1538-4365/aae9f0 is OK
- 10.3847/1538-4365/ac00b3 is OK
- 10.48550/arXiv.2306.16475 is OK
- 10.1051/0004-6361/202039657 is OK
- 10.1051/0004-6361/202243940 is OK

MISSING DOIs

- 10.1088/0004-637x/730/1/3 may be a valid DOI for title: Galaxia: A Code to Generate a Synthetic Survey of the Milky Way

INVALID DOIs

- None

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

warrickball commented 8 months ago

Hi @lheckmann, @rrjbca, and thanks again for agreeing to review. This is the review thread for the paper. All of our correspondence will now happen here.

Please read the "Reviewer instructions & questions" in the first comment above, and generate your checklists by commenting @editorialbot generate my checklist on this issue. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. We aim to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. We also encourage reviewers to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#6234 so that the issue/PR is linked to this thread. Please also feel free to comment and ask questions on this thread. JOSS editors have found it better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but start whenever you can. JOSS reviews are iterative and the authors can start responding while you continue to review other parts of the submission.

You're welcome to also assign yourself to this issue if that fits in with how you work. You may also want to check your GitHub notification settings to see that you do (or don't, if you prefer!) also get email notifications.

Finally, don't hesitate to ask many any questions you might have about the process.

rrjbca commented 8 months ago

Review checklist for @rrjbca

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/athob/py-ananke?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@athob) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

lheckmann commented 7 months ago

Review checklist for @lheckmann

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/athob/py-ananke?
[x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
[ ] Contribution and authorship: Has the submitting author (@athob) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[ ] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[ ] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[ ] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
[ ] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

lheckmann commented 7 months ago

@editorialbot generate pdf

editorialbot commented 7 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

warrickball commented 7 months ago

Hi @rrjbca, @lheckmann,

I see you've started your reviews and was wondering if there was anything you already wanted to comment on. JOSS reviews are intended to be iterative and you're encouraged to comment on whatever parts of the submission you've reviewed as the proceed. That way, the authors can start working on those elements (e.g. installation) while you continue to review others (e.g. testing and functionality).

Let me know if you have any questions!

lheckmann commented 6 months ago

Hi @rrjbca, @lheckmann,

I see you've started your reviews and was wondering if there was anything you already wanted to comment on. JOSS reviews are intended to be iterative and you're encouraged to comment on whatever parts of the submission you've reviewed as the proceed. That way, the authors can start working on those elements (e.g. installation) while you continue to review others (e.g. testing and functionality).

Let me know if you have any questions!

Sorry for the late reply! There is one thing already:

For the installation, there are instructions for pip but not for conda, which would be beneficial to not cause inconsistencies with other installed python packages.
How do I best communicate this to the authors?

athob commented 6 months ago

Sorry for the late reply! There is one thing already:

For the installation, there are instructions for pip but not for conda, which would be beneficial to not cause inconsistencies with other installed python packages. How do I best communicate this to the authors?

Thanks for your feedback @lheckmann, I assume I should be able to respond to this here.

Currently the package is not published on the PyPI repository, so the preferred pip installation instructions point to the GitHub repository where the package source code is hosted. I am not aware of a way to implement something similar with conda, is there one?

lheckmann commented 6 months ago

Sorry for the late reply! There is one thing already: For the installation, there are instructions for pip but not for conda, which would be beneficial to not cause inconsistencies with other installed python packages. How do I best communicate this to the authors?

Thanks for your feedback @lheckmann, I assume I should be able to respond to this here.

Currently the package is not published on the PyPI repository, so the preferred pip installation instructions point to the GitHub repository where the package source code is hosted. I am not aware of a way to implement something similar with conda, is there one?

Okay true,

I now did: conda create --name ananke python==3.9 conda activate ananke and then the pip install command you mentioned to have it separate from my other python installations. But this can then be also left to the user.

The only two other points for the installation:

jupyter is missing in the installation setup but needed for the tutorial
For the additional package (numpy etc) no version are specified right? It works with current ones, but this could change in the future so might be good to specify the ones used.

athob commented 6 months ago

Indeed thanks for noticing that.

Regarding jupyter, py-ananke itself does not require it to run its modules, so I would prefer to not add it to the package dependencies. To solve that, I have added a couple sentences in the Readme section where the tutorial notebook is mentioned. The added text points at the dedicated Project Jupyter webpage where the collaboration posted instructions for installing jupyter.

Regarding the version specs of the existing dependencies, I am going to better constrain them by imposing the major version of each dependency for both py-ananke and its submodules py-EnBiD-ananke and py-Galaxia-ananke. For example, I am already currently imposing a minimum version for vaex in the dependencies of py-Galaxia-ananke, but I should also constrain the dependency to no more than versions 4.x in case a version 5.x is published with non-backward compatible features. I will send an update here when I'll have added those requirements.

athob commented 6 months ago

Thank you for your suggestion @lheckmann, I have tested the main package and its submodules for what would be acceptable minimal version requirements, and added those to the dependencies specifications. I have pushed the updates to the main branch of each repository. I also had some minor patching updates that I included in these newer versions.

Note also that as part of the patching, I had to add scikit-learn in the dependencies of py-EnBiD-ananke, so I added the corresponding bibtex reference to the paper draft.

lheckmann commented 6 months ago

Perfect, thank you!

I've now started to test the code more properly and have some points/questions to remark:

an.Ananke.make_dummy_particles_input is not very well document. I understand it is just to try the code, but help function only explains n_part as an input parameter, even though with_densities=False is stated as a parameter as well.
Why is the observer position chosen randomly in the example? Wouldn't it make most sense to place it at the GAIA position?
Is there a way to check the Input objects and its methods mentioned in L150 in the pdf? Or should they only operate in the background?
Same for the py-anake classes mentioned afterwards
I would advice on improving the docs for the ananke.run() function since it runs the main pipeline. In the pdf the different steps are indicated, but in the code doc it just refers to running the pipeline. Also, are there test implemented checking the different substeps?

That is it for now, I'll continue with checking the final output tables etc. Just wanted to leave the intermediate comments here.

athob commented 5 months ago

Thanks for your feedback, I have pushed some updates in the last couple of days to address your comments. Note that those updates are currently available in the develop branch (at the time of this message under this commit state):

I have improved the documentation for this method, including the description of the with_densities kwarg, but also adding a description of what columns are contained in the output dictionary similar to that shown in the notebook. Moreover, given that this method is a class method of the class Ananke, I have made it possible to call it directly from the package namespace ananke (along with a few other such classmethods for which it would streamline usage to not reference the Ananke class namespace).
Most of the data shown in the notebook example is completely random by choice, as this allows me to not rely on providing the data of an existing simulation with this notebook. That said, I am hoping in the future to have a notebook example with such realistic galaxy input data downloaded from a publicly available simulation with a small memory footprint to keep it portable. Your comment also reminded me of my plans to make the observer's position default on that of our solar system with respect to our galaxy, which I have now implemented. To add up, I completed the observer coordinates customization feature by also allowing the user to provide velocity coordinates, with the default set on the Sun velocity in our galaxy.
It is possible to see the implementation of the Input class in the corresponding py-Galaxia-ananke submodule source file. Note however that objects of this class, and of all those available in the py-Galaxia-ananke in general, should not be directly constructed by an ananke user. The only such objects that an ananke user may end up interacting with are objects of the Output class which I attempted to properly document.
Regarding the classes that are in the ananke package, only Ananke objects should be manipulated by an ananke user. The others are for internal use.
I will be working on improving the docs for the Ananke.run method. Regarding tests, I had started to build a suite of tests for the 3 packages (py-ananke and its 2 submodules) starting with submodule py-EnBiD-ananke which has a relatively complete set. I am still working on building the suite of tests for py-Galaxia-ananke. Note that since the major contributing feature of these 2 submodules is to wrap over existing C++ software, my main focus has been to design their tests to evaluate these wrapping functionalities, trusting that the backend C++ software is operational and accurate. I intend eventually to test beyond the wrapping functionalities, in particular when designing the tests for the main py-ananke module which includes other functionalities of its own that don't relate to wrapping C++ software.

I am looking forward to hearing back from you regarding your other comments.

lheckmann commented 5 months ago

Thank you for all the explanations and adaptions of the code. I now wanted to try the changes on the develop branch, but when I try to import ananke it gives me the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lea/.local/lib/python3.9/site-packages/ananke/__init__.py", line 17, in <module>
    from .Ananke import Ananke
  File "/home/lea/.local/lib/python3.9/site-packages/ananke/Ananke.py", line 15, in <module>
    from .Universe import Universe
  File "/home/lea/.local/lib/python3.9/site-packages/ananke/Universe.py", line 12, in <module>
    from Galaxia_ananke.defaults import DEFAULTS_FOR_PARFILE
ModuleNotFoundError: No module named 'Galaxia_ananke.defaults'

what I did, was switching to the origins/develop branch and am on this HEAD: HEAD is now at abc8a18 Need to update notebook to reflect recent documentation changes and then re-run pip install .

athob commented 5 months ago

Thanks for the feedback and I think the re-installation didn't update the submodule Galaxia_ananke properly. Since you ran pip install ., I assume that you cloned the repository and are running the command from it. There is a known issue at re-installation between the submodules and the pip cache, I haven't found a proper way to address it yet. Could you try to run the command pip install . --no-cache-dir instead? If this is successful, and the import proceed without error, you should end up with the following versions:

import ananke, Galaxia_ananke, EnBiD_ananke
print(ananke.__version__)          #0.1.1.dev0
print(EnBiD_ananke.__version__)    #0.1.3b2.dev0
print(Galaxia_ananke.__version__)  #0.3.1.dev0

My guess is that the first installation stored in the pip cache the outdated versions of both submodules EnBiD_ananke and Galaxia_ananke that it installed in the process. As a result, while the re-installation pip should still update the git submodules to their correct versions, it also ignores the newer versions all together and instead use the cached ones.

Now while typing this, I'm realizing that I should try experimenting forcing the version number of the submodules. I just need to look into how to do it jointly with the way it's done now i.e. specifying their location in the repository since they only exist as submodules at this time.

rrjbca commented 5 months ago

Hi @athob, first my apologies for the delay in completing my review. In general both the software and manuscript are already at a very high standard and I congratulate you on the excellent work. I have on small specific request to make the API documentation more accessible for users which I have summarized in Issue #7. Once this has been addressed I would be happy to recommend the article for publication.

In addition, while reviewing the software I recognized a few further improvements that I would recommend, but these are beyond the scope of the review criteria and I only offer them as suggestions that you may choose to work on in the future:

Make tagged releases of py-ananke and distribute via PyPi
Instead of git submodules, also make tagged releases of py-EnBiD-ananke and py-Galaxia-ananke, distribute them via pypi and add them as dependencies of py-ananke.
The example jupyter notebook satisfies the review criteria for a manual test to verify functionality, but it would be advisable to adapt it into an automated test.
A tool like sphinx could be used to create user-friendly html documentation that incorporates the python docstrings.

lheckmann commented 5 months ago

Thanks for the feedback and I think the re-installation didn't update the submodule Galaxia_ananke properly. Since you ran pip install ., I assume that you cloned the repository and are running the command from it. There is a known issue at re-installation between the submodules and the pip cache, I haven't found a proper way to address it yet. Could you try to run the command pip install . --no-cache-dir instead? If this is successful, and the import proceed without error, you should end up with the following versions:
import ananke, Galaxia_ananke, EnBiD_ananke
print(ananke.__version__)          #0.1.1.dev0
print(EnBiD_ananke.__version__)    #0.1.3b2.dev0
print(Galaxia_ananke.__version__)  #0.3.1.dev0
My guess is that the first installation stored in the pip cache the outdated versions of both submodules EnBiD_ananke and Galaxia_ananke that it installed in the process. As a result, while the re-installation pip should still update the git submodules to their correct versions, it also ignores the newer versions all together and instead use the cached ones.

Now while typing this, I'm realizing that I should try experimenting forcing the version number of the submodules. I just need to look into how to do it jointly with the way it's done now i.e. specifying their location in the repository since they only exist as submodules at this time.

Thank you, yes that was the problem! Now it worked and I could check the new implementation and they work nicely.

Here my last comments for the output part of the code/jupyter:

The output columns specified in the jupyter for the final data frame are not the same as when using help(survey).
Referring to the columns in the jupyter: There are a few which are not explained: e.g. gaiadr2_g_bpmag gaiadr2_g_bpmag_Err gaiadr2_g_bpmag_Intrinsic gaiadr2_g_bpmag_Sig and similar ones with other filters, and satid and smass are not in the explanation. On the other hand mini is in the explanation, but the column does not exist in my output survey object.
This could be due to me not understanding a certain aspect of the code, but why are there so many rows with nan on many columns (e.g. ra, dec, mub, vr,...) in the survey dataframe?
The method survey.flush_extra_columns_to_hdf5 does not have any documentation when using help. Please add a short one, also showing how one could specify and output file.

athob commented 5 months ago

Hi @rrjbca, @lheckmann, thanks both for the latest feedback! I will be working on your comments in the coming weeks (I also still need to improve the documentation for the method Ananke.run as @lheckmann previously suggested it).

I just wanted to give a small update: I was able to implement a temporary fix to the issue that was ongoing with pip keeping wheels in cache of the submodules, and ignoring the updated versions in the process. It's a bit of a ugly fix truly as it calls pip cache remove from within the setup.py file (and in the future I'll need to think more carefully about it when I'll publish the package on the PyPI), but I wanted to have this done so future updates to the code requires only to run pip once.

I'll add more when I'll have addressed your latest comments! Thank for your time

warrickball commented 4 months ago

Hi @athob, have you had a chance to address the reviewers' latest round of comments?

athob commented 4 months ago

Hi @warrickball, I have not yet. Over the past month (and as of now), I had to put priority over different tasks that are currently more urgent (some involving development work on this package that are not related to the review). I'm sorry I haven't been able to dedicate time to the review. Is there a set deadline that I should know about for working on the review?

warrickball commented 4 months ago

Hi @athob, there isn't any particular deadline, we editors just like to check in from time to time to see if there's anything we can do to help keep things progressing.

warrickball commented 3 months ago

Hi @athob, I was wondering if you'd made any progress on responding to the reviewers' comments?

athob commented 3 months ago

Hi @warrickball, thanks for checking. I am in the same situation with other tasks that are taking priority over the review at this time. I think however that once summer starts I should be able to resume working on the reviewers' comments. I'll send an update here to inform everyone when that will be the case.

warrickball commented 2 months ago

Hi @athob, I just wanted to check in to see if there might've been some progress yet. I'm going to be largely unavailable next week and away completely for two weeks after, so am very unlikely to revisit this review until the end of July.

athob commented 2 months ago

Hi @warrickball, my apologies for only replying now. I imagine you're currently unavailable. Thanks for checking, I intend to resume my work on the review starting next week.

warrickball commented 1 month ago

Hi again @athob, this is me checking in again on your progress the reviewers' comments? I can't see that there's been any activity in the relevant repos for at least 2 months and in effect nothing in this review for 4. We might be starting to push the limits of how long a JOSS submission can sit on hold.

athob commented 1 month ago

Hi @warrickball, I have been mostly learning to use sphinx and readthedocs lately to build a documentation that would meet the recommendations brought by @rrjbca. I am having some challenges with it and have been working on finding out how to adapt it properly for the various packages. It notably seems to be due to some of the classes methods using a classproperty decorator I took from the astropy package, which the sphinx autosummary struggles with. I understand this review is taking longer than usual, please know that it is currently on top priority on my list of tasks.

athob commented 2 weeks ago

Hi @rrjbca & @lheckmann, my apologies for the time it took me to resume this review.

I was able to address the Issue #7 raised by @rrjbca regarding the need of an appropriate API documentation. I have already merged the newer version in the repository develop branch.

As mentioned in my previous message, and in the issue's discussion, I chose to use sphinx and readthedocs to build this documentation following @rrjbca's recommendation. To avoid repeating texts unnecessarily, I built the sphinx workflow in such a way that it utilizes the repository readme as an introduction page, and the notebook as an example usage page. I intend to add more notebooks in the future with more specialized examples. In the process, I've also improved how the docstrings were being generated in the package itself. That allowed me to improve the documentation of the Ananke.run method, a task I had left unfinished that was requested by @lheckmann in a previous comment.

With that, I still need to address the questions/recommendations in @lheckmann's last comment. Beside that, I also noted the remaining recommendations of @rrjbca (described as beyond the scope of the review), specifically the distribution of the package and its submodules on PyPI, and the possibility of turning the example notebook into a functionality test.

athob commented 2 weeks ago

Thanks again for your patience @lheckmann. I have addressed most of your comments with the latest version on develop. I had some clarifications to add which I discuss below:

The output columns specified in the jupyter for the final data frame are not the same as when using help(survey).

Thank you for pointing that out, hopefully the jupyter is now in a more appropriate shape

Referring to the columns in the jupyter: There are a few which are not explained: e.g. gaiadr2_g_bpmag gaiadr2_g_bpmag_Err gaiadr2_g_bpmag_Intrinsic gaiadr2_g_bpmag_Sig and similar ones with other filters, and satid and smass are not in the explanation. On the other hand mini is in the explanation, but the column does not exist in my output survey object.

There was indeed an issue regarding satid and smass, which I accordingly addressed. I wrongly put mini for what actually was smass, and I had completely overlooked the satid column. Regarding the gaiadr2_g_{} and other similarly formatted columns, I added a few comments to the description with examples to discuss those (as well as in the Ananke.run documentation)

This could be due to me not understanding a certain aspect of the code, but why are there so many rows with nan on many columns (e.g. ra, dec, mub, vr,...) in the survey dataframe?

This is one comment I need to look into further to properly verify what is going on here, but nan values can happen in some cases, notably when the extinction driver try to extrapolate the line-of-sight column densities. In normal cases with actual star particles data generated with an appropriate simulation model, this doesn't cause too many issues, in the case of that jupyter notebook where the data given is purely random, it does generate a significant amount of nan values. As mentioned in my last comment, I plan to add a couple other notebooks to the documentation meant to showcase a real case of mock catalog, such as the one featured in Sanderson et al. 2020.

The method survey.flush_extra_columns_to_hdf5 does not have any documentation when using help. Please add a short one, also showing how one could specify and output file.

I have documented this method, as well as another new method that was added with the newer updates since your comment (apply_post_process_pipeline_and_flush). Regarding how to specify an output file, due to the way the output is actually memory-mapped to an existing backend HDF5 file, the flushing uses that backend file already. With the current implementation, this file is written to the output_dir given to the run method which defaults to the current working directory. Note that with newer updates that try to implement the possibility of working with a catalog split across multiple files (for parallelizing purpose), there can be more than one file. The default behavior of the pipeline is to assume only one output file, but I need to give more thoughts into how to properly document this feature.

I will be addressing in my next comment the remaining recommendations of @rrjbca.

athob commented 2 weeks ago

@rrjbca I have address your remaining recommendations with the latest version 0.2.0b3.

The code that lived in the example notebook now also lives in an independent test script that is ran automatically by the CI/CD GitHub action. This is at the moment the only test available, I intend in the future to broaden the set of tests, notably have some that are more targeted to test individual features.

Regarding the distribution on the PyPI, I have taken steps towards that goal. I have notably overhauled the installing workflow so I could better integrate the versioneer software into it (that I discovered thanks to this tutorial), which the package is now able to use. I wanted to test out the publication on the PyPI but found out the project name ananke is already taken by a project that looks abandoned. Instead of using a different name than that of its import name, I have taken measures to request for the name to be made available under PEP 541. I will proceed with the publication once a decision is made on that request. I also want first to test out the publication of ananke on the PyPI before proceeding with publishing to Conda Forge, or before looking into publishing the submodules EnBiD_ananke and Galaxia_ananke.

With that I believe I have addressed all the remaining comments @warrickball. @lheckmann @rrjbca I am looking forward for your next comments, once again I am sorry for the time I took before addressing your recommendations.

rrjbca commented 1 week ago

@athob @warrickball The new documentation and CI pipeline look great; I am happy to recommend the article for publication.

warrickball commented 3 days ago

I'm having a look over things but wanted to make a quick technical comment:

I also want first to test out the publication of ananke on the PyPI before proceeding with publishing to Conda Forge, or before looking into publishing the submodules EnBiD_ananke and Galaxia_ananke.

We won't require you to create a Conda package because it looks like you can install py-ananke from pip, so Conda users can create a Conda environment and then use pip. I also don't think we need to wait for the PyPI registration because by configuring the repo as a package, it can be installed using the GitHub URL, as indicated in the instructions.

I'll also note that I'm not sure you need the submodule trick any more either. Because py-EnBiD-ananke and py-Galaxia-ananke are themselves packages, I believe you could include them as dependencies like e.g.

py-EnBiD-ananke @ git+https://github.com/athob/py-Galaxia-ananke@0aac265

though don't quote me on the syntax here! I don't expect this to happen now but you might want to use this in the future to simplify the build process.

By the way, I'm having trouble installing py-ananke but I'll open an issue in the repo to resolve that.