[REVIEW]: WODEN: A CUDA-enabled package to simulate low-frequency radio interferometric data

whedon commented 3 years ago

Submitting author: @JLBLine (Jack L. B. Line) Repository: https://github.com/JLBLine/WODEN Version: v1.1.0 Editor: @dfm Reviewer: @plaplant, @mkolopanis Archive: 10.5281/zenodo.5819254

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/bbc90ec4cd925ade93ed0781e571d247"><img src="https://joss.theoj.org/papers/bbc90ec4cd925ade93ed0781e571d247/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/bbc90ec4cd925ade93ed0781e571d247/status.svg)](https://joss.theoj.org/papers/bbc90ec4cd925ade93ed0781e571d247)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@plaplant & @mkolopanis, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @dfm know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @plaplant

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@JLBLine) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @mkolopanis

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@JLBLine) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @plaplant, @mkolopanis it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

Wordcount for paper.md is 1091

whedon commented 3 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/JSTSP.2008.2005327 is OK
- 10.1017/pasa.2012.007 is OK
- 10.1017/PASA.2020.18 is OK
- 10.1093/mnras/stab1560 is OK
- 10.3847/1538-3881/aabc4f is OK
- 10.1051/0004-6361/201322068 is OK
- 10.1017/pasa.2017.54 is OK
- 10.1093/mnras/stu1368 is OK
- 10.1093/mnras/stx1547 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.77 s (279.4 files/s, 104491.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C/C++ Header                    27            564           2136          47666
C                               54           2592           2308           9239
SVG                              4              0             69           3539
CUDA                             5            789            220           3484
Python                          29            865           1027           2386
reStructuredText                36            426            277            645
CMake                           16            109            109            613
Bourne Shell                    29             82             27            280
TeX                              1              9              0            188
JSON                             8              0              0            164
Markdown                         3             40              0            145
make                             1              4              7              9
YAML                             1              4             10              7
-------------------------------------------------------------------------------
SUM:                           214           5484           6190          68365
-------------------------------------------------------------------------------

Statistical information for the repository 'd1f5eaa7d80ac169958511dd' was
gathered on 2021/08/31.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
JLBLine                        117         35977          13797           95.94
Jack Line                       10          1231            665            3.65
Tony Farlie                      2           181             32            0.41

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
JLBLine                   68783          191.2          2.1                8.27

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

dfm commented 3 years ago

@JLBLine, @plaplant, @mkolopanis – This is the review thread for the paper. All of our communications will happen here from now on. Thanks again for agreeing to participate!

Please read the "Reviewer instructions & questions" in the first comment above.

Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#3676 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

mkolopanis commented 3 years ago

@dfm I wanted to disclose some potential COI I have with this review.

Both Jack Line and myself are collaborators on the Murchison Widefield Array (MWA, the array for which WODEN is specifically designed). I would like to request this COI be waived since we work independently with no direct communication or collaboration on any papers or projects.

Much of Jack's work focuses on the Real Time System (RTS) processing pipeline. Meanwhile when I do perform work for the MWA, it is through their other power spectrum/imaging pipeline (FHD+Eppsilon) or through an independent analysis pipeline and packages.

dfm commented 3 years ago

@mkolopanis: Thanks for this! I am happy to waive this COI since the collaboration is large and if you feel you can provide an impartial review and if @JLBLine is happy with that. Let me know here or via email if you have any concerns about this. Thanks!

JLBLine commented 3 years ago

@dfm @mkolopanis Yup happy to have the COI waived, makes sense to me

mkolopanis commented 3 years ago

@JLBLine thanks for this package! WODEN seem like it will be great addition to growing list of simulators for the interferometric world with the use of CUDA acceleration and unique basis function support.

I have some questions and recommendations after my first read through of the paper, and my experience following the examples provided in the documentation.

Installation

The detailed installation guide found on read the docs is helpful, however, I would recommend rearranging the sections so How to install Dependencies directly follows the Dependencies section. This could be very useful for users who may be new to or uncomfortable with installing C applications from source. This would require moving the How to install Dependencies section to be a sub-section in the Installation section.

The post compilation required section confuses me. I understand the necessity of adding the executables into the path but is it possible to set a custom prefix for compilation to be able to run make install similar to other applications without the need for the init_WODEN.sh scritp? I am not too familiar with building C applications. Maybe @plaplant has more thoughts on this.

Another issue I ran into is that while some of these packages are available from apt repositories, attempting to install via apt produced errors but compiling directly from source does not. This may be a product of my environment though. I have been conducting tests using the nvidia/cuda:11.4.1-devel-ubuntu18.04 docker image and I know some things do not always work as expected in docker without the right configuration. @JLBLine do you have any insight on this or have you tried installing json-c and erfa via package managers?

Functionality

The software writes visibilities to disk in the uvfits format, however the files written do not appear to fully comply with the fits standard outlined in AIPS memo 117. There are missing mandatory header fields outlined in Table 8. Providing fully compliant uvfits output would make visibility files (and therefore WODEN itself) more modular and accessible from multiple processing pipelines and packages. As an example, the output files are easily converted to CASA measurement sets for imaging in CASA, but can not be read by the pyuvdata python package due to the missing header fields.

It would be encouraging to provide evidence that WODEN's computed visibilities match analytic expectations where possible. The ctest suite appears to make comparisons between calculated visibilities and analytically solvable configurations, but never mentions this is done (only that they "check your system / the code runs as expected."). I think adding to the documentation that tests are performed against analytic visibilities would be helpful.

Documentation

The link in the docs to polarised_source_and_FEE_beam.ipynb which lives here_ appears to redirect to the same page, but viewing the source for the page I can find the actual link and found the accompanying notebook very helpful for how polarization is handled in WODEN.

statement of need

Regarding the statement of need in the documentation, I mentioned earlier that WODEN would be a great addition to the every growing list of interferometric visibility simulators. But given that the number of simulators is quite large, I think a quantitative comparison would provide more definitive evidence for the need of WODEN over other simulators mentioned especially for the intended use in the MWA. The Line 2020 paper makes quantitative comparisons between WODEN's shapelet approach and other source models in the imaging and power spectrum domain, but these comparisons are useful at showing the high level and end product results. The comparisons were made to evaluate the methods outlined in the paper and not the computational efficiency or accuracy of WODEN compared to other simulators.

Community Guidelines

The author encourages questions be emailed to him directly on repo's main README and a note is provided in the documentations to create a github issue if a certain beam feature is desied. However, no written community guidelines are provided for researchers looking contribute to the repository. An addition of these guidelines and what steps to undertake would make the process of contributing more accessible to other researchers.

Automated Tests

No automated test suites exist for this repo, but the author does provide instructions for running ctest to validate the installation. Many CI/CD platforms exist where building and testing could be automated. Setting these up, along with detailed community guidelines, would go a long way to making the code more accessible to future developers to verify/validate that their contributions preserve functionality. I would be happy to help set up CI for this repo if the author is interested. On a related note, I've never made a C application that also tracks code coverage, but I've found a few details about this as well. maybe @plaplant has experience with either of these too?

Software Paper

There are a few general items in the paper.

(known in the field as "point sources) -> this parenthetical should be followed by a comma.
Astropy Collaboration citation -> should this be a parenthetical citation?
Under a formalism like the above -> I understand what is meant by this sentence but it is difficult to parse. Perhaps something along the lines of "Under this discrete sky formalism, upwards of j>=25 x 10^6 or more components..."

References

The paper refers to pyuvsim and links directly to the repository. I think a reference to the pyuvsim JOSS paper would be fair, but I will leave this to the discretion of @dfm and @JLBLine since I am an author on pyuvsim.

whedon commented 3 years ago

:wave: @mkolopanis, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @plaplant, please update us on how your review is going (this is an automated reminder).

JLBLine commented 3 years ago

Hey @mkolopanis, thanks for the review! I was holding fire in case @dfm had some comments, but I'll jump in and make some replies here. Before I go ahead and action anything in the list below, should we be making these issues on the WODEN repo? I think that's how the review process goes on JOSS (this is my first JOSS paper so correct me if wrong).

Installation

The detailed installation guide found on read the docs is helpful, however, I would recommend rearranging the sections so How to install Dependencies directly follows the Dependencies section. This could be very useful for users who may be new to or uncomfortable with installing C applications from source. This would require moving the How to install Dependencies section to be a sub-section in the Installation section.

Sure, I thought it would be neater in it's own guide but I agree with your point, I'll move it.

_The post compilation required section confuses me. I understand the necessity of adding the executables into the path but is it possible to set a custom prefix for compilation to be able to run make install similar to other applications without the need for the initWODEN.sh scritp? I am not too familiar with building C applications. Maybe @plaplant has more thoughts on this.

This is possible with CMake I'm sure, I'll look into it; most likely I'll still get the user to install in the same build directory and then get CMake to make symlinks into somewhere like /usr/local/bin

Another issue I ran into is that while some of these packages are available from apt repositories, attempting to install via apt produced errors but compiling directly from source does not. This may be a product of my environment though. I have been conducting tests using the nvidia/cuda:11.4.1-devel-ubuntu18.04 docker image and I know some things do not always work as expected in docker without the right configuration. @JLBLine do you have any insight on this or have you tried installing json-c and erfa via package managers?

Hmm I think I also had problems when installing json-c with apt, which is why I have instructions on how to build from source. I could make an explicit note in the documentation. The erfa package installation is a bit of a pig's ear, and again the only reliable way I could install it was via the instructions I have included. Again, I could make a note that this is a viable way to install it.

Functionality

The software writes visibilities to disk in the uvfits format, however the files written do not appear to fully comply with the fits standard outlined in AIPS memo 117. There are missing mandatory header fields outlined in Table 8. Providing fully compliant uvfits output would make visibility files (and therefore WODEN itself) more modular and accessible from multiple processing pipelines and packages. As an example, the output files are easily converted to CASA measurement sets for imaging in CASA, but can not be read by the pyuvdata python package due to the missing header fields.

Fair shout. I'll start working on making my uvfits play nice with pyuvdata. I think for this whole process I'll start a new branch called joss_review so that'll be one of the first things I push.

It would be encouraging to provide evidence that WODEN's computed visibilities match analytic expectations where possible. The ctest suite appears to make comparisons between calculated visibilities and analytically solvable configurations, but never mentions this is done (only that they "check your system / the code runs as expected."). I think adding to the documentation that tests are performed against analytic visibilities would be helpful.

Roger that, I'll try and spell out what the tests are doing more clearly in the Documentation

Documentation

_The link in the docs to polarised_source_and_FEEbeam.ipynb which lives here appears to redirect to the same page, but viewing the source for the page I can find the actual link and found the accompanying notebook very helpful for how polarization is handled in WODEN.

Oops, I'll fix that link.

statement of need

Regarding the statement of need in the documentation, I mentioned earlier that WODEN would be a great addition to the every growing list of interferometric visibility simulators. But given that the number of simulators is quite large, I think a quantitative comparison would provide more definitive evidence for the need of WODEN over other simulators mentioned especially for the intended use in the MWA. The Line 2020 paper makes quantitative comparisons between WODEN's shapelet approach and other source models in the imaging and power spectrum domain, but these comparisons are useful at showing the high level and end product results. The comparisons were made to evaluate the methods outlined in the paper and not the computational efficiency or accuracy of WODEN compared to other simulators.

I think anything more than a "given this sky model, WODEN ran this fast and pyuvsim ran this fast, here are the output images as a comparison" is going to be a large amount of work which I would argue is outside the scope of this paper. Did you have an idea of what level of comparison you were after @mkolopanis?

Community Guidelines

The author encourages questions be emailed to him directly on repo's main README and a note is provided in the documentations to create a github issue if a certain beam feature is desied. However, no written community guidelines are provided for researchers looking contribute to the repository. An addition of these guidelines and what steps to undertake would make the process of contributing more accessible to other researchers.

I'll have a stab at this

Automated Tests

No automated test suites exist for this repo, but the author does provide instructions for running ctest to validate the installation. Many CI/CD platforms exist where building and testing could be automated. Setting these up, along with detailed community guidelines, would go a long way to making the code more accessible to future developers to verify/validate that their contributions preserve functionality. I would be happy to help set up CI for this repo if the author is interested. On a related note, I've never made a C application that also tracks code coverage, but I've found a few details about this as well. maybe @plaplant has experience with either of these too?

I considered this, but I haven't been able to find a way to automate the GPU tests without provided GPU resources via some cloud service (or my own desktop). If you can work out a free way to automate all the testing I'd be very grateful for the help. Just automating tests on the python/C seems very doable like you say, but I feel like any new functionality anyone contributes should be pushed all the way through WODEN so without GPU testing I'm not sure how helpful it'll be. Similarly for the code coverage, I couldn't work out how to report that for the CUDA code but I'm happy to learn if someone knows how.

Software Paper

There are a few general items in the paper.

(known in the field as "point sources) -> this parenthetical should be followed by a comma. Astropy Collaboration citation -> should this be a parenthetical citation? Under a formalism like the above -> I understand what is meant by this sentence but it is difficult to parse. Perhaps something along the lines of "Under this discrete sky formalism, upwards of j>=25 x 10^6 or more components..."

Can do on these changes.

References

The paper refers to pyuvsim and links directly to the repository. I think a reference to the pyuvsim JOSS paper would be fair, but I will leave this to the discretion of @dfm and @JLBLine since I am an author on pyuvsim.

Seems fair to me too.

dfm commented 3 years ago

@JLBLine:

Before I go ahead and action anything in the list below, should we be making these issues on the WODEN repo? I think that's how the review process goes on JOSS (this is my first JOSS paper so correct me if wrong).

That's the process that I normally use/recommend, but feel free to proceed however would work best for you - we're not strict about the specifics of that workflow. For small comments it's fine to just report back here when you've made the changes.

mkolopanis commented 3 years ago

Always nice to close a few easy issues :laughing: @JLBLine if you would like I can try to help with some of this. Would you prefer your repository to be forked for a PR or directly edit on the main repository?

Regarding the statement of need, I agree it feels like a lot of work. I honestly wish we as a community has an organized curated list of simulators. Maybe that partially does exist as part of the pyuvsim repository with its reference simulation suite. Let me get back to you on this. This could be as simple as what you suggested and a small PR on pyuvism to add WODEN to its comparison documentation.

plaplant commented 3 years ago

@JLBLine this package is very interesting! Thanks very much for the contribution. In general I find the paper and the code well-written and informative. I have some comments that are similar to those of @mkolopanis, mostly around installation and documentation.

Installation

I was working as an unprivileged user on a shared cluster (which I imagine is a common-ish use case), and as such was forced to build many of the dependencies by hand. This also meant that my libraries ended up in non-standard locations (e.g., $HOME/local/json-c/lib). Though cmake was able to find these libraries by editing the values of, e.g., JSONC_ROOT, I was still required to update the value of LD_LIBRARY_PATH when attempting to run woden or run_woden.py. If possible, it would be more user-friendly to export an appropriate LD_LIBRARY_PATH as part of the init_woden.sh command. Although there is a section of the installation documentation explaining this, making it more automated would be helpful.

Also, to address the previous point on my end, I had updated the init_woden.sh script to update my LD_LIBRARY_PATH, but noticed that the file was overwritten upon a subsequent invocation of cmake. Admittedly I do not have a ton of experience using cmake, but this was counter-intuitive to me. If this behavior is standard, then perhaps a statement in the documentation warning the user to this effect would be helpful.

Test suite

I greatly appreciate the fact that there are unit tests defined, and building and running them through ctest was relatively straightforward. Along the lines of what @mkolopanis suggested, it would be nice to have automated CI running of these tests, though I realize that requires access to a testing provider that has GPUs available to test the full functionality.

Also, it might be good to have a more detailed description of what the specific tests being run do, and what parts of the code they are testing. Getting detailed coverage of individual lines would be ideal, though I understand this may be difficult if such tools do not exist for C/CUDA code. (I personally have not used ctest in another context, so I do not know how feasible this is.)

Documentation

The online documentation on ReadTheDocs is incredibly detailed and very useful for a new user getting familiar with the software package. The pages are well-written and contain helpful mathematical formulas and visualizations when appropriate. The in-line comments on the code are also very helpful for explaining what the code is doing in certain points. Nice work!

Code Timing

I ran the simulations suggested on the Fornax A documentation page, and found performance in line with those listed on the page. I am using a Tesla V100 card, and the multi-scale CLEAN and shapelet simulations took approximately 15 seconds and 16 seconds, respectively. I think it's a nice feature to have simple benchmarking scripts available easily for the user, which helps demonstrate the code's performance. I don't believe there are any changes to make here, I'm just remarking here for posterity that I see performance similar to what is in the docs.

Functionality Documentation

Reading the page on source specification and the one on primary beams, it seems (to me) that there is full support for handling polarized sources and correctly propagating them through to output Stokes visibilities. However, this is not explicitly noted in the documentation (and it should be highlighted, because it's a nice feature!). The mathematical expression of the radio interferometer measurement equation (RIME) in the docs and in the paper should include reference to polarization. Also, is there current or future-planned support for outputting instrumental polarizations (e.g., xx and yy) in addition to Stokes parameters? This would be helpful for comparing with other simulation packages like pyuvsim.

Software Paper

The paper is clear and well-written. I have just a few small comments to slightly improve the clarity of some points:

As mentioned above, a more explicit discussion of polarization (both of sources and the primary beam) should be included, as well as being included in the equation.
It might be worth mentioning exactly how the pixelization of the sphere is done (e.g., a regular grid in (l, m), HEALPix pixels, something else?), as it affects how things like the sky model and beam interpolation are handled.
Although there are benchmarking results included in both the reference paper (Line et al. 2020) and some of the documentation pages, it might be nice to include it in the "Example application" section as well, to highlight the speed of the calculation.

I agree with @mkolopanis that an apples-to-apples comparison with other simulation packages would be quite nice, but might be a fair amount of work and is likely beyond the scope of this paper.

mkolopanis commented 3 years ago

regarding the simulator comparison, I think it will be reasonable to add WODEN to the comparison section of pyuvsim's documents. That can easily be handled outside of this paper.

mkolopanis commented 3 years ago

It also seems like CircleCI allows for GPU execution on some of their runners according to their documentation. But I've turned up the same nothing regarding coverage for cuda code. You could also use github actions with a self-hosted runner with a GPU available. I've seen that done in other projects of mine and if you have a dedicated GPU accelerated server with a FQDN this is doable too. There are some security concerns that github identifies about using self-hosted runners on public repos though which should be considered.

JLBLine commented 3 years ago

Thanks for checking that out @mkolopanis - I've just looked into CircleCI which seemed a great solution, until I looked into signing up, and you can only get GPUs on a 'custom pricing plan' so it won't be free. I'm reluctant to use a self-hosted runner, mostly because I don't want random people spinning up my desktop GPU when I might be using it, and the only other GPUs I have access to are queue-based super clusters. I'm not trying to be miserly, but I don't really want to pay to host a cloud-based option, because whatever research funding I have will always be temporary given the nature of academia.

For the comparison with other simulators, that sounds good. Steven Murray mentioned something similar about getting a pyradiosky model through WODEN as well, to compare to the Hera simulators. Maybe once we've got this review finished we can have a meeting to work out how to tackle this all in one foul swoop.

I've just pushed a new branch called joss_review, which I think should write uvfits files that work with pyuvdata. At least they work to the extent that you can create a pyuvdata.UVData object and read in a WODEN uvfits file. Please have a quick look and let me know if it works as you would expect.

If you are keen to do anything to WODEN in the mean time, I'd prefer a new fork (or branch) and PR rather than working on master.

JLBLine commented 3 years ago

Thanks for the review @plaplant. Please see some responses below. Once I've finished this response, I'll try and grab all the relevant points from both your and @mkolopanis reviews and turn them into git issues. I should then be able to notify you guys once I think I've solved each of them and how, and maybe get you guys to verify them so I can close them? If you see any other issues I've missed feel free to add them.

Installation

_I was working as an unprivileged user on a shared cluster (which I imagine is a common-ish use case), and as such was forced to build many of the dependencies by hand. This also meant that my libraries ended up in non-standard locations (e.g., $HOME/local/json-c/lib). Though cmake was able to find these libraries by editing the values of, e.g., JSONC_ROOT, I was still required to update the value of LD_LIBRARY_PATH when attempting to run woden or run_woden.py. If possible, it would be more user-friendly to export an appropriate LD_LIBRARY_PATH as part of the initwoden.sh command. Although there is a section of the installation documentation explaining this, making it more automated would be helpful.

Yeah I think I can see a way for this to happen, I'll have a go.

_Also, to address the previous point on my end, I had updated the init_woden.sh script to update my LD_LIBRARYPATH, but noticed that the file was overwritten upon a subsequent invocation of cmake. Admittedly I do not have a ton of experience using cmake, but this was counter-intuitive to me. If this behavior is standard, then perhaps a statement in the documentation warning the user to this effect would be helpful.

So the init_woden.sh file actually lives in the WODEN/templates folder, and is copied into the build directory upon compilation. I don't think this is standard as I don't think having a bash file that you call for setup is standard at all, it just made sense to me (this is the first C/CUDA/CMake project I've ever written so I am admittedly flying by the seat of my pants). What would be more standard would be to setup a make install option as @mkolopanis suggested. I think I'll do that, as well as keeping the init_woden.sh as an option if you don't want to make install, and leave the warning to the user that it will be overwritten, and can be edited in the WODEN/templates folder if required.

Test suite

I greatly appreciate the fact that there are unit tests defined, and building and running them through ctest was relatively straightforward. Along the lines of what @mkolopanis suggested, it would be nice to have automated CI running of these tests, though I realize that requires access to a testing provider that has GPUs available to test the full functionality.

As I've been discussing with @mkolopanis, this seems impossible to do for free, and I don't really want to have to pay to support online testing.

Also, it might be good to have a more detailed description of what the specific tests being run do, and what parts of the code they are testing. Getting detailed coverage of individual lines would be ideal, though I understand this may be difficult if such tools do not exist for C/CUDA code. (I personally have not used ctest in another context, so I do not know how feasible this is.)

Yeah the joys of CUDA rearing it's head again. I can certainly put more documentation into what the tests are doing and why, which I'll work on. I think I might be able to hook up CMake-codecov ( https://github.com/RWTH-HPC/CMake-codecov ) to my python and C tests to get coverage for the python/C code, so I'll look into that. Pretty sure it won't work with the CUDA code though.

Documentation

The online documentation on ReadTheDocs is incredibly detailed and very useful for a new user getting familiar with the software package. The pages are well-written and contain helpful mathematical formulas and visualizations when appropriate. The in-line comments on the code are also very helpful for explaining what the code is doing in certain points. Nice work!

Sweet, thanks!

Code Timing

I ran the simulations suggested on the Fornax A documentation page, and found performance in line with those listed on the page. I am using a Tesla V100 card, and the multi-scale CLEAN and shapelet simulations took approximately 15 seconds and 16 seconds, respectively. I think it's a nice feature to have simple benchmarking scripts available easily for the user, which helps demonstrate the code's performance. I don't believe there are any changes to make here, I'm just remarking here for posterity that I see performance similar to what is in the docs.

Cool, good to know. I think you'll find that for these small simulations, large amounts of time are actually I/O, especially because I've been caveman in the way I write out a binary file from C/CUDA and read it into python to write the uvfits, rather than attempting to write uvfits directly from with C. I found with the V100, for 200,000 point sources, 16 time steps and 384 freq channels, the V100 ran three times faster out of the box than the 1080 Ti. Tasty!

Functionality Documentation

Reading the page on source specification and the one on primary beams, it seems (to me) that there is full support for handling polarized sources and correctly propagating them through to output Stokes visibilities. However, this is not explicitly noted in the documentation (and it should be highlighted, because it's a nice feature!). The mathematical expression of the radio interferometer measurement equation (RIME) in the docs and in the paper should include reference to polarization. Also, is there current or future-planned support for outputting instrumental polarizations (e.g., xx and yy) in addition to Stokes parameters? This would be helpful for comparing with other simulation packages like pyuvsim.

Good point about the polarisation, I'll update that in the docs and the paper. Right now there are no plans for outputting the instrumental pols, partly because I haven't researched the maths of how to change Stokes input params into instrumental pols yet.

Software Paper

The paper is clear and well-written. I have just a few small comments to slightly improve the clarity of some points:

As mentioned above, a more explicit discussion of polarization (both of sources and the primary beam) should be included, as well as being included in the equation. It might be worth mentioning exactly how the pixelization of the sphere is done (e.g., a regular grid in (l, m), HEALPix pixels, something else?), as it affects how things like the sky model and beam interpolation are handled. Although there are benchmarking results included in both the reference paper (Line et al. 2020) and some of the documentation pages, it might be nice to include it in the "Example application" section as well, to highlight the speed of the calculation.

I shall make changes to address these points. Regarding the first point, there is no pixelisation done internally to WODEN. The RIME equation is just done for every direction in the sky model, for all frequencies and time steps. All pixelisation happens in the sky model (made of point, Gaussian, and shapelet models), so you could put in any pixelisation scheme you want.

JLBLine commented 3 years ago

I've just invited you both @mkolopanis and @plaplant to be collaborators on the WODEN repo so I can tag you into issues once I've fixed them. Let me know if this arrangement doesn't work for you, seems like a tidy way to knock things off the list to me.

dfm commented 3 years ago

Thanks for checking that out @mkolopanis - I've just looked into CircleCI which seemed a great solution, until I looked into signing up, and you can only get GPUs on a 'custom pricing plan' so it won't be free. I'm reluctant to use a self-hosted runner, mostly because I don't want random people spinning up my desktop GPU when I might be using it, and the only other GPUs I have access to are queue-based super clusters. I'm not trying to be miserly, but I don't really want to pay to host a cloud-based option, because whatever research funding I have will always be temporary given the nature of academia.

@JLBLine: We definitely don't require something like this for a JOSS review as long as the reviewers can execute the code on their own machines, so I think we're ok! It would be great if some of the free CI options started supporting GPU testing someday, but we're definitely not there yet.

mkolopanis commented 2 years ago

Sorry for the radio silence I was on vacation this last week. I would never want to force someone to pay for CI options. At least we tried looking for something. If you wouldn't mind, @JLBLine could you re-invite me to the repository my invitation expired while I was gone? If you still want to be able to tag me on issues at least. It seems the things I would be helpful with (CI) aren't really an option for this project.

Thanks for chiming in with clarification @dfm

JLBLine commented 2 years ago

Ah cool, hope you had a nice break @mkolopanis. You have been invited again. Once you've accepted, I'll start tagging both you and @plaplant on issues I think I've fixed

JLBLine commented 2 years ago

FYI @dfm @mkolopanis @plaplant I am actively working on this still; I'm working on Issue 4 around documenting my testing. I think I've come up with a test to estimate the fundamental accuracy of WODEN in it's current form, which might make a nice plot in the JOSS paper. Running it by colleagues later this week to check it makes sense. Will let you know how it goes.

JLBLine commented 2 years ago

Ok @dfm @mkolopanis @plaplant through better testing, I've found that the fully float precision version of WODEN visibilities to be up to a few percent off expected values for ~10km baselines (via the method explained in this section on ReadTheDocs here). I can reduce this down to 0.1 percent with some double precision (at the cost of a two times slow down), and 10e-6 of a percent with full double (at the cost of a ten times slow down). I want to add a switch that gives the user a precision choice, with a corresponding estimate in accuracy to go into the JOSS paper. I realise this is all taking time, but I think this functionality is something that should be peer-reviewed, so I hope you'll bear with me as I implement this and document it all.

dfm commented 2 years ago

@JLBLine: this sounds sensible to me. Do you have a sense for about how long you'll need? If @mkolopanis and @plaplant are up for it, I can "pause" this review and we can come back to it when you're ready?

plaplant commented 2 years ago

@JLBLine thanks for all the excellent updates! I'm always a fan of quantifying the uncertainty in the calculation, and I think giving the user the flexibility to choose the precision/time trade-off for themselves is a very nice feature.

@dfm I'm happy to "pause" or keep the review open, either one works for me.

mkolopanis commented 2 years ago

I'm also fine with a brief "pause." I agree that a quantitative measure of the accuracy of the simulations is a vital part of interfemoteric data, especially if any use case involves power spectrum estimation. Thanks @JLBLine for the updates, and I also think toggle-able switches to control the precision are a great option for WODEN.

JLBLine commented 2 years ago

Hi @dfm @mkolopanis @plaplant I reckon I'll need until the end of the week to make the float/double switch work with all the units tests, and then another week to address the rest of the review issues. If you want to pause, I can just notify you all when I've finished everything. I'll open up another issue for the float / double switch so we can tick it off the list at the end.

dfm commented 2 years ago

@JLBLine: Sounds good! I won't officially pause the review, but let's check in in about two weeks to see where we are. Keep us posted!

JLBLine commented 2 years ago

Hey @dfm @mkolopanis @plaplant, the float vs double switch is working now, and I've updated the unit/integration tests accordingly. I've update the JOSS paper here with the new estimate on accuracy, and better explanations on how I handle polarisation.

Still have a few of the repo issues to catch up on, so the work will spill into next week, but I think the code itself is in it's final form for this paper. Mostly Documentation that needs to catch up.

JLBLine commented 2 years ago

@mkolopanis @plaplant @dfm I've finished responding to all the issues, and have updated the paper to cover the new functionality of switching between float and double precision, as well as estimating the accuracy of both modes. I think this concludes the work so please check it out at your convenience.

JLBLine commented 2 years ago

Hi @mkolopanis @plaplant @dfm, just a heads up, I'm on leave from the 8th of December until the new year, so would be keen to try and get this tied up before then if possible. Is there anything else you need from me? I think I've made all the necessary changes.

mkolopanis commented 2 years ago

Thanks for the poke about this. I have reviewed the changes on the repo and updated my review checklist. All my concerns/comments have been addressed.

JLBLine commented 2 years ago

Woot thanks @mkolopanis

plaplant commented 2 years ago

@JLBLine apologies for the delay in responding. All of my concerns have been addressed, and I have updated my reviewer checklist. Excellent work!

dfm commented 2 years ago

@plaplant, @mkolopanis: Thanks for your reviews!! I really appreciate the time that you have both volunteered for this.

@JLBLine: I'm sorry that we haven't been able to stick to your timeline, and I'll still need a few days for a couple of editorial tasks. After that I'll have a few things for you to do before publication, but absolutely no rush on those, especially when you're on vacation. Thanks for your work and patience!

dfm commented 2 years ago

@whedon check references

whedon commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/JSTSP.2008.2005327 is OK
- 10.1017/pasa.2012.007 is OK
- 10.1017/PASA.2020.18 is OK
- 10.1093/mnras/stab1560 is OK
- 10.3847/1538-3881/aabc4f is OK
- 10.1051/0004-6361/201322068 is OK
- 10.1017/pasa.2017.54 is OK
- 10.1093/mnras/stu1368 is OK
- 10.1093/mnras/stx1547 is OK

MISSING DOIs

- None

INVALID DOIs

- None

dfm commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

dfm commented 2 years ago

@whedon generate pdf

whedon commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

dfm commented 2 years ago

@JLBLine: welcome back! Thanks for your updates on the manuscript. Here are the last steps that I'll need from you:

Take one last read through the manuscript to make sure that you're happy with it (it's harder to make changes later!), especially your name and affiliation.
Confirm that the version number including all the changes as part of this review is v1.1.0.
Create an archived release of that version of the software (using Zenodo or something similar). Please make sure that the metadata (title and author list) exactly match the paper. Then report the DOI of the release back to this thread.

Thanks!

JLBLine commented 2 years ago

Ok @dfm I believe I've made a Zenodo release correctly, it's been given the DOI 10.5281/zenodo.5819254. Link to the Zenodo if you need it: https://zenodo.org/record/5819254#.YdUPWntBzJk

dfm commented 2 years ago

@whedon set 10.5281/zenodo.5819254 as archive

whedon commented 2 years ago

OK. 10.5281/zenodo.5819254 is the archive.

dfm commented 2 years ago

@whedon set v1.1.0 as version

whedon commented 2 years ago

OK. v1.1.0 is the version.

openjournals / joss-reviews