
[REVIEW]: PoUnce: A framework for automatized uncertainty quantification simulations on high-performance clusters #4683

Closed editorialbot closed 1 year ago

editorialbot commented 2 years ago

Submitting author: @JakobBD (Jakob Dürrwächter)
Repository: https://github.com/JakobBD/pounce
Branch with paper.md (empty if default branch):
Version: v1.0.0
Editor: @Nikoleta-v3
Reviewers: @georgiastuart, @salrm8
Archive: 10.5281/zenodo.7600634

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/c62521adac904e832a1ac4130406f80f"><img src="https://joss.theoj.org/papers/c62521adac904e832a1ac4130406f80f/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/c62521adac904e832a1ac4130406f80f/status.svg)](https://joss.theoj.org/papers/c62521adac904e832a1ac4130406f80f)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@georgiastuart & @salrm8, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @Nikoleta-v3 know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks, at the very latest.

Checklists

📝 Checklist for @georgiastuart

📝 Checklist for @salrm8

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 2 years ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.11 s (618.1 files/s, 72017.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          33            935           1024           2973
YAML                            14             91             86            802
Markdown                        13            225              0            762
TeX                              4             53             54            701
JSON                             1              0              0             51
make                             1             11              6             27
INI                              1              0              0              5
-------------------------------------------------------------------------------
SUM:                            67           1315           1170           5321
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 2 years ago

Wordcount for paper.md is 885

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1287/opre.1070.0496 is OK
- 10.1007/3-540-45346-6_5 is OK
- 10.1137/16M1082469 is OK
- 10.1137/15M1046472 is OK
- 10.2172/1177077 is OK
- 10.1016/j.jocs.2020.101204 is OK
- 10.1137/S1064827503427741 is OK
- 10.1016/j.jocs.2015.08.008 is OK
- 10.21105/joss.02871 is OK
- 10.1061/9780784413609.257 is OK
- 10.1142/S2591728518500445 is OK
- 10.5281/zenodo.16303 is OK
- 10.5281/zenodo.21389 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 2 years ago

👋🏼 @JakobBD, @georgiastuart, @salrm8 this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread.

These checklists contain the JOSS requirements ✅ As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4683 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. 😄 🙋🏻

georgiastuart commented 2 years ago

Review checklist for @georgiastuart

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Nikoleta-v3 commented 2 years ago

Hey 👋🏼 @georgiastuart and @salrm8 😄 any update on your reviews?

Nikoleta-v3 commented 2 years ago

Hey 👋🏼 @georgiastuart and @salrm8 📳

georgiastuart commented 2 years ago

@Nikoleta-v3 hoping to get to this review this week. Sorry for the delay!

Nikoleta-v3 commented 2 years ago

No worries @georgiastuart 😄 thank you for getting back to me!

danielskatz commented 1 year ago

Hi @georgiastuart & @salrm8 - this review started about 2 months ago, and the normal JOSS goal for reviews is 4-6 weeks, but this doesn't seem to be making progress. Can we do anything to help you?

salrm8 commented 1 year ago

Hi @danielskatz, my apologies for the delay. I'll speed up the process.

Nikoleta-v3 commented 1 year ago

Hey 👋🏼 @georgiastuart and @salrm8 😄 any update on your reviews?

georgiastuart commented 1 year ago

@Nikoleta-v3 working on it! Sorry for the delay!

georgiastuart commented 1 year ago

@JakobBD can you describe the role of the other four authors on the paper? It seems like, from the repository, you're the only committer.

Edit: I manually looked through all the commits and I do see some from Thomas and Fabian.

georgiastuart commented 1 year ago

Can you expand upon

This includes I/O bottlenecks due to very large numbers of relatively small files

Is this referring to a specific widely-used software for UQ? It definitely isn't inherent in UQ code designs.

JakobBD commented 1 year ago

@georgiastuart thank you for reviewing this paper!

@JakobBD can you describe the role of the other four authors on the paper?

As you already stated, the early-stage coding of PoUnce was a joint effort between Thomas, Fabian and myself, as can also be seen from their commits. Andrea Beck and Claus-Dieter Munz contributed along the whole code development process with advice on required code functionality, numerous major and minor code design choices, and reviewing/testing code documentation. This is why I considered the JOSS authorship criterion of “active project direction and other forms of non-code contributions” to be fulfilled.

Is this referring to a specific widely-used software for UQ? It definitely isn't inherent in UQ code designs.

Large numbers of model evaluations are inherent to Monte Carlo simulations due to their half-order convergence; this applies especially to Multilevel and Multifidelity approaches, where O(1e5-1e7) evaluations of the cheapest model are common.
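As a back-of-the-envelope illustration (a simplified sketch, not PoUnce code): the root-mean-square Monte Carlo error decays like sigma/sqrt(N), so the required number of samples grows quadratically with the inverse tolerance.

```python
# Back-of-the-envelope illustration (not PoUnce code): the Monte Carlo
# root-mean-square error decays as sigma / sqrt(N), so reaching a target
# tolerance eps requires roughly N = (sigma / eps)**2 model evaluations.
import math

def mc_samples_needed(sigma: float, eps: float) -> int:
    """Estimate the number of MC samples for an RMS error of at most eps."""
    return math.ceil((sigma / eps) ** 2)

for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps = {eps:.0e} -> N ~ {mc_samples_needed(sigma=1.0, eps=eps):,}")
# With unit variance, eps = 1e-3 already implies about one million runs
# of the cheapest model.
```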

If the baseline model is invoked as a standalone tool with its own file I/O, the number of created files is of course at least the number of model evaluations. Most HPC cluster file systems are designed for large files, but not for such large numbers of files. In an earlier approach (prior to PoUnce), we encountered severe performance bottlenecks due to this.

To elaborate a little further:

A common “integrated” alternative is to convert an existing simulation code into a library with a subroutine interface, which is then called from the UQ framework and returns the result, but does not write its own files. However, practical simulation setups often consist of a tool chain of several pre- and post-processing steps, which are typically linked through file I/O and require different numbers of processors. This makes converting such a tool chain to a single subroutine without file I/O very cumbersome, if not infeasible. Another disadvantage of this approach is that intermediate results of the tool chain are lost, as they are not saved to files.

(On a side note, both of these interface types are implemented in the leading UQ software Dakota, and the arguments for and against each of them are pointed out in the documentation.)

Both approaches are implemented in PoUnce, but a third (preferred) option is offered as well, which we hope combines the advantages of both. Here, models are executed in a separate process for each step in the tool chain with file I/O, but batches of model evaluations are grouped together as a single process with common file I/O. This also requires some implementation effort, but it retains the flexibility and traceability of the standalone approach and the performance of the integrated approach.
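The difference between these interface types can be sketched schematically; the solver stand-ins and file names below are purely illustrative and not PoUnce's actual interfaces.

```python
# Schematic contrast of the interface types discussed above. The solver
# stand-ins and file names are purely illustrative, not PoUnce's actual API.

def solve_one(params, output_file):
    # Stand-in for invoking the baseline solver as a standalone tool
    # that writes its own result file.
    with open(output_file, "w") as f:
        f.write(f"result for {params}\n")

def run_standalone(samples):
    # One solver invocation and one result file per sample: the file count
    # is at least the number of model evaluations.
    for i, params in enumerate(samples):
        solve_one(params, output_file=f"sample_{i:06d}.txt")

def run_batched(samples, batch_size):
    # The preferred option groups many evaluations into one process with a
    # common output file per batch, drastically reducing the file count.
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        with open(f"batch_{start // batch_size:04d}.txt", "w") as f:
            for params in batch:
                f.write(f"result for {params}\n")

samples = [{"mu": 0.1 * i} for i in range(1000)]
run_batched(samples, batch_size=100)   # 10 files instead of 1000
```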

I hope this answers your question. I propose to clarify this in the paper by adding:

... This includes I/O bottlenecks due to very large numbers of relatively small files (if each baseline model evaluation has its individual file I/O), as well as ...

salrm8 commented 1 year ago

I am really sorry for such a long delay.

I think PoUnce will be useful for running non-intrusive uncertainty propagation problems (e.g. CFD) on HPC platforms. The paper is written clearly and the documentation could be easily compiled and was found to be informative.

I have a few comments/questions:

  1. After installing the requirements, I could run the tests in ini/internal_local. It could be useful to have a short description of the tests in the documentation.

  2. It might be helpful to add a short explanation in the paper/documentation about the type of distributions of random variables which can be handled by the library.

  3. Would there be any problem if the simulations at the same level or fidelity do not progress equally in time? I am asking this to see if the framework can be used for simulation of time-dependent physical problems with the possibility of computing stochastic moments on the fly.

  4. I have a question about the sentence "Stochastic post-processing can be external (for computationally expensive QoIs such as random fields) or framework-internal" at the bottom of p. 10 of the compiled documentation: does this mean the computation of higher-order stochastic moments of the QoIs and/or the sample distribution of a QoI?

JakobBD commented 1 year ago

@salrm8, thanks to you as well for reviewing this paper!

  • After installing the requirements, I could run the tests in ini/internal_local. It could be useful to have a short description of the tests in the documentation.

Yes, that makes sense. I’ve added a section Tests in the Getting Started chapter.

  • It might be helpful to add a short explanation in the paper/documentation about the type of distributions of random variables which can be handled by the library.

Oh yes, thanks for the comment. I’ve added a section to the Features chapter of the documentation. I haven’t added it to the paper, since it’s supposed to be very short and, for example, also does not mention the included baseline solvers.

  • Would there be any problem if the simulations at the same level or fidelity do not progress equally in time? I am asking this to see if the framework can be used for simulation of time-dependent physical problems with the possibility of computing stochastic moments on the fly.

Short answer: In the current implementation, there is no problem, apart from potential performance losses for vastly different time steps. Whether and how this works depends on the parallel execution strategy and on how the baseline solver adaptation is implemented. In our current implementation, the time step is set to the minimum across parallel samples.

Long answer: There are currently two methods for parallel execution of samples (one based on MPI and one on GNU parallel). In our MPI batch implementation (the one replicated in the demonstator_batch solver), PoUnce only does the following: it hands a list of stochastic parameters to the solver, along with the number of parallel and sequential runs, in a single HDF5 file; it writes the scheduler batch job file (including run command, number of nodes and run time); and it collects the results, again in a single HDF5 file.

It is agnostic to the rest, which thus depends on the implementation of the baseline solver adaptation.
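To illustrate the kind of handoff described above, here is a minimal h5py sketch; the dataset and attribute names are invented for illustration and do not reflect PoUnce's actual file layout.

```python
# Minimal sketch of a batch handoff via single HDF5 files. Dataset and
# attribute names are invented for illustration; they do not reflect
# PoUnce's actual file layout.
import numpy as np
import h5py

stoch_params = np.random.uniform(0.0, 1.0, size=(1000, 3))  # 1000 samples, 3 random variables

with h5py.File("batch_input.h5", "w") as f:
    f.create_dataset("stochastic_parameters", data=stoch_params)
    f.attrs["n_parallel_runs"] = 100
    f.attrs["n_sequential_runs"] = 10

# The adapted baseline solver reads this file, runs its samples, and writes
# all results back into a single HDF5 file of its own:
with h5py.File("batch_output.h5", "w") as f:
    f.create_dataset("qoi", data=np.zeros(len(stoch_params)))
```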

In our case (with the adaptation of FLEXI), we chose to perform a synchronised output of all parallel runs at different time intervals to one HDF5 file. We therefore set the time step to the minimum across all parallel samples (using MPI_ALLREDUCE). (This also means that more time steps are computed in some samples than needed for stability, which did not impair performance significantly in our applications.) In this ‘MPI/HDF5 batch’ case, several strategies to compute stochastic moments on the fly are possible.
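The time step synchronisation mentioned above amounts to a single reduction across the parallel samples; here is a minimal mpi4py analogue for illustration (the actual FLEXI adaptation does this on the Fortran side).

```python
# Illustrative mpi4py analogue of the time-step synchronisation described
# above; the actual FLEXI adaptation performs this reduction on the
# Fortran side. Run with e.g.: mpirun -n 4 python sync_dt.py
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD

# Each parallel sample proposes its own stable time step ...
dt_local = random.uniform(1e-4, 5e-4)

# ... and all samples advance with the smallest one, so that they reach the
# synchronised HDF5 output times together.
dt_sync = comm.allreduce(dt_local, op=MPI.MIN)
print(f"rank {comm.rank}: local dt = {dt_local:.2e}, synchronised dt = {dt_sync:.2e}")
```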

A word on performance: The performance loss due to synchronized time steps is of course determined by how much the time step sizes vary across samples. The problem is inherent in the design choice of synchronous HDF5 I/O. With synchronous I/O, the problem could be somewhat alleviated if the time step variations are predictable and can be estimated from the stochastic input (e.g. due to an uncertain grid spacing): Sorting the samples according to their expected time steps would place samples with similar time steps in parallel, while ones with largely different time steps would be placed in an earlier or later sequential run.
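The sorting strategy can be sketched in a few lines, assuming a per-sample time step estimate is available (illustrative only, not PoUnce code):

```python
# Sketch of the sorting idea above, assuming the time step of each sample can
# be predicted from its stochastic input (e.g. an uncertain grid spacing).
import numpy as np

rng = np.random.default_rng(0)
predicted_dt = rng.uniform(1e-4, 1e-3, size=1000)   # one estimate per sample
n_parallel = 100

# Sorting places samples with similar time steps in the same parallel group,
# so the min-reduction wastes fewer extra time steps per group.
order = np.argsort(predicted_dt)
parallel_groups = order.reshape(-1, n_parallel)      # 10 sequential runs of 100 samples each
print(parallel_groups.shape)
```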

With individual or asynchronous I/O, the baseline solver adaptation could be implemented with an internal scheduler assigning new sample simulations to free compute nodes every time a sample evaluation is finished. But this would not allow computing stochastic moments on the fly.

  • I have a question about this sentence "Stochastic post-processing can be external (for computationally expensive QoIs such as random fields) or framework-internal". at the bottom of p. 10 of the compiled documentation: does this mean the computational of higher-order stochastic moments of QoIs and/or sample distribution of QoI?

Yes, for example.

Currently, PoUnce only computes mean and standard deviation, and all the external tools we have written do the same.

Regarding the framework-internal case: Adding computation of further quantities in PoUnce is quite simple. To demonstrate this, for the Monte Carlo methods, I’ve added the functionality to plot histograms in the branch stochOutput, which I’ve pushed to the GitHub repo. Similarly, since our PCE implementation is based on the ChaosPy library, which also provides computation of higher-order stochastic moments, the corresponding functions can be extended easily.
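To give an idea of what such an extension looks like in practice, here is a rough NumPy/Matplotlib sketch (not the actual stochOutput implementation):

```python
# Rough sketch of Monte Carlo post-processing beyond mean and standard
# deviation (plain NumPy/Matplotlib; not the actual stochOutput code).
import numpy as np
import matplotlib.pyplot as plt

qoi_samples = np.random.lognormal(mean=0.0, sigma=0.3, size=10_000)  # placeholder QoI samples

mean = qoi_samples.mean()
std = qoi_samples.std(ddof=1)
skewness = np.mean(((qoi_samples - mean) / std) ** 3)   # third standardized moment

plt.hist(qoi_samples, bins=50, density=True)
plt.xlabel("QoI")
plt.ylabel("estimated density")
plt.title(f"mean = {mean:.3f}, std = {std:.3f}, skewness = {skewness:.3f}")
plt.savefig("qoi_histogram.png")
```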

If external tools are used for stochastic post-processing, the only outputs fed back to PoUnce are the quantities needed for adaptive sample number computation (such as level difference variances in MLMC). Everything else, including the final results, is computed outside of PoUnce, so anything can be added there.
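For context, the adaptive sample number computation referred to here follows the textbook MLMC allocation from level-difference variances and per-sample costs (a Giles-type formula, shown for illustration and not necessarily PoUnce's exact implementation):

```python
# Textbook MLMC sample allocation: given level-difference variances V_l and
# per-sample costs C_l, choose N_l to minimize total cost subject to
# sum(V_l / N_l) <= eps**2 (Giles-type formula; for illustration only).
import math

def mlmc_sample_numbers(variances, costs, eps):
    total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [math.ceil(math.sqrt(v / c) * total / eps**2)
            for v, c in zip(variances, costs)]

# Example: three levels with decaying level-difference variance, growing cost.
print(mlmc_sample_numbers(variances=[1.0, 0.1, 0.01],
                          costs=[1.0, 8.0, 64.0],
                          eps=0.01))
```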

salrm8 commented 1 year ago

@JakobBD Thank you for the clarification. I agree with you that synchronizing between multiple simulations is tricky, with perhaps no general solution. Also, thank you for making the small modifications to the documentation and tests. My only remaining point is to suggest adding a “contribution list” outlining possible improvements and additions to PoUnce in the future. Such a list may initiate collaborations with other interested researchers.

Aside from this, I have no more comments. Therefore @Nikoleta-v3, I recommend accepting the paper!

Nikoleta-v3 commented 1 year ago

Thank you very much for your time @salrm8 😄 would you mind generating your checklist with the command:

@editorialbot generate my checklist

and going over it? I am sure you haven't missed any of the points, but just as a sanity check.

I agree with the reviewer's point about a contribution list/contribution guidelines 👍🏻 They can be part of the docs, or you can have a standalone file at the root of the repository. There you should include a few sentences about running the test suite (I can see that you have a test file).

JakobBD commented 1 year ago

@salrm8 and @Nikoleta-v3 thank you for the suggestion. I added a file to the root of the repository with a list of possible contributions.

I referenced the code documentation, which includes sections on how to extend the code and, following @salrm8's sensible suggestion, a description of the tests.

salrm8 commented 1 year ago

Review checklist for @salrm8

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Nikoleta-v3 commented 1 year ago

Thank you both! 🙏🏻

@georgiastuart did you have a chance to look over JakobBD's response to your comments?

Nikoleta-v3 commented 1 year ago

👋🏻 🛎️ @georgiastuart

Nikoleta-v3 commented 1 year ago

Happy new year everyone 🥳 I hope you had a nice break.

Nikoleta-v3 commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 1 year ago

Hello @JakobBD, and thank you for your patience with your submission. I have emailed your second reviewer and I am waiting for their reply! In the meantime, I had a chance to go over your submission.

Some minor comments:

- In the ReadMe, in the Quick start / basic run section there is a typo. It should be python3 ../../src/pounce.py parameter_mlmc.yml.
- Could you also include a sentence about the expected outcome? Tell the user that the results are printed in the terminal and also an output csv is exported.
- In the Documentation section could you please include a link to the documentation?
- In the manuscript line 55: is it meant to be interval or internal?

Other issues:

  1. In the documentation, the sections “Parameters” and “Features” should include more information. In the parameters you should walk the user through all the sections of a yml file (general, uq_method, …). In the features you should include code snippets with examples of how different methods, solvers etc. can be used. The example yml files are great, but this information should be more accessible/obvious to a user.

  2. To me, pounce is the perfect project to be a command line tool, so I could just run pounce parameter_mfmc.yml. Is there a reason behind your decision to keep it in script format? 😊

  3. You say that:

Output files will be located in the current working directory, so it is best to copy a parameter file to an empty directory and start pounce from there.

Why not have the location and the name (if not given by the user) of the output file as arguments?

Nikoleta-v3 commented 1 year ago

Hey @JakobBD 👋🏻 did you have a chance to look over my comments?

JakobBD commented 1 year ago

Hey @Nikoleta-v3 thank you for the review! I feel that your suggestions make a lot of sense and my changes have hopefully improved the documentation.

I have addressed them as follows:

  • In the ReadMe, in the Quick start / basic run section there is a typo. It should be python3 ../../src/pounce.py parameter_mlmc.yml.

Thanks, I have fixed that.

  • Could you also include a sentence about the expected outcome? Tell the user that the results are printed in the terminal and also an output csv is exported.

Good point, I have added a sentence.

  • In the Documentation section could you please include a link to the documentation?

I have added the link.

  • In the manuscript line 55: is it meant to be interval or internal?

It is supposed to be internal. With that wording, I try to distinguish the scheduling of samples I do myself within a compute job from the cluster's job scheduling software which handles the queue of compute jobs of different users.

  1. In the documentation, the sections “Parameters” and “Features” should include more information. In the parameters you should walk the user through all the sections of a yml file (general, uq_method, …). In the features you should include code snippets with examples of how different methods, solvers etc. can be used. The example yml files are great, but this information should be more accessible/obvious to a user.

I have extended both sections as you suggested, especially the "Features" one, which hopefully now explains all parameters and puts them in some context. Some of the details of the uq_methods parameters are beyond the scope of the documentation, but I am preparing a paper on this and will put a reference to it in the documentation once I have a draft.

  2. To me, pounce is the perfect project to be a command line tool, so I could just run pounce parameter_mfmc.yml. Is there a reason behind your decision to keep it in script format? 😊

I've added a Makefile which adds this possibility, and I've explained its use in the documentation under "Getting started / Command line tool".

  3. You say that:

    Output files will be located in the current working directory, so it is best to copy a parameter file to an empty directory and start pounce from there. Why not have the location and the name (if not given by the user) of the output file as arguments?

PoUnce runs produce various different files (such as for communication with the external baseline solver, restarting, and output), typically many more than in the example from the "Getting started" section. So it makes sense to have them all in a separate directory, without other runs, to keep an overview. Manually creating that directory and starting PoUnce from it doesn't seem like a major inconvenience to me. This also makes the use of the restart capability clearer, as pounce -r looks for the pounce.pickle file in the current working directory. Otherwise, the same output directory flag would have to be given to PoUnce as when it was started for the first time. From an implementation point of view, file I/O happens at a lot of different places in the code (for example, each baseline solver has its own routines), so I'd have to find and change every single one to prepend the output directory to the file path, and I'd be afraid to miss one and introduce bugs. I hope that makes sense.
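For illustration, the pattern boils down to something like the following simplified sketch (hypothetical code, not the actual pounce.py entry point):

```python
# Simplified sketch of the cwd-based pattern described above (hypothetical,
# not the actual pounce.py entry point): a plain run takes a parameter file,
# while -r restarts from a pounce.pickle found in the current directory.
import argparse
import pathlib
import pickle
import sys

parser = argparse.ArgumentParser(prog="pounce")
parser.add_argument("parameter_file", nargs="?", help="yml parameter file")
parser.add_argument("-r", "--restart", action="store_true",
                    help="restart from pounce.pickle in the current working directory")
args = parser.parse_args()

if args.restart:
    restart_file = pathlib.Path.cwd() / "pounce.pickle"
    if not restart_file.exists():
        sys.exit("No pounce.pickle found in the current working directory.")
    with open(restart_file, "rb") as f:
        state = pickle.load(f)   # resume the previous run from its saved state
    print("Resuming from", restart_file)
else:
    print("Fresh run with parameter file", args.parameter_file)
```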

Nikoleta-v3 commented 1 year ago

@JakobBD first of all, thank you very much for addressing my comments 👍🏻

It is supposed to be internal. With that wording, I try to distinguish the scheduling of samples I do myself within a compute job from the cluster's job scheduling software which handles the queue of compute jobs of different users.

I understand now; thank you for explaining.

I have extended both sections as you suggested, especially the "Features" one, which hopefully now explains all parameters and puts them in some context. Some of the details of the uq_methods parameters are beyond the scope of the documentation, but I am preparing a paper on this and will put a reference to it in the documentation once I have a draft.

This is great; thank you for working on the documentation! One minor issue: can you check that the documentation has built correctly? For example, in Workflow I can see the \textit{ commands 🤔

PoUnce runs produce various different files (such as for communication with the external baseline solver, restarting, and output), typically many more than in the example from the "Getting started" section. So it makes sense to have them all in a separate directory, without other runs, to keep an overview. Manually creating that directory and starting PoUnce from it doesn't seem like a major inconvenience to me. This also makes the use of the restart capability clearer, as pounce -r looks for the pounce.pickle file in the current working directory. Otherwise, the same output directory flag would have to be given to PoUnce as when it was started for the first time. From an implementation point of view, file I/O happens at a lot of different places in the code (for example, each baseline solver has its own routines), so I'd have to find and change every single one to prepend the output directory to the file path, and I'd be afraid to miss one and introduce bugs. I hope that makes sense.

Understood; thank you for your reply.

pounce parameter_mlmc.yml 🔥 👏🏻

I will wait for a few more days to see if the second reviewer has any further comments or thoughts and then we can move on with the submission 😄

JakobBD commented 1 year ago

This is great; thank you for working on the documentation! One minor issue: can you check that the documentation has built correctly? For example, in Workflow I can see the \textit{ commands 🤔

The markdown web interface is only a byproduct; the documentation is primarily built using pandoc (as stated in the readme), which is why there are still some LaTeX commands. Pandoc was chosen over plain markdown due to its better capabilities for referencing images and literature.

I will wait for a few more days to see if the second reviewer has any further comments or thoughts and then we can move on with the submission 😄

Thanks! 🎉

Nikoleta-v3 commented 1 year ago

@JakobBD thank you for your patience!! We are good to go 👍🏻

At this point could you please:

- Make a tagged release of the software and report the version number here.
- Create an archive of the release (e.g. on Zenodo), check that the archive metadata (title and author list) matches the paper, and report the archive DOI here.

I can then move forward with accepting the submission 🎉

JakobBD commented 1 year ago

Thanks!

Nikoleta-v3 commented 1 year ago

@editorialbot set 10.5281/zenodo.7600634 as archive

editorialbot commented 1 year ago

Done! Archive is now 10.5281/zenodo.7600634

Nikoleta-v3 commented 1 year ago

@editorialbot set v1.0.0 as version

editorialbot commented 1 year ago

Done! version is now v1.0.0

Nikoleta-v3 commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 1 year ago

@editorialbot recommend-accept

editorialbot commented 1 year ago
Attempting dry run of processing paper acceptance...
editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1287/opre.1070.0496 is OK
- 10.1007/3-540-45346-6_5 is OK
- 10.1137/16M1082469 is OK
- 10.1137/15M1046472 is OK
- 10.2172/1177077 is OK
- 10.1016/j.jocs.2020.101204 is OK
- 10.1137/S1064827503427741 is OK
- 10.1016/j.jocs.2015.08.008 is OK
- 10.21105/joss.02871 is OK
- 10.1061/9780784413609.257 is OK
- 10.1142/S2591728518500445 is OK
- 10.5281/zenodo.16303 is OK
- 10.5281/zenodo.21389 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 year ago

:wave: @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/3937, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

danielskatz commented 1 year ago

@JakobBD - I'll next proofread the paper, and then let you know the next steps.

danielskatz commented 1 year ago

@JakobBD - I've suggested some small changes in https://github.com/JakobBD/pounce/pull/1 - please merge this or let me know what you disagree with

JakobBD commented 1 year ago

Looks good, thank you! I merged your changes.

danielskatz commented 1 year ago

@editorialbot recommend-accept