openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: ADIOS4DOLFINx: A framework for checkpointing in FEniCS #6451

Closed editorialbot closed 6 months ago

editorialbot commented 8 months ago

Submitting author: !--author-handle-->@jorgensd<!--end-author-handle-- (Jørgen Schartum Dokken) Repository: https://github.com/jorgensd/adios4dolfinx Branch with paper.md (empty if default branch): Version: v0.8.1.post0 Editor: !--editor-->@danielskatz<!--end-editor-- Reviewers: @gonsie, @Chilipp Archive: 10.5281/zenodo.11094985

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/7866cb142db8a803e32d79a109573d25"><img src="https://joss.theoj.org/papers/7866cb142db8a803e32d79a109573d25/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/7866cb142db8a803e32d79a109573d25/status.svg)](https://joss.theoj.org/papers/7866cb142db8a803e32d79a109573d25)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@gonsie & @Chilipp, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @Chilipp

📝 Checklist for @gonsie

editorialbot commented 8 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 8 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5281/zenodo.10447666 is OK
- 10.1016/j.softx.2020.100561 is OK
- 10.48550/arXiv.2401.05868 is OK
- 10.1145/2998441 is OK
- 10.1145/3524456 is OK

MISSING DOIs

- No DOI given, and none found for title: XDMF and ParaView: checkpointing format
- No DOI given, and none found for title: MPI: A Message-Passing Interface Standard. Version...

INVALID DOIs

- None
editorialbot commented 8 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.05 s (1126.4 files/s, 144290.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          31            947            749           3881
YAML                            12             97             22            379
Markdown                         5             87              0            220
TeX                              1              9              0             69
TOML                             1              9              5             54
reStructuredText                 1              1              2              2
-------------------------------------------------------------------------------
SUM:                            51           1150            778           4605
-------------------------------------------------------------------------------

Commit count by author:

    45  Jørgen Schartum Dokken
    44  Jørgen S. Dokken
    29  Jorgen S. Dokken
    12  Henrik Finsberg
     5  Francesco Ballarin
     3  jorgensd
     1  Joergen Schartum Dokken
     1  Nate
editorialbot commented 8 months ago

Paper file info:

📄 Wordcount for paper.md is 944

✅ The paper includes a Statement of need section

editorialbot commented 8 months ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 8 months ago

@gonsie and @Chilipp - Thanks again for agreeing to review this submission. This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#6451 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

danielskatz commented 8 months ago

👋 @gonsie and @Chilipp - if you can go ahead and generate your checklists as above, and then at least check off the COI and CoC items, I would appreciate it, in part to make sure all the permissions are working ok.

Chilipp commented 8 months ago

Review checklist for @Chilipp

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

gonsie commented 8 months ago

Review checklist for @gonsie

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

danielskatz commented 7 months ago

👋 @Chilipp & @gonsie - I'm just checking in on Monday 🙂

Thanks for getting started, @gonsie!

@Chilipp - if you can also at least check off the first two items when you get a chance, I would appreciate it.

Chilipp commented 7 months ago

sure @danielskatz . it's on my todo list for this week

danielskatz commented 7 months ago

👋 @Chilipp & @gonsie - After a couple of weeks, I'm just checking in to see how things are going and if anything blocking you...

Chilipp commented 7 months ago

Hi @danielskatz ! Thanks for the reminder. I started already but could not yet finish. This week is full but it's on the to-do list for next week. My apologies for the delay

Chilipp commented 7 months ago

@danielskatz @jorgensd I am done with the review. Thanks a lot for your great and transparent work on this! Overall I could follow all steps outlined in the documentation (using the docker setup, conda was not possible, see https://github.com/jorgensd/adios4dolfinx/issues/95) and it was sufficiently explained. I only have some minor comments that I'd like to share with you in the list below. If you like @jorgensd, I can open independent issues for each of the items if that makes it easier for you to address them. I also mentioned some items, where I was not 100% sure, but checked them off nevertheless. This is just for transparency reasons.

Problematic checklist items

  • [x] Contribution and authorship: Has the submitting author (@jorgensd) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

@jorgensd there is another contributor in the list of contributors on GitHub, @finsberg. It's hard to judge as an external, whether he should be offered co-authorship here, but as he contributed about 10% of the commits to the repo, please state why he is not listed as a co-author in this paper (only in the Acknowledgements)

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

This is currently missing in the README and documentation.

  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

The authors include several guides in the docs with example code to demonstrate the functionality. It's not real-world analysis, but that's because the software is rather generic, so that's ok.

  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

The authors provide a test-suite that is implemented via CI :+1: Unfortunately, it is missing in the docs, how to run the test suite locally.

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

No, not really. It is not clear to a non-specialist from the summary, what the authors understand under a checkpointing framework, what DOLFINx is all about, even what MPI is should be explained. The summary is using too many technical terms in the summary, that are not obvious for non-specialists.

  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?

As I am not in the field, it's hard to judge this information. The authors however provide some review on how this has been implemented in older versions.

Functionality for N-to-M checkpointing was implemented for the old version of DOLFIN by27 (Habera et al., 2018). However, this functionality is not present in the newest version of28 the FEniCS Project (Baratta et al., 2023). The storage principles in the ADIOS4DOLFINx29 are based on the ideas present in this implementation. However, the implementation for30 non-Lagrangian finite element spaces vastly differs, due to the usage of dof-permutations31 (Scroggs et al., 2022). Additionally, all global MPI-calls in the old implementation have been32 reimplemented with scalable MPI-communication using the MPI-3 Neighborhood Collectives33 (MPI-Forum, 2012).

To me, this sounds good enough, but I'd like to leave it to you @danielskatz to checkoff this item, as I am not sure whether I am qualified to evaluate this information.

jorgensd commented 7 months ago

@Chilipp I've opened various pull requests trying to address most of your comments above:

  • [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

This is currently missing in the README and documentation.

  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

The authors include several guides in the docs with example code to demonstrate the functionality. It's not real-world analysis, but that's because the software is rather generic, so that's ok.

  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

The authors provide a test-suite that is implemented via CI 👍 Unfortunately, it is missing in the docs, how to run the test suite locally.

  • [ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

No, not really. It is not clear to a non-specialist from the summary, what the authors understand under a checkpointing framework, what DOLFINx is all about, even what MPI is should be explained. The summary is using too many technical terms in the summary, that are not obvious for non-specialists.

Problematic checklist items

  • [ ] Contribution and authorship: Has the submitting author (@jorgensd) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

@jorgensd there is another contributor in the list of contributors on GitHub, @finsberg. It's hard to judge as an external, whether he should be offered co-authorship here, but as he contributed about 10% of the commits to the repo, please state why he is not listed as a co-author in this paper (only in the Acknowledgements)

I'll let @finsberg add his thoughts here. Prior to submission we did discuss if he should be a co-author or not.

jorgensd commented 7 months ago

As I am not in the field, it's hard to judge this information. The authors however provide some review on how this has been implemented in older versions.

Functionality for N-to-M checkpointing was implemented for the old version of DOLFIN by27 (Habera et al., 2018). However, this functionality is not present in the newest version of28 the FEniCS Project (Baratta et al., 2023). The storage principles in the ADIOS4DOLFINx29 are based on the ideas present in this implementation. However, the implementation for30 non-Lagrangian finite element spaces vastly differs, due to the usage of dof-permutations31 (Scroggs et al., 2022). Additionally, all global MPI-calls in the old implementation have been32 reimplemented with scalable MPI-communication using the MPI-3 Neighborhood Collectives33 (MPI-Forum, 2012).

I also mention how this has been done in Firedrake and reference their latest preprint: https://arxiv.org/pdf/2401.05868.pdf which wrongly states that legacy DOLFIN does not support N-to-M checkpointing (and to some extent DOLFINx, as my framework has been around for a year). However, they do also state that other popular frameworks such as Moose and Deal ii does not have N-to-M checkpointing implemented.

jorgensd commented 7 months ago

@editorialbot generate pdf

editorialbot commented 7 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Chilipp commented 7 months ago

thanks @jorgensd! I updated the review checklist accordingly.

finsberg commented 7 months ago

I'll let @finsberg add his thoughts here. Prior to submission we did discuss if he should be a co-author or not.

Hi @Chilipp. @jorgensd and I discussed co-authorship prior to the submission and decided that it was most appropriate to not include me as a co-author. Although I have contributed about 10% of the commits, I don't think these commits have been critical to the software.

Chilipp commented 7 months ago

Thanks for the clarification @finsberg and @jorgensd :smiley:

Chilipp commented 7 months ago

@danielskatz, from my perspective I am done with my review :smiley: please let me know if there is anything else

danielskatz commented 7 months ago
  • [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?

As I am not in the field, it's hard to judge this information. The authors however provide some review on how this has been implemented in older versions.

Functionality for N-to-M checkpointing was implemented for the old version of DOLFIN by27 (Habera et al., 2018). However, this functionality is not present in the newest version of28 the FEniCS Project (Baratta et al., 2023). The storage principles in the ADIOS4DOLFINx29 are based on the ideas present in this implementation. However, the implementation for30 non-Lagrangian finite element spaces vastly differs, due to the usage of dof-permutations31 (Scroggs et al., 2022). Additionally, all global MPI-calls in the old implementation have been32 reimplemented with scalable MPI-communication using the MPI-3 Neighborhood Collectives33 (MPI-Forum, 2012).

To me, this sounds good enough, but I'd like to leave it to you @danielskatz to checkoff this item, as I am not sure whether I am qualified to evaluate this information.

If it sounds good enough to you, then please check it off.

jorgensd commented 6 months ago

@gonsie Have you had time to look at the library yet?:)

danielskatz commented 6 months ago

👋 @gonsie - while it's a bit unusual for an author to bug a reviewer (@jorgensd - please don't do this), I see you've checked off some items. Is there anything blocking your review progress currently?

jorgensd commented 6 months ago

👋 @gonsie - while it's a bit unusual for an author to bug a reviewer (@jorgensd - please don't do this), I see you've checked off some items. Is there anything blocking your review progress currently?

Sorry, I’ll not do that again.

gonsie commented 6 months ago

I'm having a lot of trouble with the installation. I'm not sure if it's my (practically nonexistent) docker skills, or poor instructions, but I can't seem to get a full installation.

I've made an issue at jorgensd/adios4dolfinx#106, hopefully with some help I can make progress.

gonsie commented 6 months ago

One comment about the paper: I've always understood the term "N-to-M checkpointing" to indicate that N processes are writing to M files. With the adios4dolfinx package, are users in control of how many files are used for a checkpoint, or is it always a single shared file?

jorgensd commented 6 months ago

One comment about the paper: I've always understood the term "N-to-M checkpointing" to indicate that N processes are writing to M files. With the adios4dolfinx package, are users in control of how many files are used for a checkpoint, or is it always a single shared file?

I am using the N-to-M checkpointing definition from https://arxiv.org/pdf/2401.05868.pdf. The bp format is a format made by the developers of ADIOS2, and uses a folder to encapsulate a single binary file, plus some metadata files (https://adios2.readthedocs.io/en/latest/engines/engines.html#bp5). If you choose to use the backend engine HDF5 you get a single .h5 file.

gonsie commented 6 months ago

@danielskatz my review is complete, everything is checked. thank you.

danielskatz commented 6 months ago

Thanks @gonsie

danielskatz commented 6 months ago

@jorgensd - I'm going to proofread the paper, then will let you know what we need to do next.

danielskatz commented 6 months ago

@editorialbot generate pdf

editorialbot commented 6 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jorgensd commented 6 months ago

DOLFINx just released a 0.8.0 release (yesterday), and it is making its way onto conda (https://github.com/conda-forge/fenics-dolfinx-feedstock/pull/71). Hopefully I can get out a 0.8.1 release of the software that aligns with 0.8.0 of DOLFINx (as it would be easier for a user to read installation notes for the software then).

danielskatz commented 6 months ago

My suggested changes to the paper are in https://github.com/jorgensd/adios4dolfinx/pull/112. Please merge this, or let me know what you disagree with.

danielskatz commented 6 months ago

Then, when you are ready (perhaps with a new release of the software), please:

I can then move forward with accepting the submission.

jorgensd commented 6 months ago

My suggested changes to the paper are in jorgensd/adios4dolfinx#112. Please merge this, or let me know what you disagree with.

All looks good except one citation where i had some suggestions. Thanks for the speedy corrections.

danielskatz commented 6 months ago

My suggested changes to the paper are in jorgensd/adios4dolfinx#112. Please merge this, or let me know what you disagree with.

All looks good except one citation where i had some suggestions. Thanks for the speedy corrections.

I fixed that one now - thanks for catching it

jorgensd commented 6 months ago

@editorialbot generate pdf

editorialbot commented 6 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 6 months ago

👋 @jorgensd - note that I'm waiting for you to do the items in https://github.com/openjournals/joss-reviews/issues/6451#issuecomment-2078115312 ...

jorgensd commented 6 months ago

👋 @jorgensd - note that I'm waiting for you to do the items in #6451 (comment) ...

Hi Daniel, I'm currently working on getting a conda build up and running, and would like to do that prior to tagging the repository and making a release. Sorry for the inconvenience. I've made an issue at: https://github.com/jorgensd/adios4dolfinx/issues/113 tracking my progress wrt. this endeavor.

danielskatz commented 6 months ago

that's fine - I just wanted to be clear on where we were

jorgensd commented 6 months ago

@danielskatz I've now released a new version of adios4dolfinx (v0.8.1.post0), https://github.com/jorgensd/adios4dolfinx/releases/tag/v0.8.1.post0 Zenodo DOI: https://doi.org/10.5281/zenodo.11094985

jorgensd commented 6 months ago

@editorialbot generate pdf

editorialbot commented 6 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 6 months ago

@editorialbot set v0.8.1.post0 as version

editorialbot commented 6 months ago

Done! version is now v0.8.1.post0