openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: EspressoDB: A scientific database for managing high-performance computing workflows #2007

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @ckoerber (Christopher Körber) Repository: https://github.com/callat-qcd/espressodb Version: v1.1.0 Editor: @gkthiruvathukal Reviewer: @remram44, @ixjlyons Archive: 10.5281/zenodo.3677432

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/d0342f15684b9a464faed7c59784f734"><img src="https://joss.theoj.org/papers/d0342f15684b9a464faed7c59784f734/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/d0342f15684b9a464faed7c59784f734/status.svg)](https://joss.theoj.org/papers/d0342f15684b9a464faed7c59784f734)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@remram44 & @ixjlyons, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @gkthiruvathukal know.

Please try and complete your review in the next two weeks

Review checklist for @remram44

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @ixjlyons

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @remram44, @ixjlyons it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which under GitHub's default behaviour means you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews
  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1109/sc.2018.00054 is OK
- 10.1109/SC.2018.00060 is OK
- 10.1038/s41586-018-0161-8 is OK
- 10.1103/PhysRevLett.121.172501 is OK
- 10.1051/epjconf/201817509007 is OK
- 10.1103/PhysRevD.82.094502 is OK

MISSING DOIs

- https://doi.org/10.1109/sc.2018.00058 may be missing for title: Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing

INVALID DOIs

- None
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

remram44 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

gkthiruvathukal commented 4 years ago

@remram44, @ixjlyons: Just checking on progress with this review.

remram44 commented 4 years ago

@gkthiruvathukal underway, sorry about the delay! I'm confident I can get it done next week.

gkthiruvathukal commented 4 years ago

@remram44 Not a problem! This was intended as a gentle nudge. Thanks for your help!

ixjlyons commented 4 years ago

Same here -- I've made progress but haven't updated my checklist. I should be able to finish in the next week or so as well. Thanks for the reminder.

gkthiruvathukal commented 4 years ago

Thanks for the updates, @ixjlyons and @remram44!

remram44 commented 4 years ago

The software side looks fine to me! There are automated tests, extensive documentation including examples.

One missing item is the "community guidelines" explaining how to contribute, but that's easy to add. (I don't know if you need one, but it's on the JOSS checklist :wink:)

Here are my remarks looking around the documentation and code:

Regarding the paper markup, it looks fine, although you probably don't need to monospace "Django" since it's the official name of the project, not just a package import name. "Leadership computing facilities" should be "Leading computing facilities".


My main concerns are about the motivation for the project. In particular, I am not sure of the value added compared to Django, which provides a lot of the claimed features of EspressoDB, some of which are just re-exported, such as database management, signals, and automatically creating views from models (Django has the "admin" pages).

There is also no related work in the paper, although data management and workflow management are big fields. Some things that come to mind:

ixjlyons commented 4 years ago

I've completed my review. Overall, EspressoDB provides a fairly rich set of features for relatively little programming effort. It seems to fulfill an important role in managing complex datasets while encouraging documentation (by making it simple to generate), which is often overlooked. The documentation of EspressoDB itself is fairly thorough and provides a straightforward path to getting started and moving on to more advanced usage. I think it fits into the scope of JOSS and needs just a few improvements in my opinion.

I ended up filing one issue regarding some development installation instructions in the README and I also made a pull request with some typo/grammar fixes in the docs. Aside from that, the following are minor issues I noticed that I didn't feel warranted actual issues against the repository:

Minor grammatical issues in the paper:

First line of the "Use case" section is a fragment. Consider using something like: "LatteDB, an application of ... calculations and analysis, is currently being..." or "LatteDB is an application of ..."

Also in use case section: "ultimately processed down to hundred of..." -> "hundreds of".

Consider: "...status of these files in real-time (identify corrupt..." -> "...status of these files in real-time to identify corrupt..."

I've so far left a few checklist items un-checked. Here is some rationale for these:

Finally, I have a couple more open-ended thoughts you might consider.

There seems to be a pretty heavy reliance on users coming in with some understanding of Django to do anything nontrivial. This isn't necessarily a bad thing, but it could be stated early in the documentation that this is the case. EspressoDB (probably rightly) doesn't attempt to abstract away from Django to avoid this, so I think users should be informed up front.

A potential issue I see with this framework is that it seems to bring together system administration and data processing in a way that I'm not sure is ideal. Perhaps some explanation of how LatteDB (or a hypothetical example instead) is implemented could be sufficient. I'm left wondering, for example, who manages the system and who uses it to do computational work. If everyone using the system (i.e. writing data processing code) needs to know Django and/or web development, the applicability of the framework may be somewhat limited. The paper and/or docs might benefit from some explanation of a reasonable workflow involving multiple team members with different roles (e.g. non-scientific administrator, computation-focused programmers, scientists pulling data to do local analyses, etc.).

ckoerber commented 4 years ago

Hello @gkthiruvathukal, @ixjlyons, and @remram44,

Thank you for your time and the feedback you have provided. We believe that your comments help to improve EspressoDB. To keep updates transparent, we filed issues for suggested changes and intend to merge them into the new version v1.1.0. The filed issues are collected in a new project on EspressoDB. We believe that we should be able to finalize the new features by the end of next week.

In the coming days, we will also address the points you have made in more detail.

General statements regarding the open checkboxes:

For example

... CalLat creates petabytes of temporary files that are written to the scratch file system, used for subsequent computations and ultimately processed down to hundred of tera-bytes that are saved for analysis. It is essential to track the status of these files in real-time (identify corrupt, missing, or purgeable files).

To address the statement of need, should we be more explicit about how LatteDB (and thus EspressoDB) helps to track files or rather approach it from a more general point of view?
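
To make the file-tracking need quoted above concrete, here is a minimal sketch in plain Python. It is not EspressoDB's or LatteDB's actual API: the function names, the status labels, the checksum manifest, and the age-based purge rule are all assumptions chosen for illustration.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_file(path, expected_sha256, max_age_days=30.0, now=None):
    """Return 'missing', 'corrupt', 'purgeable', or 'ok' for one tracked file.

    Hypothetical helper: the labels and the 30-day purge threshold are
    illustrative assumptions, not part of EspressoDB or LatteDB.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    if sha256_of(path) != expected_sha256:
        return "corrupt"
    now = time.time() if now is None else now
    age_days = (now - path.stat().st_mtime) / 86400.0
    return "purgeable" if age_days > max_age_days else "ok"


def scan(manifest, **kwargs):
    """Map each path in a {path: expected_sha256} manifest to its status."""
    return {str(p): classify_file(p, h, **kwargs) for p, h in manifest.items()}
```

In a Django-based setup, the resulting status could plausibly be stored in a model field and refreshed periodically, but how LatteDB actually does this is not stated in this thread.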

gkthiruvathukal commented 4 years ago

@gkthiruvathukal Just letting @ckoerber and all know that I'm keeping an eye on the thread. It sounds like the feedback from @ixjlyons, and @remram44 has been well received and there is a plan to work on the issues raised during review.

After the issues are addressed, I'll have the reviewers take another look.

I'd like to ask for everyone's help here. I have no reason to doubt that all authors of the software are represented on the paper submission, but can each of you (authors and reviewers) please confirm for me?

"Does the full list of paper authors seem appropriate and complete?"

Yes, I know the checkbox has been checked but am asking you to check once more. I am in the middle of dealing with another submission (not edited by me) where the answer is "no" so I am now checking every one of my editorial assignments to make sure there are no authors--or potential authors--who are not listed. If you can do a brief follow-up, this would be much appreciated.

remram44 commented 4 years ago

I confirm that @ckoerber made major contributions to the software, and that the list of authors matches the major contributors of the software.

It seems that @cchang5 goes by "Jason Chang" on his GitHub profile and "Chia Cheng Chang" on the paper, but it looks like this is intended, since he personally committed to the paper itself.

cchang5 commented 4 years ago

@gkthiruvathukal I'd like to ask for everyone's help here. I have no reason to doubt that all authors of the software are represented on the paper submission, but can each of you (authors and reviewers) please confirm for me?

I am confirming that I am an author of the software.

ckoerber commented 4 years ago

I can confirm that @cchang5, @walkloud, and I are the authors of EspressoDB (and LatteDB).

ixjlyons commented 4 years ago

Confirming I've re-checked the author list against the contributors based on git history.

gkthiruvathukal commented 4 years ago

Thanks for all responses!

gkthiruvathukal commented 4 years ago

@ckoerber Just checking on the status to address the review feedback. The next step will be for the reviewers to confirm that the feedback has been addressed to their satisfaction. Then I will be in a position to make my recommendation.

ckoerber commented 4 years ago

Hello @gkthiruvathukal, we expect to respond in detail at the beginning of next week.

ckoerber commented 4 years ago

Hello @gkthiruvathukal, @ixjlyons, and @remram44,

We would like to thank you for your feedback and patience. We have posted detailed responses to both referee replies as issues on the EspressoDB repo.

Furthermore, we have created a Pull Request which contains the updates to the paper, documentation, and new features we have introduced to address concerns made by the referees.

We intend to merge this branch into master once the second review iteration is finalized. Is this in accordance with the JOSS guidelines?

Feel free to contact us if you have any questions.

Best regards,

@cchang5, @ckoerber, @walkloud

gkthiruvathukal commented 4 years ago

@ckoerber, @cchang5, and @walkloud, thank you for responding to the review feedback.

@ixjlyons and @remram44, please let me know if the team has addressed your feedback in a satisfactory manner. Then I can proceed to the next phase (acceptance).

ixjlyons commented 4 years ago

@whedon generate pdf from branch v1.1.0

whedon commented 4 years ago
Attempting PDF compilation from custom branch v1.1.0. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

ixjlyons commented 4 years ago

The authors have addressed my comments thoroughly. I looked over the updated and rendered paper from the v1.1.0 branch and updated my checklist. I recommend the submission be accepted.

gkthiruvathukal commented 4 years ago

@ckoerber, I think I am ready to move toward acceptance. Can you please do the following? Please just follow up with comments for each item. I will then check off the boxes.

remram44 commented 4 years ago

I completed my checklist after checking out https://github.com/callat-qcd/espressodb/pull/48, thanks!

Note that the list of workflow systems I knew off the top of my head might not be the most relevant systems (as I said, I used to work on one of them, but I've been out of this space for a few years. In particular, I don't think those tools are tailored to HPC. REANA and Galaxy are other names I remember).

Also there are probably academic publications for some systems which might be better citations than the URLs (Taverna has this page for example).

ckoerber commented 4 years ago

Hello @gkthiruvathukal, @ixjlyons, and @remram44,

We are happy that EspressoDB was accepted for publication and would like to thank you for your time and efforts.

The tagged release is v1.1.0. The Zenodo DOI is 10.5281/zenodo.3676074 for all versions and 10.5281/zenodo.3677432 for the one associated with the tag. We have verified that the Zenodo metadata matches the JOSS metadata and added ORCIDs.

gkthiruvathukal commented 4 years ago

Thanks, @ckoerber, for the follow up. I will work on the final steps and signal the EICs when done.

gkthiruvathukal commented 4 years ago

@whedon set v1.1.0 as release

whedon commented 4 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands
gkthiruvathukal commented 4 years ago

@whedon commands

whedon commented 4 years ago

Here are some things you can ask me to do:

# List all of Whedon's capabilities
@whedon commands

# Assign a GitHub user as the sole reviewer of this submission
@whedon assign @username as reviewer

# Add a GitHub user to the reviewers of this submission
@whedon add @username as reviewer

# Remove a GitHub user from the reviewers of this submission
@whedon remove @username as reviewer

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

# Change editorial assignment
@whedon assign @username as editor

# Set the software archive DOI at the top of the issue e.g.
@whedon set 10.0000/zenodo.00000 as archive

# Set the software version at the top of the issue e.g.
@whedon set v1.0.1 as version

# Open the review issue
@whedon start review

EDITORIAL TASKS

# Compile the paper
@whedon generate pdf

# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name

# Remind an author or reviewer to return to a review after a
# certain period of time (supported units days and weeks)
@whedon remind @reviewer in 2 weeks

# Ask Whedon to do a dry run of accepting the paper and depositing with Crossref
@whedon accept

# Ask Whedon to check the references for missing DOIs
@whedon check references

# Ask Whedon to check repository statistics for the submitted software
@whedon check repository

EiC TASKS

# Invite an editor to edit a submission (sending them an email)
@whedon invite @editor as editor

# Reject a paper
@whedon reject

# Withdraw a paper
@whedon withdraw

# Ask Whedon to actually accept the paper and deposit with Crossref
@whedon accept deposit=true
gkthiruvathukal commented 4 years ago

@whedon set v1.1.0 as version

whedon commented 4 years ago

OK. v1.1.0 is the version.

gkthiruvathukal commented 4 years ago

@whedon set 10.5281/zenodo.3677432 as archive

whedon commented 4 years ago

OK. 10.5281/zenodo.3677432 is the archive.

gkthiruvathukal commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

gkthiruvathukal commented 4 years ago

@whedon generate pdf from branch v1.1.0

whedon commented 4 years ago
Attempting PDF compilation from custom branch v1.1.0. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

gkthiruvathukal commented 4 years ago

@openjournals/joss-eics I'm recommending this paper for acceptance.

kyleniemeyer commented 4 years ago

OK, everything looks good to me!

kyleniemeyer commented 4 years ago

@whedon accept

whedon commented 4 years ago
Attempting dry run of processing paper acceptance...
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1109/sc.2018.00054 is OK
- 10.1109/SC.2018.00060 is OK
- 10.1038/s41586-018-0161-8 is OK
- 10.1103/PhysRevLett.121.172501 is OK
- 10.1051/epjconf/201817509007 is OK
- 10.1103/PhysRevD.82.094502 is OK
- 10.1093/nar/gkt328 is OK

MISSING DOIs

- https://doi.org/10.1109/sc.2018.00058 may be missing for title: Simulating the weak death of the neutron in a femtoscale universe with near-Exascale computing

INVALID DOIs

- None
whedon commented 4 years ago

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1330

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1330, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true