
[REVIEW]: NOMAD: A distributed web-based platform for managing materials science research data #5388

Closed: editorialbot closed this issue 8 months ago

editorialbot commented 1 year ago

Submitting author: @markus1978 (Markus Scheidgen)
Repository: https://github.com/nomad-coe/nomad
Branch with paper.md (empty if default branch): paper
Version: v1.2.1
Editor: @zhubonan
Reviewers: @arosen93, @berquist, @sgbaird
Archive: 10.5281/zenodo.8366163

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/5485aee97d291f2b545116d21c3b76c5"><img src="https://joss.theoj.org/papers/5485aee97d291f2b545116d21c3b76c5/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/5485aee97d291f2b545116d21c3b76c5/status.svg)](https://joss.theoj.org/papers/5485aee97d291f2b545116d21c3b76c5)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@arosen93 & @berquist & @sgbaird, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions/concerns, please let @zhubonan know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Checklists

📝 Checklist for @berquist

📝 Checklist for @sgbaird

📝 Checklist for @arosen93

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.01 s (565.5 files/s, 48066.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                         2             19              0            191
TeX                              1             15              0            170
YAML                             1              1              4             18
Bourne Shell                     1              1              0              6
-------------------------------------------------------------------------------
SUM:                             5             36              4            385
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 1279

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1038/sdata.2016.18 is OK
- 10.1038/s41586-022-04501-x is OK
- 10.48550/arXiv.2205.14774 is OK
- 10.1038/s41597-021-00974-z is OK
- 10.1063/1.4812323 is OK
- 10.1007/s11837-013-0755-4 is OK
- 10.1016/j.commatsci.2012.02.005 is OK
- 10.1038/s41597-020-00638-4 is OK
- 10.1038/s41524-018-0107-6 is OK
- 10.1088/1361-648X/aa680e is OK
- 10.1107/S1600576714027575 is OK
- 10.1016/j.commatsci.2012.10.028 is OK
- 10.48550/arXiv.1805.05039 is OK
- 10.1038/sdata.2018.53 is OK
- 10.1038/s41524-017-0048-5 is OK
- 10.1038/s41524-022-00935-z is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 year ago

Failed to discover a valid open source license

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

zhubonan commented 1 year ago

Hi @arosen93, @berquist, @sgbaird - thanks for helping out with reviewing this package. Please generate the checklists using

@editorialbot generate my checklist

We aim for reviews to be completed within about 2-4 weeks, but please do let me know if any of you require some more time.

We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a period of time.

sgbaird commented 1 year ago

Review checklist for @sgbaird

I'm adding some comments here for my own convenience.

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Perhaps the bulleted lists of key features from https://nomad-lab.eu/nomad-lab/ (see image above) could be incorporated as three separate lists in the software paper.

berquist commented 1 year ago

Review checklist for @berquist

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

zhubonan commented 1 year ago

Hi @sgbaird @berquist and @arosen93, how is the review of the package going? I can see @arosen93 opened this issue here: https://github.com/nomad-coe/nomad/issues/63.

BTW @sgbaird @berquist, if you have comments on the package please also open issues on their GitHub page and mention this issue ticket, so they are linked together.

Andrew-S-Rosen commented 1 year ago

Thanks for the ping! I have it on my schedule to dig deeper into the code + paper this week :) Thanks for the patience, everyone!

berquist commented 1 year ago

I've read the paper and believe that NOMAD belongs in JOSS. I'm still distilling my notes on the (website) documentation and haven't tried installing/testing/running other than building the Docker image.

My current concerns are about the author list. How do I raise those concerns?

zhubonan commented 1 year ago

I've read the paper and believe that NOMAD belongs in JOSS. I'm still distilling my notes on the (website) documentation and haven't tried installing/testing/running other than building the Docker image.

👍

My current concerns are about the author list. How do I raise those concerns?

You can raise any concern here in this issue ticket. Note that JOSS does allow authors other than code contributors in the list.

sgbaird commented 1 year ago

I have it on my calendar and plan to finish my review by May 25 (graduation, moving, new job playing into it).

Andrew-S-Rosen commented 1 year ago

@zhubonan --- can you help with having @editorialbot make me a checklist?

zhubonan commented 1 year ago

Just type

@editorialbot generate my checklist

Then press the "Comment" button.

This should be able to trigger the editorial bot to generate one for you.

Andrew-S-Rosen commented 1 year ago

Review checklist for @arosen93

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Andrew-S-Rosen commented 1 year ago

@markus1978 and @zhubonan:

Overview

I have finished my review of the code and paper. Overall, it is clear that the NOMAD code is well suited for JOSS and will be of significant interest to the computational materials science community at large. I am personally very excited about the code and am looking forward to using it in my work.

I have no comments about the paper, which is clearly written.

In terms of the checklist items, I have checked off most of them (see below for further details). However, I still think there are a few things that are worth improving on prior to acceptance of this work in JOSS. In general, there is a ton of code, and from a user or contributor standpoint, it's not immediately clear where to look for what in many cases. That accessibility for new users/contributors is the area that I think can be improved the most. It would be a shame if this enormous amount of work didn't end up being adopted outside the inner NOMAD sphere of users simply because of its scale.

Documentation

Navigating the Codebase

Overall, I find the codebase to be a bit challenging to navigate. There is the main nomad repository (which is mirrored on GitLab and GitHub), and then there are the many NOMAD-hosted packages that nomad relies on. For instance, as a user wanting to use the parsers, it's not clear to me based on the GitHub repo where to begin. Do I use the nomad repo? The atomistic-parsers? The electronic-parsers? Some conglomeration thereof? I ultimately figured it out, but not without a lot of digging. There should be a clearer map of the various repos in the NOMAD-CoE workspace to help contributors and users.

API Documentation

The documentation for NOMAD itself is very clear, but I was left with many questions about how to use the nomad Python tools. There is a Python section in the documentation, but it's somewhat minimal given how large the codebase is as a whole. For instance, ideally I would like to see more detailed information about the various APIs, such as:

In general, I feel like there is a ton of code here but only a small fraction of it is made accessible to the user (especially prospective developers or code contributors) unless I am missing a section of the documentation.

Installation and Dependencies

Splitting up the Dependencies

From a usability standpoint, it's very difficult to incorporate the NOMAD suite into existing codebases due to the large number of dependencies, several of which are not necessarily actively maintained (and therefore will likely become increasingly difficult to incorporate with newer versions of Python and secondary dependencies). In particular, I can see many use cases where someone might want to use the parsing features of NOMAD but isn't necessarily looking to parse every plausible code supported by NOMAD. In this case, it would be nice to be able to install only the nomad-lab[parsing] dependencies that are necessary for the specific code(s) of interest. For instance, asr is only needed for Atomic Simulation Recipes, which most people aren't going to need. It would be very helpful and increase the usability of the code if additional "extras" could be provided that would break the parsing dependencies down into only those needed for either specific codes or types of codes.
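To make the suggestion concrete, here is a rough sketch of what such fine-grained extras could look like in the packaging metadata (the group names and dependency lists are purely illustrative on my part, not NOMAD's actual dependency tree):

from setuptools import setup, find_packages

setup(
    name="nomad-lab",
    version="0.0.0",
    packages=find_packages(),
    # Core dependencies needed by the shared schema would stay here.
    install_requires=[],
    extras_require={
        # Hypothetical fine-grained extras, one per family of codes,
        # instead of a single monolithic parsing extra.
        "parsing-electronic": ["lxml"],
        "parsing-atomistic": ["ase"],
        "parsing-asr": ["asr"],  # only for Atomic Simulation Recipes users
    },
)

A user could then run pip install "nomad-lab[parsing-atomistic]" and pull in only what their codes of interest need.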

Installation Woes

On the topic of dependencies, although I checked off "Does installation proceed as outlined," that is only strictly true once the issue I raised (https://github.com/nomad-coe/nomad/issues/63) is resolved.

Unit Tests

For Developers

There are unit tests that are run on the GitLab mirror, but they aren't run on the GitHub repo where users can contribute PRs. So, from a contributor perspective, it's not clear from the README or documentation how to run the unit tests locally to test out code. Having to rely on monitoring the GitLab repo for test failures is a bit cumbersome for the user. Since I see a tests folder on GitHub, is it as simple as just doing pytest in that folder? If so, that should be added to the docs.
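If it is that simple, even a small wrapper along the following lines would do; this is purely a guess on my part, and whether the suite needs running services (Elasticsearch, MongoDB, etc.) is exactly what the docs should spell out.

import sys

import pytest

# Guess: equivalent to running `pytest tests -x` from the repository root,
# stopping at the first failure. Required services or fixtures, if any,
# are what the documentation should clarify.
sys.exit(pytest.main(["tests", "-x"]))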

sgbaird commented 1 year ago

Just started tonight. I should have mine within a couple days. I already thought highly of NOMAD, and I'm more impressed as I learn more. Looking forward to this!

markus1978 commented 1 year ago

@arosen93 Thank you very much for taking the time to review NOMAD and for all the very helpful feedback. We are already getting loads of ideas from this on how to improve the addressed points. I will reply with a longer response and an overview of how we are tackling these points once all the reviews are in.

sgbaird commented 1 year ago

Overview

NOMAD is an excellent platform with a convenient web-app user interface, extensive documentation and tutorials, a wide range of features, and a Python package. While NOMAD hosts its own large database of materials, it also provides setup instructions for an isolated solution when someone wishes to implement their own private/secure database. In the past, I had a good experience using NOMAD to create a dataset of chemical formulas, larger than what is offered by a database like Materials Project, yet more targeted than a random generation of formulas. Our group used NOMAD recently to screen for novel, high-performing superconductors. I think the name NOvel MAterials Discovery (NOMAD) is certainly fitting. Using NOMAD through the API was a positive experience, and the developers were timely in offering support. It also seems like one of the best venues to submit candidate crystal structures from generative modeling efforts. These are some specific examples, and there are many more use cases. NOMAD is a large benefit to a broad audience.
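For reference, the kind of API usage I mean looks roughly like the sketch below; it follows my reading of the v1 REST API, and the exact endpoint and query keys should be double-checked against the API dashboard.

import requests

# Sketch: query the public NOMAD API for entries containing Ti and O and
# print their reduced chemical formulas. Endpoint and query keys follow
# my understanding of the v1 API; verify against the API dashboard.
response = requests.post(
    "https://nomad-lab.eu/prod/v1/api/v1/entries/query",
    json={
        "query": {"results.material.elements": {"all": ["Ti", "O"]}},
        "pagination": {"page_size": 10},
        "required": {"include": ["results.material.chemical_formula_reduced"]},
    },
    timeout=30,
)
response.raise_for_status()
for entry in response.json()["data"]:
    print(entry["results"]["material"]["chemical_formula_reduced"])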

As an aside, I wanted to point out the following statement that stuck out to me:

Materials scientists require effective solutions for managing their research data, but they cannot and should not develop their own individual solutions.

NOMAD is a tool that materials scientists can use to effectively manage their research data, and it narrows the gap to becoming a materials data scientist.

Installation

Functionality

Ideally, I'd like to test out the functionality for "publish", "explore", and "analyze" (as mentioned at https://nomad-lab.eu/nomad-lab/).

I will try to work through the tutorials to ensure each of these work, though I'd like to get the installation issue mentioned above figured out prior to this.

Documentation

Echoing @arosen93, the requirements are lengthy and strict in terms of pinned dependencies.

It's great that the installation instructions are well documented, the package is available to download via PyPI, and the web app is easy to use.

The community guidelines would benefit from additional details. For ideas, here's an example of a PyScaffold default template for CONTRIBUTING.md, and here's an example of how I modified it for another project (incidentally, a published JOSS manuscript).

Like @arosen93 mentioned, auto-generated API docs for the Python package would be nice.

Agreed with @arosen93. I can see unit tests at, e.g., https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/pipelines/168086, but users should be able to run unit tests locally. It should also be explained why unit tests are run on GitLab and not on GitHub (e.g., better/faster resources on GitLab through the sponsoring organization), preferably in the same location as the instructions for the unit tests.

Software paper

This public NOMAD service contains over 12 million individual materials science simulations and an increasing number of entries describing materials science experiments. NOMAD is publicly available since 2014 and includes data from over 500 international authors.

If available, it would be nice to see an order of magnitude estimate of the number of experimental entries (e.g., over NUM entries describing materials science experiments and counting).

I learned about the NOMAD OASIS feature when reading the manuscript. Great feature!

NOMAD OASIS can be freely used as per our OSI license ...

... our Open Source Initiative (OSI) license ...

NOMAD implements OPTIMADE and is an active part of the OPTIMADE consortium.

Could this be confused to mean that NOMAD is the "parent organization" of OPTIMADE? Perhaps "NOMAD conforms to OPTIMADE standards and is accessible via their API. Additionally, NOMAD is an active participant of the OPTIMADE consortium."

Other materials science databases (and the respective software) focus on only publishing data that were produced by the group behind the database itself. Typical examples are databases of high-throughput simulations that try to systematically explore theoretical materials. Three of the larger databases of this kind are the Materials Project (Jain et al., 2013), ...

Two examples that come to mind with an interface for contributing user data (not internally generated by the group) are MPContribs and Foundry. The following is copied from https://github.com/sparks-baird/self-driving-lab-demo/discussions/117#discussioncomment-6019686:

Is it also worth mentioning non-materials-science-specific options, like MongoDB? Even if not incorporated into the manuscript, I'm curious to hear your opinion.

While not a database that supports user contributions, a notable database is MPDS, which arguably contains the largest collection of experimentally determined crystal structures and materials properties extracted from literature. However, the data is kept private through a subscription paywall. I think it would be worth mentioning MPDS.

Please consider adding bulleted lists of key features from the three categories in https://nomad-lab.eu/nomad-lab/ (see image above) as some type of list in the manuscript (concatenated list, three separate lists, table, etc.).

markus1978 commented 1 year ago

@sgbaird Thank you for your extensive, insightful comments and issue reports. I will come up with a longer response soon and give a plan on how to address the raised issues.

zhubonan commented 1 year ago

Hope everyone is well and not melted by the summer heat (yet) 🥵 Just want to give a ping here! Thanks @arosen93 @sgbaird for reviewing and posting the comments.

@berquist I can see there are still some unchecked items in the checklist? Do you still have concerns about the author list? Note that we do allow non-code contributors to be on the paper, which is not a problem.

@markus1978 how is it going addressing the comments?

markus1978 commented 1 year ago

@zhubonan Thanks for the ping and thanks for the reviews again.

I did not come up with the "big reply" yet, as I was under the impression that @berquist still needs some time:

I've read the paper and believe that NOMAD belongs in JOSS. I'm still distilling my notes on the (website) documentation and haven't tried installing/testing/running other than building the Docker image.

But be assured that we are already working on the raised issues. To keep it brief for now, these are the main things we are working on:

With these two things done, we should have addressed most of what @arosen93 and @sgbaird raised. I will prepare the formal big reply and address the issues in detail once @berquist signals that he has finished his review.

berquist commented 1 year ago

I apologize for sitting on what I've done so far and am working on tidying up now. I probably won't be able to cover everything the way I'd want, but have focused mostly on the Docker things which no one else has brought up yet.

Do you still have concerns about the author list? Note that we do allow non-code contributors to be on the paper, which is not a problem.

Yes; since non-code contributors are allowed, there are still a few names that seem to have contributed heavily that are not on the list at all.

markus1978 commented 1 year ago

Yes; since non-code contributors are allowed, there are still a few names that seem to have contributed heavily that are not on the list at all.

Of course I am open to addressing any concerns here. If this is more comfortable, I assume the review process would also allow a private email thread in this case?

zhubonan commented 1 year ago

@markus1978 We would encourage open discussion if possible. But if there are difficulties, I am also OK with private email on this matter.

markus1978 commented 12 months ago

Thank you again for the reviews and all your efforts. Below I aggregate your points and provide a respective response for each of them. 

Open bug reports

You raised these issues:

We released a new version of our Python package nomad-lab and provided an update to the documentation. We now have a CI/CD job that tests the installation and basic usage of the package. We also confirmed that it can be installed on Google Colab (example notebook). Furthermore, the new version uses version ranges on its dependencies, decreasing the chance of conflicts. We also restructured the imports and "extra" dependencies: now a basic pip install will provide everything needed for most functionalities (ArchiveQuery and most parsers).
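For illustration, basic usage after a plain pip install nomad-lab looks along these lines (a minimal sketch; see the updated documentation for the exact query syntax and available archive sections):

from nomad.client import ArchiveQuery

# Minimal sketch; consult the documentation for the authoritative query
# syntax and archive section names.
query = ArchiveQuery(
    query={"results.material.elements": ["Ti", "O"]},
    required={"metadata": "*"},
)
for archive in query.download(5):  # fetch the first five matching archives
    print(archive.metadata.entry_id)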

On the number of dependencies: unfortunately, splitting them up and pushing them into the parsers is harder than one might think. Most of the dependencies that we require, even for the basic installation, are used by the definition of our shared schema already and not just by the parsers. This is a challenge for us in general, especially since the project is still growing. We have now introduced a plugin mechanism to mitigate this problem. We will see in the upcoming months, when we apply this mechanism to our parsers and schemas, if and how we can manage dependencies better. We did the little we could for now: I went through the imports and dependencies and optimised what was possible. We also lowered the requirements on the dependency versions.

Documentation

The reviewers pointed out that:

In response to these points, we fully restructured the documentation from the ground up. It now contains categorised how-to guides (on data management, programming interfaces, development, etc.).

This includes a new guide on how to navigate the code. This guide explains the projects and organisations, gives a few notes on the most important modules and directories, and shows where to find the documentation.

It also includes a new, more detailed contributing guide, structured following your suggestions. Here we also break down the GitHub vs GitLab topic, what the workflow is for external contributors, how to test locally, etc.

Our developers guide (also linked from the contributing guide) explains how to run backend and frontend tests locally, and how to configure your IDE (or what commands to run) to enforce style checks, linting, etc.

There are also guides on installing nomad-lab, running parsers from the command line or from code, and how to use the REST API with examples (in addition to the API dashboards). However, we do not have a full reference of all modules, classes, and functions. This is a conscious decision. We had this in the beginning, but forcing the documentation structure to fit the code structure was very confusing, and dedicated tutorials/guides proved to be more useful. There are docstrings throughout the code, but these are intended for navigating the code and IDE IntelliSense. However, we added a reference section to the documentation with all config settings, schema annotations, available parsers and plugins, and all CLI commands and arguments.
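For example, running a parser from code boils down to a few lines (a simplified sketch of the how-to guide; the mainfile path is just a placeholder):

from nomad.client import normalize_all, parse

# Simplified sketch following the how-to guide; the path is a placeholder.
entry_archives = parse("path/to/vasprun.xml")
for archive in entry_archives:
    normalize_all(archive)  # run the normalizers on the parsed archive
    print(archive.results.material.chemical_formula_reduced)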

The respective parts of the documentation are linked from the README.md, CONTRIBUTING.md, and, of course, our homepage. We also made sure that Google indexes the documentation correctly, so a search like "how to run nomad-lab unit tests" should get you to the right docs.

The paper itself

We changed the line (in the paper on github) about OPTIMADE to:

OPTIMADE [@andersen:2021] is an API specification for materials science databases. NOMAD provides an implementation of the OPTIMADE specification and is an active part of the OPTIMADE consortium.

Currently there are ~45k entries from experiments in NOMAD, representing a solar-cell database and an electron energy loss spectroscopy database. We expect the specific number and magnitude of experiment entries to change, as there are a few other experimental databases currently being developed and integrated. Also, most of the non-theory work is happening in NOMAD Oasis installations at a growing number of groups and research institutes. Currently we have ~20 of these installations.

I am not sure if we can add more information to the paper due to size constraints (we are at ~1000 words of the 250-1000 word JOSS margin). I agree that many other systems could have been mentioned. Here are our thoughts on the selection.

We focused on materials science-specific systems. Of course, there are tons of other solutions that are interesting. We tried to have at least one example in each materials science-specific category: high throughput database, experimental database, workflow system, data standard, and programming library.

Regarding the comments about referencing other centralized databases and infrastructure where users can contribute materials science-related data, we would prefer not to extend the list further given the large and growing number of potential candidates. Beyond the MPContribs and Foundry examples highlighted by @sgbaird, one could extend the list to many more like the ICSD, the Chemotion Archive, or refractiveindex.info, to give some examples. It would feel unfair to include some and not others, and we feel this would be better addressed in a topical review paper, with the right scope and without being bound to the word limit recommended by JOSS.

We also opted for leaving out databases in the SQL/MongoDB sense. Those are not specific to research data management (RDM) or any specific field, let alone materials science. Of course, they are often the underpinning of systems like ours (we are using MongoDB and Elasticsearch under the hood) and you might use them to implement an RDM solution, but we focused on end-user (researcher) focused examples in the list of references.

MPDS is a nice database (based on data published in papers, the Pauling File). But only a specific part of the data is open; the rest is commercialised. Of course, this is not really a criterion for not being mentioned, but the other DBs referenced are all open. We could mention so many more databases; these are just examples. By the way, Evgeny Blokhin is/was involved with this, and he is active in OPTIMADE as well.

Regarding the question of when and how to connect databases: in principle, we are very open to collaborations and integrating with other systems. It all comes down to time and resources. For example, there are the mentioned collaborations with AFLOW, OQMD, and the Materials Project (MP). Ideally, common interfaces are used (e.g. the same file formats). This is why we are active in OPTIMADE. Another strategy for us in the future will be representing NOMAD as linked data (semantic web, RDF, etc.). In principle, users can add references to their data. We could think about formalizing relationships to MP-ids (or entries in other databases). This could help to integrate with platforms like MPContribs and foster more links between data.

Conclusions

In conclusion, we hope we managed to address all your major points, particularly the installation and documentation problems, which have seen many significant changes and improvements. With this, we hope we have also addressed all the JOSS-review-related issues. Even if our job does not fully end here, we hope that these major changes put our code in good shape to be accepted for publication in JOSS. Your feedback helped us very much in shaping the new documentation and pushing us towards a more usable Python package, which, without a doubt, will keep on improving organically.

Andrew-S-Rosen commented 12 months ago

@markus1978: Thank you and the team for this very thorough response and the absolutely fantastic changes made to the code and documentation. It is much easier to navigate now, and my hope is that this will greatly increase the impact of the hard work you all have put into this. I am delighted to say that all my concerns have been addressed, and I wholeheartedly recommend this work for acceptance in JOSS. I look forward to using NOMAD in my future research group (including OASIS!), and I thank you for the opportunity to review this nice work.

CC @zhubonan.

zhubonan commented 11 months ago

Pinging @sgbaird @berquist - Please see the replies to the comments above and let me know if you have any further concerns about the submission. Thanks!

Please also update your checklists, which helps the process.

sgbaird commented 11 months ago

Hi @zhubonan, thanks for the reminder. I'll try to take a look by the end of the weekend!

zhubonan commented 11 months ago

@sgbaird How is it going, any update? Thanks.

sgbaird commented 11 months ago

Hi, thanks for the ping. I am satisfied with the updates and recommend publication. Well done! 👏 I'm excited to continue using NOMAD in the future 😊

berquist commented 11 months ago

Again, apologies for being slow. I don't have a good excuse, but it was not the fault of the JOSS process, which has been painless. In the spirit of not letting the perfect be the enemy of the good...

Review

Paper

The paper itself is well-written, and while there might be some grammar issues or awkward wording in a handful of places, the intent is clear. The referenced literature covers the main work in the same space that I'm familiar with and touches upon the conceptual differences, but it makes it sound like those groups are the only users of their own software infrastructure (not to be confused with the databases containing calculation results). This is odd because they explicitly mention pymatgen as a dependency, and pymatgen is originally from the Materials Project. This might be one of the aforementioned wording issues. There are no specific corrections I would make unless further clarification is necessary.

While there are references to other databases, except for the last paragraph, there are no technical comparisons to the frameworks behind these databases. That may not be where NOMAD shines, which to me looks like the FAIR support, the web frontend, and the ease of hosting.

My principal concern is over the author list. https://github.com/openjournals/joss-reviews/issues/5388#issuecomment-1548937979 did mention that JOSS allows authors other than code contributors in the list (which I don't recall seeing codified anywhere), and that does alleviate some concerns. The GitHub contributor ranking, while an imperfect metric, shows that there are a handful of people with code contributions, some appearing substantial, who are not in the list at all. I have not looked at the GitLab instance, and my notes may be stale now (from 2023-04-22), but it isn't clear why these people are not authors when they're high on the GH contributor list:

and these authors in the repository AUTHORS:

and it is not clear why these people are not equal authors when they are high on the GH contributor list:

Functionality

The first place I usually look for how to build any piece of software is the CI definition, since if tests are passing, it's more likely to work than sometimes-outdated documentation. I extracted this from .gitlab-ci.yml and had no problems building the image(s):

#!/bin/bash

set -euo pipefail

# Image name and tag, normally provided by the GitLab CI environment.
CI_REGISTRY_IMAGE=nomad
DOCKER_TAG=1.1.8

# Build the final image plus the two intermediate development stages.
docker build -t ${CI_REGISTRY_IMAGE}:${DOCKER_TAG} .
docker build --target dev_node -t ${CI_REGISTRY_IMAGE}/dev_node:${DOCKER_TAG} .
docker build --target dev_python -t ${CI_REGISTRY_IMAGE}/dev_python:${DOCKER_TAG} .

Having a Docker image really serves as a nice standard now that Python packaging is so difficult to deal with. The Dockerfile itself follows good practices by not only having useful comments and tidying up after certain steps, but also performing a multi-stage build with multiple available tags, with the final images having contents not owned by root. With not much work it should even be a foundation for a development environment using devcontainers.

It's not a requirement, but unfortunately replacing Docker with Podman doesn't quite work:

[6/6] STEP 4/15: COPY --chown=nomad:1000 scripts/run.sh .
Error: building at STEP "COPY --chown=nomad:1000 scripts/run.sh .": looking up UID/GID for "nomad:1000": determining run uid: user: unknown user error looking up user "nomad"

Being able to build and run with Podman and/or Singularity/Apptainer could open up NOMAD containers to more permissions-restrained environments. Apptainer can at least ingest the already-built image (apptainer build nomad-fair-latest.sif docker://gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest).

Going back to look at the documentation, it covers exactly this, which is a good sign.

Under the OASIS documentation, in addition to "Download our basic configuration files", there could be a link to the identical docker-compose infrastructure in the Git repository. The ideal would be if MkDocs had functionality similar to Doxygen, where you can put comments in code files with a named anchor or tag around a span of text, which the doc build tool will read and insert directly into the docs, so that the docs never fall out of date with respect to the code. (I don't know MkDocs well, so it may or may not be possible.)

Deploying goes exactly as stated in Operating an OASIS. A smaller Docker-based Digital Ocean droplet with the Docker UID changed and port 80 opened in the firewall had me uploading data in minutes.

Regarding CI, the number of GitLab job definitions is considerable, but they have reasonably descriptive names. It also looks like not much remains that could be abstracted into scripts which do not depend on the CI platform, which is a common trap. Having scripts shared between CI and local development aids consistency between the two environments. The scripts themselves mostly handle egregious errors (set -e), but they could use some high-level documentation.

The dedicated ./scripts/setup_dev_env.sh unfortunately failed for me, which highlights the growing need for something such as Docker- or Spack-based development environments, particularly for Python-based or mixed-language projects. Separately, I appreciate not being tied to conda in any way.

Documentation

Unfortunately, the GitHub repository is only a mirror for the self-hosted repository. This is understandable, but is confusing for contributors since GitHub--GitLab integration is not perfect. At least this section of the README is right at the top and succinctly explains exactly how contributing works.

Along with the Docker-relevant documentation mentioned above, the general quality of the developer documentation is high. Having "How to do X" or "How to write Y" pages is helpful for onboarding when projects either grow large or are highly domain specific. "How to write a parser" even has a worked example for a hypothetical output. An additional thing that some might find sloppy or distracting but I find informative, at least as a developer, is that the TODOs for documentation are simply left directly on the rendered pages. It would be interesting to know (not in the paper) if any issues or requests have been made as a result.

Conclusions

I completely support NOMAD's publication in JOSS.

zhubonan commented 10 months ago

Thanks for the review report @berquist! If my understanding is correct, the overall view is positive, but there are some minor issues that should be addressed.

As for the authorship concern, I think for a large open source project there can be many contributors (such as people fixing bugs), and changed line counts do not always reflect the work done (for example, running a code formatter would result in many line changes). So I think it is OK that the author list does not include all contributors. PS: I don't seem to be able to find the two authors Cuauhtemoc Salazar and Daniel Speckhard?

@markus1978 Could you please address the comments raised?

markus1978 commented 10 months ago

@berquist Thanks also from me for the review and all the efforts put into it. These comments really help us to improve.

In response to the author list:

Some authors are placed lower in the GH list (or are not there at all) because their contributions are in submodules or not in git at all (design docs, workshops, meetings, etc.). We honestly tried to reflect contributions to NOMAD accurately, even though some choices and the order might not appear as objective as commit metrics. I hope this alleviates the concerns about the author list and is also compliant with JOSS's author policy?

On the conceptual difference to other databases: maybe it would be clearer if we change

Other materials science databases (and the respective software) focus on only publishing data that were produced by the group behind the database itself.

into

Other materials science databases (and the respective software) focus on publishing data that were produced with a specific framework and carefully curated by the group behind the database.

I believe this would also cover AiiDA or Materials Project contributions (e.g. via the MPContribs API).

On the detailed comparison with other databases/frameworks: a meaningful comparison of the technical and framework differences would be hard to fit into the constraints of a JOSS paper. I agree that it could be a very interesting publication on its own, though.

There has been some work on the setup_dev_env.sh issue. I hope this addresses it sufficiently. As you mentioned, it is really hard to get everything working on all platforms outside the docker environment.

We are in the process of moving the central NOMAD installation to a new cloud, i.e. a "more permissions-restrained environment". This will force us to address podman/singularity issues in the near future.

Explaining certain NOMAD features to users, partners, and other parties constantly reveals new holes in the documentation. We put them up as public TODOs to also put some pressure on ourselves to fill them soon.

I hope I could address everything sufficiently. Please let me know if I missed something. If you (@zhubonan) agree, I will make the mentioned changes to the paper and let you know.

zhubonan commented 10 months ago

Thanks for the detailed reply. I agree with the proposed changes 🚀. @markus1978 @berquist are you happy with the reply? If you have any further comments, please let me know. Thanks.

berquist commented 10 months ago

I hope this alleviates the concerns about the author list and is also compliant with JOSS's author policy?

Yes, it does. Thank you for examining this. All my other comments are addressed as well, and I'm happy with everything.

markus1978 commented 10 months ago

I made the discussed changes to the paper (https://github.com/nomad-coe/nomad/commit/d734515f71258a1d8f2dcf3b8e0fdac28c6d8092).

I want to thank all the reviewers and the editor again for all the work. Thank you very much for everything.

@zhubonan What are the next steps?

zhubonan commented 10 months ago

@editorialbot generate pdf

zhubonan commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

zhubonan commented 10 months ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

zhubonan commented 10 months ago

@markus1978 Here are some additional tasks to be done before we recommend acceptance. The most important thing is to proof the paper, make a release of the code, and archive that version so it has a DOI, then post the version number and the DOI here.

zhubonan commented 10 months ago

@markus1978 I have one small comment:

Materials scientists require effective solutions for managing their research data, but they cannot and should not develop their own individual solution

Sounds a bit extreme to me, as there are developers who are also materials scientists, perhaps changing it to:

they should not have to develop their own individual solutions.

markus1978 commented 10 months ago

I agree and added the change.

It will take me some days to do the final post-review tasks. I will try to be as quick as possible.

markus1978 commented 9 months ago

@zhubonan this should complete the post-review checklist for authors. However, I cannot set the checkboxes myself. I guess this is your task?

zhubonan commented 9 months ago

yep, I will do a final check.

zhubonan commented 9 months ago

@editorialbot generate pdf