[REVIEW]: matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@sgbaird@phibeck<!--end-editor-- Reviewers: @ml-evs, @mkhorton, @jamesrhester Archive: 10.5281/zenodo.10840604

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/7ba5a67474e60ef6927a8635354b8546"><img src="https://joss.theoj.org/papers/7ba5a67474e60ef6927a8635354b8546/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/7ba5a67474e60ef6927a8635354b8546/status.svg)](https://joss.theoj.org/papers/7ba5a67474e60ef6927a8635354b8546)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ml-evs & @mkhorton & @jamesrhester, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @phibeck know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @ml-evs

📝 Checklist for @jamesrhester

📝 Checklist for @mkhorton

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.08 s (755.2 files/s, 365581.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          26            704           1254           1847
Jupyter Notebook                 5              0          21769           1194
Markdown                        12            208              0            679
TeX                              6             28              0            378
YAML                             5             21             72            209
INI                              1             11              0             83
Dockerfile                       1             15             23             26
make                             1              6              8             15
TOML                             1              1              3              5
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                            59            994          23129           4437
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1122

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.26434/chemrxiv-2022-6l4pm is OK
- 10.1038/s41467-019-10030-5 is OK
- 10.21105/joss.04528 is OK
- 10.1021/acs.jcim.8b00839 is OK
- 10.1038/s43588-022-00349-3 is OK
- 10.1038/s41524-020-00406-3 is OK
- 10.1063/1.4812323 is OK
- 10.1016/j.commatsci.2012.10.028 is OK
- 10.1038/s41598-022-08413-8 is OK
- 10.3389/fphar.2020.565644 is OK
- 10.1016/j.matt.2021.11.032 is OK
- 10.1107/S2056989019016244 is OK
- 10.1038/s41586-019-1335-8 is OK
- 10.1002/advs.202100566 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jamesrhester commented 1 year ago

Review checklist for @jamesrhester

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/sparks-baird/matbench-genmetrics?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@sgbaird) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

ml-evs commented 1 year ago

Review checklist for @ml-evs

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/sparks-baird/matbench-genmetrics?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@sgbaird) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

phibeck commented 1 year ago

@jamesrhester, @ml-evs, @mkhorton, thanks again for reviewing! Just checking in on your review. Please let me know if you have any questions about the process. Feel free to create issues in project repository directly or write them down as comments here, but please do link the issues in this review so it's easy to follow for everyone. @mkhorton, please go ahead and create your checklist first using the command @editorialbot generate my checklist.

mkhorton commented 1 year ago

Review checklist for @mkhorton

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/sparks-baird/matbench-genmetrics?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@sgbaird) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

phibeck commented 1 year ago

:wave: @jamesrhester , @mkhorton, please update us on how it's going with your reviews when you find the time

mkhorton commented 1 year ago

Appreciate the reminder @phibeck, thank you -- on my radar, can't believe it's almost been a month already since agreeing to review!

jamesrhester commented 1 year ago

Likewise!

phibeck commented 1 year ago

Thank you all for getting the review started! As you work through your checklists, please feel free to comment and ask questions in this thread. You are encouraged to create issues in the repository directly. When you do, please mention openjournals/joss-reviews#5618 so that it creates a link in this thread and we can keep track of it.

Please let me know if you have any questions or if either of you requires some more time.

mkhorton commented 1 year ago

Hi @phibeck, running through the checklist. Unfortunately I may have a conflict of interest: I have previously been on publications with the second author, Joseph Montoya. We worked together in the same research group until 2018 (i.e. outside the four year window in the COI policy), but the most recent paper we were on together actually only came out in 2021.

danielskatz commented 1 year ago

If the work was done more then 4 years, but the paper appeared later, this is not a conflict for JOSS. The four years is about the collaborative relationship itself.

phibeck commented 1 year ago

Thank you, @danielskatz, for clarifying this question. Sounds like you don't have a COI here, @mkhorton.

ml-evs commented 1 year ago

Just a heads-up (mostly for @phibeck) that I will restart my review on this, and will continue collecting small things in my old issue at https://github.com/sparks-baird/matbench-genmetrics/issues/80 which wasn't previously linked here.

phibeck commented 1 year ago

@jamesrhester, @mkhorton & @ml-evs - could you provide an update on the progress of your review? Thank you!

jamesrhester commented 1 year ago

Getting onto this now. We've just had a big crystallography meeting that took up a lot of my cycles...

jamesrhester commented 1 year ago

As a crystallographer but non ML specialist, I found the "Statement of Need" lacked context. Line 22 in the paper made no sense to me, which is not good for the first sentence. I would therefore like one or two further sentences added at the beginning explaining how ML uses benchmarks, e.g "in ML, the result of a prediction is evaluated using benchmarks, which are then used to adjust the ML weights. Typically, a crystal structure ML model has used benchmarks from...." (which might show I have no idea what I'm talking about). I think this will better help readers to quickly determine whether or not the paper is relevant to them.

Once this is done I'm ready to sign off on my review.

phibeck commented 1 year ago

Great, thanks for the update, @jamesrhester. @sgbaird, feel free to get started working on the comments and issues linked here by @jamesrhester and @ml-evs. Please update us here in this issue about the progress so we can keep track of the changes.

phibeck commented 1 year ago

👋 @sgbaird could you let us know where you stand with responding to the comments of the reviewers?

@mkhorton and @ml-evs, it would be great to get an update from your side as well regarding the remaining points on your checklists. Thank you!

ml-evs commented 1 year ago

Hi @phibeck, I have been continuing to list my comments at https://github.com/sparks-baird/matbench-genmetrics/issues/80. My main concern as raised by these comments is that while "complete" the library does not appear to have been "polished", there are lots of minor errors in the docs and defunct code paths/duplicated template code that seems to be leftover from project initialization (I can only find around 700 lines in total of active code in the main package). Coupled with the fact that there are no examples of "serious"/comparative benchmarks, i.e., benchmarking a real generative algorithm, I am struggling to extrapolate the package performance/functionality from the simple example provided. If my comments in https://github.com/sparks-baird/matbench-genmetrics/issues/80 could be addressed I would have an easier time recommending this for publication, but until then I find it hard to judge.

ml-evs commented 1 year ago

Just to attempt to give a constructive suggestion, since actually performing generative ML is out of scope for JOSS, perhaps a nice self-referential example would be to use the MP time split subpackage to "benchmark" our last n years of progress in materials discovery as a field, i.e., treat the slow march of time as a generative model. With a bit of fudging (depending on how the time split treats duplicates in the ICSD) I think this can test novelty, uniqueness and validity with more interpretable results than the given random model. Now to just resist trying this out myself!

sgbaird commented 1 year ago

All, thank you for working on this!

@jamesrhester thank you for the comment - I will update to clarify.

@ml-evs you raise some great points. I will work on addressing your comments before you dive deeper. I return from an international trip tomorrow, and I'll try to come back to this next week. I really like the idea of the "slow march of time" test 😁

phibeck commented 11 months ago

:wave: @sgbaird did you have time to look into the issues?

:wave: @mkhorton how is your review going? Please let me know if you need more time.

phibeck commented 11 months ago

@sgbaird could you please provide an update when you plan to address the reviewers comments?

sgbaird commented 10 months ago

Hi @phibeck, I transitioned from a research role to an administrative, so I've been very delayed in getting to this, but it's still been on my mind a lot. I will try to find some time soon.

phibeck commented 10 months ago

Hi @sgbaird, okay sounds good, thanks. Keep us updated.

mkhorton commented 10 months ago

@phibeck responding to above, I'd like to request more time if ok

phibeck commented 10 months ago

@mkhorton okay sure, I will set a reminder for 10 days from now

phibeck commented 10 months ago

@editorialbot remind @mkhorton in ten days

editorialbot commented 10 months ago

Reminder set for @mkhorton in ten days

editorialbot commented 10 months ago

:wave: @mkhorton, please update us on how your review is going (this is an automated reminder).

phibeck commented 10 months ago

:wave: Hi @sgbaird could we please get an update regarding the issues?

phibeck commented 9 months ago

Hi @sgbaird, can you please let me know if you are able to address the issues and comments within the next two weeks?

If you are not able to within that time, we will need to place the submission on hold, so that we can address the substantial number of submissions that need attention. Should you not be able to work on this in the foreseeable future, we will need to withdraw it.

sgbaird commented 9 months ago

Hi @phibeck, sorry for such a delay.

@jamesrhester

As a crystallographer but non ML specialist, I found the "Statement of Need" lacked context. Line 22 in the paper made no sense to me, which is not good for the first sentence. I would therefore like one or two further sentences added at the beginning explaining how ML uses benchmarks, e.g "in ML, the result of a prediction is evaluated using benchmarks, which are then used to adjust the ML weights. Typically, a crystal structure ML model has used benchmarks from...." (which might show I have no idea what I'm talking about). I think this will better help readers to quickly determine whether or not the paper is relevant to them.

Once this is done I'm ready to sign off on my review.

I have updated this section in the text. Please let me know if this looks OK to you now!

@ml-evs I have addressed all but one of the points in https://github.com/sparks-baird/matbench-genmetrics/issues/80. @hasan-sayeed is going to help me with testing out your "slow march of time" suggestion in https://github.com/openjournals/joss-reviews/issues/5618#issuecomment-1728516279, which I've made some preparations for.

sgbaird commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jamesrhester commented 9 months ago

I'm happy with the updated text, I will sign off on the review now.

mkhorton commented 9 months ago

I'm happy to sign off on the text too, I'm just checking my remaining review items. This package is motivated by an important need in the community; I am not wholly satisfied that this solution is going to be sufficient as it stands today, but it's a good start and can be improved over time. The package also seems well thought through and structured (my compliments to the authors).

My apologies also for the delay with the review; I will be removing myself from the active reviewer list for a while since I don't want to normalize this sort of wait!

mkhorton commented 9 months ago

To give additional context for my critical comment, it is not clear to me that (a) space group alone is sufficient to assess validity, or (b) what a good baseline distribution should be for space group. As far as I can tell, assessing the validity of a generated structure is an unsolved challenge and an active research need. Using the Materials Project as a baseline is a good start, but we might need to be more creative to find more comprehensive answers, and other people might reasonably prefer to use different baselines.

For this package to take off, I would also perhaps start actively soliciting suggestions for additional metrics to include. The desire to prevent metric hacking is also very important, and I hope to see more robust solutions here. The idea of including ML based pre-filters is useful, but also itself dependent on the quality of the ML filter: it could steer an otherwise good model awry, depending on implementation.

Some other points:

Try the Sphinx sphinx-prompt plugin, to make your examples easier to copy-and-paste

I am actually having trouble trying the example on your docs home page; can I verify it's correct and current? For example, I do not see MPTSMetrics10 available to import from the __init__.py, I believe the example should look like:

from tqdm import tqdm
from matbench_genmetrics.mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core.metrics import MPTSMetrics10, MPTSMetrics100, MPTSMetrics1000, MPTSMetrics10000
mptm = MPTSMetrics10(dummy=True)
for fold in mptm.folds:
train_val_inputs = mptm.get_train_and_val_data(fold)
dg = DummyGenerator()
dg.fit(train_val_inputs)
gen_structures = dg.gen(n=mptm.num_gen)
mptm.evaluate_and_record(fold, gen_structures)

Unfortunately, mptm.evaluate_and_record(fold, gen_structures) seems to return None when I ran this, after printing information on ElementProperty and SiteStatsFingerprint.

Likewise, I struggled to install via pip in a clean conda environment: I managed to install it in the end, but a pure pip install does not seem possible on the machine I tested it on (macOS 13.1, Python 3.9).
Given that MP does sometimes deprecate older materials/structures over time, is there a plan to ensure that benchmarks generated with older versions will be compatible with newer versions (or at least some way to alert the user if not)?

If we can resolve the installation issue and double check and verify the code examples in the docs I'd be happy to sign off.

jamesrhester commented 9 months ago

Regarding the example not working, I raised an issue a while back here

phibeck commented 9 months ago

Thank you @sgbaird for the update, and thanks @jamesrhester and @mkhorton for keeping the momentum going. It sounds like there's a fairly well-defined list of items for @sgbaird to work through before the reviewers recommend acceptance. It would be great if the issues could be addressed within the next three weeks, otherwise please provide an estimate of how much time is needed.

ml-evs commented 9 months ago

I'm now happy with the state of the repository itself and the issues raised at https://github.com/sparks-baird/matbench-genmetrics/issues/80, with the one exception that I think the package still needs a good motivating example (as per my previous suggestion) -- I think this is already being worked on (and also ties in with what Matt (1) has said above). Another (non-blocking) suggestion (perhaps an easier one than using mp_time_split itself) would be to take the recently published Gnome dataset and compare that to MP via these metrics.

Pending this example case, the text is also good with me, modulo two minor comments

Figure 1 caption docs links are not being rendered as links
Since the initial submission, the matbench-discovery package has also been released (with preprint). I think these two packages nicely complement each other, but I can see people getting confused (similar names, similar aims). Perhaps a sentence or two highlighting the differences would be helpful here?

phibeck commented 8 months ago

Thanks @ml-evs for the update! @sgbaird whenever you have a chance to take another look, please do, and let us know if there's anything else stopping you from wrapping things up.

phibeck commented 8 months ago

@sgbaird I tried to contact you via the email address listed in your profile. Could you let us know what the status is? In particular, whether you think you will be able to finish working through the issues raised by the reviewers within the next 4 weeks. Thank you.

sgbaird commented 8 months ago

@phibeck sorry for the radio silence, @hasan-sayeed and I are actively working on the motivating example mentioned. Yes, that timeline is doable for us.

phibeck commented 8 months ago

Thank you for the update @sgbaird. Sounds good, please keep us posted.

hasan-sayeed commented 7 months ago

@ml-evs @sgbaird I've just added the motivating example mentioned above as a notebook file. This script treats materials discovered in the future as generated data and benchmarks the progress of materials discovery over time. It employs five-folds, each fold partitions the dataset into distinct training, validation, and generated sets using mp_time_split subpackage, simulating the incremental advancement of materials science. The result we got is as follows:

{0: {'coverage': 0.03280207561156412,
     'novelty': 0.9571905114899926,
     'uniqueness': 0.9998874668429091,
     'validity': 0.9657295385947532},
 1: {'coverage': 0.032987398072646404,
     'novelty': 0.948295033358043,
     'uniqueness': 0.9999187947547732,
     'validity': 0.9689630506461338},
 2: {'coverage': 0.019644180874722018,
     'novelty': 0.9529280948851001,
     'uniqueness': 0.999984404745629,
     'validity': 0.942820170335924},
 3: {'coverage': 0.02575982209043736,
     'novelty': 0.939770200148258,
     'uniqueness': 0.9999801452488405,
     'validity': 0.8635418248589397}}

openjournals / joss-reviews