openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: PyDGN: a Python Library for Flexible and Reproducible Research on Deep Learning for Graphs #5713

Closed editorialbot closed 9 months ago

editorialbot commented 11 months ago

Submitting author: @diningphil (Federico Errica)
Repository: https://github.com/diningphil/PyDGN/
Branch with paper.md (empty if default branch): paper
Version: v1.5.0
Editor: @arfon
Reviewers: @idoby, @sepandhaghighi
Archive: 10.5281/zenodo.8396373

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/912b1b0ae8bf2ae2d44aaafc50126b74"><img src="https://joss.theoj.org/papers/912b1b0ae8bf2ae2d44aaafc50126b74/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/912b1b0ae8bf2ae2d44aaafc50126b74/status.svg)](https://joss.theoj.org/papers/912b1b0ae8bf2ae2d44aaafc50126b74)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@idoby & @sepandhaghighi, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @idoby

πŸ“ Checklist for @sepandhaghighi

editorialbot commented 11 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 11 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.13 s (788.0 files/s, 126413.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          47           2020           3800           7060
YAML                            28            336            273           1139
Markdown                         5            282              0            438
reStructuredText                15            242            449            240
TeX                              1             21              0            151
SVG                              3              2              0             89
Bourne Shell                     2             20             15             47
DOS Batch                        1              8              1             26
make                             1              4              7              9
TOML                             1              0              0              6
-------------------------------------------------------------------------------
SUM:                           104           2935           4545           9205
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 11 months ago

Wordcount for paper.md is 1122

editorialbot commented 11 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1109/72.572108 may be a valid DOI for title: Supervised neural networks for the classification of structures
- 10.1109/tnn.2008.2010350 may be a valid DOI for title: Neural network for graphs: A contextual constructive approach
- 10.1109/msp.2017.2693418 may be a valid DOI for title: Geometric deep learning: going beyond Euclidean data

INVALID DOIs

- None
idoby commented 11 months ago

Review checklist for @idoby

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

editorialbot commented 11 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

arfon commented 11 months ago

@idoby, @sepandhaghighi – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist typing:

@editorialbot generate my checklist

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5713 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

arfon commented 11 months ago

@sepandhaghighi – looks like there's a whitespace character at the start of that command to @editorialbot which is stopping it from responding (a bug we should fix).

Could you retry in a new comment, ensuring the sentence starts with @editorialbot.

sepandhaghighi commented 11 months ago

Review checklist for @sepandhaghighi

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

arfon commented 10 months ago

:wave: @idoby & @sepandhaghighi – just checking in to see how you're getting on with your reviews? It looks like you've both made a start here, do you think you might be able to wrap up your initial reviews in the next week or so, so that @diningphil can start responding?

sepandhaghighi commented 10 months ago

πŸ‘‹ @idoby & @sepandhaghighi – just checking in to see how you're getting on with your reviews? It looks like you've both made a start here, do you think you might be able to wrap up your initial reviews in the next week or so so that @diningphil can start responding?

@arfon πŸ‘‹ I will complete my review in the next few days πŸ’―

idoby commented 10 months ago

πŸ‘‹ @idoby & @sepandhaghighi – just checking in to see how you're getting on with your reviews? It looks like you've both made a start here, do you think you might be able to wrap up your initial reviews in the next week or so so that @diningphil can start responding?

Thanks for the reminder, had forgotten about this. Will comment soon

idoby commented 10 months ago

@diningphil, thanks for submitting this package, it seems like a lot of thought and effort went into it!

A few comments:

Paper:

  1. Please fix the missing DOIs as well as the broken reference to the PyTorch Geometric paper.
  2. The opening sentence of the paper ("The disregard...") is too harsh in my opinion, and could be phrased in a more positive way. I suggest something along the lines of "Evaluation practices in the machine learning (ML) and specifically graph neural network (GNN) communities can often be improved upon considerably [cite here]..."
  3. The paper discusses various ways to use and benefit from PyDGN, but the descriptions are rather abstract and lack motivation. I recommend the authors motivate each feature in the appropriate section of the text and add a small usage snippet to illustrate how the feature is integrated into the user's code.
  4. Perhaps dedicate a paragraph or two to the design considerations undertaken by the authors of the software, if any interesting design decisions or tradeoffs were made. The author of https://github.com/openjournals/joss-reviews/issues/5372 took me up on this and I feel it has made their paper more interesting. If you don't feel there are any software or system design considerations worth including in the text, that is fine too.
  5. Minor things: PyTorch should be stylized as such. In the paper, names such as PyTorch, etc, are written inconsistently.
  6. Consider citing more relevant papers such as GraphGym (https://arxiv.org/abs/2011.08843) and spelling out how PyDGN is different from, or complementary to, tools such as GraphGym.

Software

  1. In my opinion, the setup shell script makes too many assumptions about the user's environment, mainly the lack of existing virtual environment already created by venv, conda, etc. The requirement to source a shell script is contrary to standard practice in Python. Please formalize your dependencies in setup.py and pyproject.toml (preferably the latter, since the former is considered a legacy install system) and let the user's chosen package manager (e.g., pip, conda, poetry, etc) handle venv creation, dependency resolution and installation.
  2. Furthermore, please do not force the users of your package to install build, wheel, pytest and black, etc, which are required for PyDGN's development workflow but not required to use PyDGN.
  3. The same goes for Jupyter, which should be an optional dependency or detected at runtime, since not every user wants to bring in the entire Jupyter ecosystem to use PyDGN.
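The split between runtime and development dependencies suggested above can be expressed with optional dependency groups in pyproject.toml. The sketch below is illustrative only (group names, package list, and versions are my own, not PyDGN's actual metadata):

```toml
[project]
name = "pydgn"
dependencies = [
    # only packages required to *use* the library go here
    "torch",
]

[project.optional-dependencies]
# development tooling is opt-in: pip install "pydgn[dev]"
dev = ["build", "wheel", "pytest", "black"]
# Jupyter support is opt-in as well: pip install "pydgn[notebook]"
notebook = ["jupyter"]
```

With this layout, a plain `pip install pydgn` pulls in only the runtime dependencies, and contributors opt into the heavier extras explicitly.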

Overall, I think this is very good work! I will be digging deeper into the software itself soon.

diningphil commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

diningphil commented 10 months ago

Dear @idoby,

First of all, thank you for the constructive and positive feedback. Thanks to your comments, we were able to simplify the installation process considerably. Below you can find our response so that we can continue the discussion.

Paper:

  1. We added DOIs for all journal papers and the broken reference
  2. You are right, we changed the sentence to make it more positive
  3. We did our best to include code snippets that give the reader an idea of what is needed to carry out the most important steps, namely the data splitting (which corresponds to the choice of an evaluation procedure such as 10-fold cross validation), the definition of a new model, and the definition of a new metric. Because all pieces of the puzzle come together in a way that is transparent to the user, we found it difficult to show other snippets, but we believe that this, combined with the tutorial in our documentation, should be enough to understand the overall mechanism. Any further suggestions are welcome. We also added a motivation to the different paragraphs where it was missing.
  4. We added a new paragraph on key design considerations. We identified two of them, namely the use of configuration files to enable fast prototyping and the use of the publish-subscribe pattern to flexibly adapt components of the training loop without touching the main logic. Thank you very much for the suggestion!
  5. We fixed this, thank you for noticing, although we cannot see any other inconsistency like that in the text.
  6. We added the relevant citation and discussed how GraphGym differs from PyDGN. Please note that we already discussed to some extent how the other libraries mentioned differ from ours: they focus more on the definition of building blocks of graph machine learning models, whereas we try to automate the experiments while keeping the framework flexible enough for everyday research. These libraries can be used within our framework, in particular PyTorch Geometric.

Software (modified on main branch):

  1. We completely agree, and we simplified the installation process considerably; it now requires only a pip install pydgn command (please refer to README.md on the main branch). We also specified the dependencies in the pyproject.toml file and removed the legacy setup files as suggested. Only the strictly required dependencies are now listed. Thank you very much for the help; the code looks much cleaner now.
  2. Handled in the previous point
  3. Handled in the previous point

As a last note, example usage is shown in the README file, and the aforementioned tutorial explains the configuration files and how to use them to set up a proper experiment. Example configuration files can also be found in the examples folder.

Thank you again for the help. We are happy to discuss any more suggestions, if needed, related to the new version of the paper. I'd also like to tag @arfon to show that the discussion is ongoing =).

Best regards, the authors!

diningphil commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

idoby commented 10 months ago

@diningphil I think the intro reads much better and is better motivated now, and the differences between the various existing packages and yours are clearer. The practical use and motivation for the features and design decisions reads well.

Note that something seems to have not rendered well in line 83 (seems like a header didn't render) and that PyDGN is styled inconsistently on line 95.

Regarding the installation procedure: PyDGN should install much more easily now. Please consider not forcing the user to install gpustat, since not all installs have CUDA. The same applies to the wandb dependency: please consider detecting wandb at runtime, not forcing your users to install wandb if they don't use it.
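Detecting an optional dependency at runtime, as suggested above, can be done with a guarded import. The sketch below is illustrative only; the names (`HAS_WANDB`, `log_metrics`) are my own and not part of PyDGN's actual API:

```python
# Guarded import: the package is usable even when wandb is absent.
try:
    import wandb  # only needed when Weights & Biases logging is enabled
    HAS_WANDB = True
except ImportError:
    HAS_WANDB = False


def log_metrics(metrics, use_wandb=False):
    """Log a dict of metrics, optionally mirroring them to wandb."""
    if use_wandb:
        if not HAS_WANDB:
            # Fail loudly only when the optional feature is requested.
            raise ImportError(
                "Weights & Biases logging requested but wandb is not "
                "installed; run `pip install wandb` to enable it."
            )
        wandb.log(metrics)
    # local logging happens here regardless of wandb availability
    print(metrics)
```

This way users who never enable wandb logging pay no installation cost, while users who do get an actionable error message.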

Besides that, I think we're good to go. Please consider updating the paper branch with the latest changes so that when the branch is archived upon acceptance, it would include the changes to the installation procedure and any other enhancement you made to the software.

Good job! πŸ‘

idoby commented 10 months ago

@editorialbot check references

editorialbot commented 10 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/72.572108 is OK
- 10.1109/TNN.2008.2005605 is OK
- 10.1109/TNN.2008.2010350 is OK
- 10.1109/MSP.2017.2693418 is OK
- 10.1186/s40649-019-0069-y is OK
- 10.1016/j.neunet.2020.06.006 is OK
- 10.1109/TNNLS.2020.2978386 is OK
- 10.1109/TKDE.2020.2981333 is OK
- 10.1109/MCI.2020.3039072 is OK

MISSING DOIs

- None

INVALID DOIs

- None
idoby commented 10 months ago

Hmm, this list doesn't seem to include all of the references in the paper...

diningphil commented 10 months ago

Dear @idoby,

We are happy the changes are satisfactory. In order:

  1. Thanks for spotting the (new) typo and the inconsistency! We fixed them.
  2. We have removed wandb, requests, and gpustat from the dependencies as suggested. If CUDA is used, an exception is now raised when gpustat is missing (as was already the case for wandb), prompting the user to install the library.
  3. References: most machine learning conference papers do not have DOIs, and among journals, the Journal of Machine Learning Research does not issue DOIs. The reference checker shows that the DOIs provided match the journal papers we referenced and that no DOIs are missing among the others. This looks like correct behavior, but it is my first time with JOSS so I could well be wrong.
  4. We updated the paper branch for potential future archival.

Thank you again for your careful review. We remain on stand-by for additional exchanges if required =).

diningphil commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

idoby commented 10 months ago

What? Every ML conference paper is supposed to publish a DOI. For example, the DOI for Gilmer et al., Neural message passing for Quantum chemistry, is: https://dl.acm.org/doi/10.5555/3305381.3305512

edit: weird, seems like that DOI is broken...

diningphil commented 10 months ago

I will double check again! For arXiv-only papers that will certainly not be possible, though.

Edit: The paper you mentioned was published at ICML 17 on PMLR (http://proceedings.mlr.press/v70/gilmer17a) but I see no DOI when downloading the official bibtex file.

Edit2: The same goes for NeurIPS proceedings (https://papers.nips.cc/paper_files/paper/2022) -- I cannot find a DOI in the official bibtex of the published papers.

Maybe I'm looking in the wrong place, but I don't think I have ever seen a DOI for ICML or NeurIPS papers, just to mention two popular venues.

arfon commented 9 months ago

:wave: @sepandhaghighi – just checking in again here. It looks like you've made a good start on your review but haven't completed it yet. Do you think you might be able to get it done in the next couple of weeks?

sepandhaghighi commented 9 months ago

πŸ‘‹ @sepandhaghighi – just checking in again here. It looks like you've made a good start on your review but haven't completed it yet. Do you think you might be able to get it done in the next couple of weeks?

Hi @arfon πŸ‘‹ Sorry for my late response. I was a little preoccupied. I will be able to finish my review within the next 2 to 3 days πŸ’―

SH

sepandhaghighi commented 9 months ago

@arfon @diningphil πŸ‘‹

I apologize for my delayed reply. Excellent work πŸ₯‡

Only two comments:

  1. I advise incorporating a PR template into your repository
  2. Since it appears that this library is only compatible with Linux, I believe it would be best to mention this in the installation section (in my opinion, the Ubuntu badge alone is not clear enough)

SH

diningphil commented 9 months ago

Dear @sepandhaghighi,

Thank you so much for your review! We have implemented the following changes (in both paper and main branches):

  1. Specified in the Installation section (README and docs) that PyDGN works on Linux only
  2. Introduced a basic PR template, as you recommended, following this link.

If you are happy with our changes, I think we might ask @arfon to take the final decision.

Best regards, the authors

idoby commented 9 months ago

@diningphil What makes it Linux only?

diningphil commented 9 months ago

@diningphil What makes it Linux only?

Hi @idoby, PyDGN has never been tested on Windows, and the two commands to prepare datasets and launch experiments pydgn-train and pydgn-dataset assume a Linux/Unix distro.

Perhaps it would be best to write "tested on Linux/Unix only"? What do you think?

idoby commented 9 months ago

@diningphil I just ran your tests and both of the commands you mentioned (taken from the examples in the README file) on the latest macOS (Sonoma) with the latest Python etc. The tests pass with warnings coming from dependencies (pyg etc) and the dataset and train commands also seem to work, albeit training seems to be very slow.

If you feel your tests are comprehensive enough, you can claim that your package works on macOS too. I don't see anything in the code that assumes it is running on Linux specifically, or Windows either, for that matter, but I don't have a Windows system to test on.

diningphil commented 9 months ago

@idoby wow, we really thank you for testing this yourself. We will try to test on Windows and see how it goes. If it works there too, we will remove the Ubuntu badge and claim that the package should work on most systems. We will keep you posted.

Thank you again for the help!!

idoby commented 9 months ago

It's a best practice to set up a CI workflow to test every commit automatically on multiple operating systems and versions of Python. You should be able to find plenty of examples online, for example using GitHub Actions.

It's probably a good idea if you intend to continue developing this package.
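A minimal GitHub Actions workflow along these lines might look like the sketch below (file path, Python versions, and the install command are illustrative assumptions, not taken from PyDGN's repository):

```yaml
# .github/workflows/test.yml (illustrative)
name: tests
on: [push, pull_request]

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.9", "3.10", "3.11"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      # install the package plus its test dependencies
      - run: pip install -e .
      - run: pytest
```

The matrix expands into one job per OS/Python combination, so every commit is exercised on all supported platforms automatically.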

Anyway, good luck! This paper can be published as far as I'm concerned. πŸ₯³

diningphil commented 9 months ago

I'll try to set it up directly with GitHub Actions as I did for Linux :)

EDIT: it was actually very simple, and the tests passed for windows-latest, ubuntu-latest, and macos-latest. So @sepandhaghighi I will also remove the Ubuntu badge, since apparently it runs on any system =)

arfon commented 9 months ago

If you are happy with our changes, I think we might ask @arfon to take the final decision.

This is looking good. Thanks @diningphil! @sepandhaghighi – could I ask that you check off any remaining items in your review checklist (assuming you believe the author's response is sufficient)?

sepandhaghighi commented 9 months ago

@sepandhaghighi – could I ask that you check off any remaining review items in your checklist

@arfon Everything looks good to me πŸ’―

So @sepandhaghighi I will also remove ubuntu badge because it runs on any system apparently =)

@diningphil Good news πŸ”₯

Please consider the following:

  1. You should update the classifiers section of pyproject.toml to use Operating System :: OS Independent instead of Operating System :: POSIX :: Linux

  2. Including the minimum required version of Python in your document is recommended.
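Both suggestions map onto the project table of pyproject.toml. A hypothetical sketch (the minimum Python version here is illustrative and should match what PyDGN actually supports):

```toml
[project]
# advertise the minimum supported interpreter to pip and PyPI
requires-python = ">=3.8"
classifiers = [
    "Operating System :: OS Independent",
]
```

`requires-python` also makes pip refuse to install the package on unsupported interpreters, which is stronger than a note in the docs alone.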

SH

diningphil commented 9 months ago

Please consider the following:

1. You should update `pyproject.toml`, `classifiers` section and add `Operating System :: OS Independent` instead of `Operating System :: POSIX :: Linux`

2. Including the minimum required version of Python in your document is recommended.

Both are done! Thank you for spotting them!

arfon commented 9 months ago

@diningphil – looks like we're very close to being done here. I will circle back here next week, but in the meantime, please give your own paper a final read to check for any potential typos etc.

After that, could you make a new release of this software that includes the changes that have resulted from this review. Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:

diningphil commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

diningphil commented 9 months ago

Dear @arfon,

Please let me know if anything else is required.

Many thanks!

arfon commented 9 months ago

@editorialbot set v1.5.0 as version

editorialbot commented 9 months ago

Done! version is now v1.5.0

arfon commented 9 months ago

@editorialbot set 10.5281/zenodo.8396373 as archive

editorialbot commented 9 months ago

Done! archive is now 10.5281/zenodo.8396373

arfon commented 9 months ago

@editorialbot recommend-accept

editorialbot commented 9 months ago
Attempting dry run of processing paper acceptance...