openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: PyGModels: A Python package for exploring Probabilistic Graphical Models with Graph Theoretical Structures #3115

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @D-K-E (D. Kaan Eraslan)
Repository: https://github.com/D-K-E/graphical-models/
Version: v0.1.0
Editor: @dfm
Reviewer: @eigenfoo, @ankurankan
Archive: 10.5281/zenodo.4751740

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/1f3fb76e625510f42fb42602d5679e15"><img src="https://joss.theoj.org/papers/1f3fb76e625510f42fb42602d5679e15/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/1f3fb76e625510f42fb42602d5679e15/status.svg)](https://joss.theoj.org/papers/1f3fb76e625510f42fb42602d5679e15)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@eigenfoo & @ankurankan, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @dfm know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Review checklist for @eigenfoo

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @ankurankan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @eigenfoo, @ankurankan it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews
  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.42 s (822.3 files/s, 106916.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                           223            839           1981          29112
Python                          38           1191           2122           5919
CSS                              3            328             47           1627
JavaScript                      79            105            144           1618
Markdown                         2             59              0            169
YAML                             3              4              3             56
TeX                              1             25              0             26
-------------------------------------------------------------------------------
SUM:                           349           2551           4297          38527
-------------------------------------------------------------------------------

Statistical information for the repository '431a776d0fc99019b0d91060' was
gathered on 2021/03/16.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Qm Auber                        75         16160           5061          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Qm Auber                  11099           68.7          1.7               10.54
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.3150/08-BEJ172 is OK
- 10.1007/978-3-319-73235-0 is OK
- 10.1201/9780429463976-2 is OK
- 10.1111/1467-9868.00340 is OK
- 10.1017/9781108277495 is OK
- 10.1145/378886.380416 is OK

MISSING DOIs

- 10.25080/majora-7b98e3ed-001 may be a valid DOI for title: pgmpy: Probabilistic graphical models using python
- 10.2200/s00529ed1v01y201308aim023 may be a valid DOI for title: Reasoning with probabilistic and deterministic graphical models: exact algorithms
- 10.1007/978-3-642-17517-6_36 may be a valid DOI for title: Listing All Maximal Cliques in Sparse Graphs in Near-optimal Time

INVALID DOIs

- None
dfm commented 3 years ago

@eigenfoo, @ankurankan — This is the review thread for the paper. All of our communications will happen here from now on. Thanks again for agreeing to participate!

Please read the "Reviewer instructions & questions" in the first comment above.

Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#3115 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

dfm commented 3 years ago

@D-K-E: Can you add the missing DOIs that are reported above to the paper?

eigenfoo commented 3 years ago

Hello! I'm still in the middle of my review, but have one question so far.

From the README:

The primary goal is to facilitate the understanding of models and basic inference strategies using well documented data structures based only on Python 3 standard library.

As the overall library is not built for efficiency, we recommend not to use it in production. It should not be to difficult to transfer the concepts introduced in the source code though.

It seems that this repository is mainly written for pedagogical purposes, and perhaps not for actual use in scientific applications - in other words, it sounds like the library is meant to be read, and not run. @D-K-E am I correct in this? If so, @dfm can you advise if such a library is in scope for JOSS publication? I'm unsure if this library satisfies the "research software" requirement of JOSS - some clarity would be helpful.


Also noting from this thread (https://github.com/openjournals/joss-reviews/issues/3015#issuecomment-792050736) that @Viva-Lambda is another account for @D-K-E, so the target repo definitely passes the "contribution and authorship" check box.

D-K-E commented 3 years ago

@dfm I am adding them right away.

@eigenfoo It is a somewhat complicated issue. I am using this library to demonstrate how some data [1] can be analyzed using probabilistic graphical models in my thesis, so it is used in at least one scientific work. But my focus while writing the library was to stay as close as possible to the definitions provided by textbooks (mostly Koller and Friedman 2009) so that there would be less friction between the textbook and the implementation. The pedagogical aspect is more of an intended side effect that comes from staying close to the textbook. However, the goal was to reproduce the definitions as closely as possible, not to teach them.


[1] Historical documents annotated with a certain flavor of RDFa format.

D-K-E commented 3 years ago

@dfm added the requested DOIs, plus some changes in the documentation. I'll continue to improve the docs.

dfm commented 3 years ago

@eigenfoo: Great question! I am satisfied by your response @D-K-E, but the paper and documentation should make the research applications very clear and demonstrate where this package fits into the ecosystem. So perhaps the discussion in the README should be clarified with this in mind.

eigenfoo commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

eigenfoo commented 3 years ago

@D-K-E this is great work! Detailed comments below.


Summary and Statements of Need

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?

In both the documentation and the paper, the summary and statement of need are quite technical, do not seem to be written for a non-specialist audience, and do not make it clear who the target audience is.


Community Guidelines

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Right now the README outlines how users might contribute to the software, but doesn't explicitly ask users who have issues/need support to file an issue. It's probably a safe bet from a user's perspective, but best to make this explicit!


Citations

From the paper, on lines 44-46:

PGMs are known for their wide range of applications in computer vision, information retrieval, disease diagnosis and more recently, in the context of our phd thesis, annotations of ancient documents.

It would be great to cite any portion of your PhD (or related work) that may already be published, especially if it showcases "real-world usage" of PyGModels. I understand if this isn't possible or if you may be unwilling though, so I'll go ahead and cross off the "references" checkbox.


Well Written

Overall, the paper is well-written: there are some minor grammatical mistakes (e.g. phd/PhD on line 45, depend/depends on line 62, not/no on line 67), but these errors don't get in the way of comprehension. I'm willing to let these errors slide, so I'll cross off the "quality of writing" checkbox.

D-K-E commented 3 years ago

@eigenfoo Well, first of all thank you for the suggestions, they really helped me to better phrase the problem.

Here is a list of changes that I made in accordance with your comments.


Summary

I don't think I could do better than your suggestions for the summary. I tried to incorporate them directly into the paper. This significantly decreased the length of the summary paragraph.

Statements of need

I added the following remark to the beginning of the section, as per the request to clarify the intended audience of the package:

Though the students of computer science or statistics might find a pedagogical value going through source code along with a textbook on probabilistic graphical models (something like Sucar [see @Sucar_2015] or Cowell [see @Cowell_2005] or Koller and Friedman [see @Koller_Friedman_2009]), we believe that the value proposition of PyGModels speaks mostly to researchers.

@dfm I hope that this clarifies how the package fits into the ecosystem, and resolves the discussion we had about its pedagogical use. I can add more details if you find it necessary.


Community Guidelines

I provided two templates for filling the docstrings of functions and explicitly asked users to file an issue with their doubt or intent in the README file.


Citations

Unfortunately, the published parts of my PhD do not involve the direct use of this library. I am hoping to present a paper at a conference with results produced by this library, once I discuss things more with my supervisor. The problem is that there is not enough data (historical documents in the right format) to produce something immediately useful. Hence I need to either wait for experts to produce data, or find some way to transform already existing data into the right format. In any case, I need to discuss it with my supervisor.


Well Written

As you might have noticed by now, English is not my first language, so there might be grammatical errors, and some awkward phrases here and there. I corrected the ones you pointed out. Feel free to make other suggestions regarding the language as well.

ankurankan commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

:wave: @ankurankan, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @eigenfoo, please update us on how your review is going (this is an automated reminder).

ankurankan commented 3 years ago

@D-K-E Quick question: I see that @Viva-Lambda has also contributed significantly to the repository but is not an author on the paper. Is there a reason for it or am I missing something?

Edit: Sorry, just noticed that you have already mentioned that both are your accounts.

ankurankan commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ankurankan commented 3 years ago

@D-K-E Thanks for the paper, it was an interesting read but I do have some concerns which I have listed below.

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

From the summary, I am unable to clearly see how the PyGModels package fits into or adds to the current ecosystem of graphical model packages. From my understanding, the main point being made is that the PyGModels package implements both the graphical and statistical properties of PGMs. But I think this is true for most packages, as graph-theoretical properties can't really be separated out. For example, in pgmpy, all the model classes inherit networkx's graph classes and add properties on top of them. The algorithms also make extensive use of the graphical structure: for example, computing the elimination order for variable elimination, choosing the order of variables for sampling algorithms, and causal inference, which works entirely on the analysis of graphical structure.
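To make the point about elimination orders concrete, here is a minimal sketch of the greedy min-degree heuristic, which derives an order purely from the graph structure. It uses only the Python standard library and is an illustration under stated assumptions, not the implementation used by pgmpy or PyGModels; the function name `min_degree_order` and the example graph are hypothetical.

```python
from itertools import combinations


def min_degree_order(adjacency):
    """Return an elimination order using the greedy min-degree heuristic.

    `adjacency` maps each variable to the set of its neighbours in an
    undirected graph. The input mapping is not modified.
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    order = []
    while graph:
        # Pick the node with the fewest remaining neighbours; ties are
        # broken by name so the result is deterministic.
        node = min(graph, key=lambda n: (len(graph[n]), n))
        neighbours = graph.pop(node)
        # Eliminating a node connects its neighbours pairwise (fill-in edges).
        for a, b in combinations(neighbours, 2):
            graph[a].add(b)
            graph[b].add(a)
        # Remove the eliminated node from its neighbours' adjacency sets.
        for nbr in neighbours:
            graph[nbr].discard(node)
        order.append(node)
    return order


# Hypothetical four-variable chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(min_degree_order(chain))
```

The order depends only on the adjacency structure, which is the sense in which the graph-theoretical and probabilistic sides of a PGM library are hard to separate.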

A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?

From the paper, the two main problems that the package solves are: 1) computing the posterior, 2) extensibility. But I think there are a lot of packages for computing posteriors, from computationally optimized (e.g. pomegranate) to more readability focused (e.g. pgmpy). pgmpy also focuses quite a bit on extensibility (https://github.com/pgmpy/pgmpy/blob/dev/examples/Extending%20pgmpy.ipynb). So, I would have liked to see more on how PyGModels adds to these existing packages. Also, from reading this section, it's not clear to me what value closely following the conventions and definitions adds for users, or why someone would prefer this package over the alternatives.
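For readers unfamiliar with what "computing the posterior" involves, the following is a minimal sketch for a two-node network H -> E, using Bayes' rule by direct enumeration. It is not the API of PyGModels, pgmpy, or pomegranate; the function name `posterior` and all probabilities are hypothetical values chosen for illustration.

```python
def posterior(prior, likelihood, evidence):
    """Return P(H | E = evidence) for a two-node network H -> E.

    prior:      dict mapping each value h of H to P(H = h)
    likelihood: dict mapping (h, e) to P(E = e | H = h)
    """
    # Unnormalized joint P(H = h, E = evidence) for every h.
    joint = {h: p * likelihood[(h, evidence)] for h, p in prior.items()}
    # Marginal probability of the evidence, used as the normalizer.
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical numbers: H is "rain", E is "wet grass".
prior = {True: 0.2, False: 0.8}
likelihood = {
    (True, True): 0.9, (True, False): 0.1,
    (False, True): 0.1, (False, False): 0.9,
}
result = posterior(prior, likelihood, evidence=True)
print(result)
```

Libraries in this space differ mainly in how they scale this computation to larger graphs (for example, variable elimination or sampling) and in the data structures they expose.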

Also, on github's readme, I see that there are some other features implemented in the package like LWF chain graphs, and graph analysis algorithms, which I think might be the unique features of this package but there hasn't been much focus on these in this section.

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I would suggest adding pomegranate (https://github.com/jmschrei/pomegranate) to this comparison as well, as it is one of the most popular packages. I also think the statement "our implementation ... conforms to the definition provided" is a bit too strong, as all the implementations conform to the standard definitions and differ only in the data structure being used.

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

I think the writing needs some work on the content. Reading the current paper, I feel very little space is currently being used to highlight the use case and features of the package. I think it would also really help if the paper is written with a focus on a few main messages that you are trying to convey to the reader. Currently, the sections seem to be going in a bit different directions with not much coherence between them. For example, the summary mentions that the package is better because of the implementation of the graph-theoretical nature of PGMs but this point never comes up again in the rest of the paper. I think things like easy extensibility, LWF chain, etc should also be mentioned in the summary as these are the distinguishing features of the package.

D-K-E commented 3 years ago

@ankurankan Thanks for the detailed review. I pushed some changes to the paper according to your concerns.

Here is a list of changes in accordance with your comments:


Summary

I added the phrase "PyGModels also implements several algorithms of interest on LWF chain graphs, also known as mixed graphs." to the summary, hoping that it would underline the main contribution of this package to the ecosystem without introducing too much technical jargon.

From the summary ... fits/adds to the current ecosystem...

The main problem here is that, since current packages like pomegranate and pgmpy are quite well developed and production ready, I am not sure it is feasible to underline the contribution of PyGModels (inference on chain graphs, which covers a relatively marginal use case outside of the PGM community) to a non-specialist audience without getting into technical details. The limitation on LWF chain graphs comes from NetworkX, since it does not yet support mixed graphs as stated in the issue. However, since chain graphs can be decomposed into conditional random fields, inference on chain graphs can be supported via CRFs. Hence a willing user might find ways of doing inference on chain graphs using existing packages as well. Yet again, I think this level of discussion is too technical for a non-specialist audience, but let's ask @dfm whether I should incorporate all this into the paper.

Statement of Need

As suggested, I increased the number of issues that PyGModels solves. I added

3. Decomposing the chain graph into chain components
4. Moralizing the chain graph into a Markov Network.
5. Decomposing the chain graph into Conditional Random Fields.
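As a rough illustration of items 3 and 4, the sketch below finds the chain components of a mixed graph (the connected components left after ignoring directed edges) and builds the moral graph by connecting the parents of each component and dropping edge directions. It uses only the standard library and is not taken from PyGModels; the function names, the edge representation, and the example graph are assumptions made for this example.

```python
from itertools import combinations


def chain_components(nodes, undirected_edges):
    """Connected components of the graph restricted to undirected edges."""
    neighbours = {n: set() for n in nodes}
    for a, b in undirected_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search that follows undirected edges only.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(frozenset(component))
    return components


def moralize(nodes, directed_edges, undirected_edges):
    """Return the moral graph's edges as a set of two-element frozensets."""
    # Start by dropping the direction of every existing edge.
    moral = {frozenset(e) for e in directed_edges}
    moral |= {frozenset(e) for e in undirected_edges}
    for component in chain_components(nodes, undirected_edges):
        # Parents of a component: tails of directed edges entering it.
        parents = {a for a, b in directed_edges
                   if b in component and a not in component}
        # "Marry" the parents by connecting every pair of them.
        moral |= {frozenset(pair) for pair in combinations(parents, 2)}
    return moral


# Hypothetical chain graph: A -> C, B -> D, and C - D.
nodes = ["A", "B", "C", "D"]
directed = [("A", "C"), ("B", "D")]
undirected = [("C", "D")]
components = chain_components(nodes, undirected)
moral_edges = moralize(nodes, directed, undirected)
print(components)
print(sorted(sorted(edge) for edge in moral_edges))
```

In this example A and B become adjacent in the moral graph because both have a child in the chain component {C, D}.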

...what value for users does closely following the conventions and definitions add...

Bluntly speaking, not a lot in a practical sense; however, it might decrease the friction between textbooks and code. Hence it has pedagogical value. The documentation is not yet thorough enough to truly accomplish this, though.

adding pomegranate ... is a bit too strong...

I added pomegranate to the comparison, and added the phrase:

The last aspect is also the case for other packages, however PyGModels differs from them with respect to the data structure used in the implementation.

I feel very little space is currently being used to highlight the use case and features of the package

As stated, I mentioned LWF chain graphs in the summary as you requested. Since the recommended word count for a paper is between 250 and 1000 according to the journal's documentation, and the paper's count is 1037, I don't think it would be wise for me to add new sections. I can remove certain parts of the paper if you think that's a good choice. If you think removal is the way to go, would you mind suggesting some parts?

ankurankan commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

eigenfoo commented 3 years ago

Apologies for the delay - I've completed my review. @ankurankan seems much more knowledgeable in this area than I am, so I'll refrain from adding more comments in light of his review.

ankurankan commented 3 years ago

I have updated my review checklist. I think the paper is at a satisfactory level now. The major thing still missing, in my opinion, is examples in the documentation. I can't find any complete usage example on the documentation page. @D-K-E, maybe I am missing something?

D-K-E commented 3 years ago

@ankurankan I haven't updated the usage examples for a while now. Most of my commits went to documenting individual classes. I will try to add usage examples by the end of this week. I am a little busy this week, so it might extend into next week as well.

D-K-E commented 3 years ago

I have updated the documentation with usage examples. Now all the probabilistic graphical models that support inference have a usage example. All of them have a unit test as well.

I'll try to improve the documentation on the intermediary objects.

ankurankan commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ankurankan commented 3 years ago

@dfm I think everything looks good now.

dfm commented 3 years ago

Awesome - thanks @ankurankan and @eigenfoo!!

@D-K-E: Give me a few days to do some last edits and checks and then I'll have some last steps for you to take before final processing.

dfm commented 3 years ago

@D-K-E: Thanks for your patience. I've opened a pull request with some edits to the paper. Can you take a look at that and once you've merged, please take the following steps:

Let me know if you have questions or run into any issues!

D-K-E commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

D-K-E commented 3 years ago

@dfm All right I think I did what you said: The new version is v0.1.0. The release is called JOSS version. DOI: 10.5281/zenodo.4751740

The title and the author are the same. Affiliation is okay as well. The manuscript is all good as well.

dfm commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.25080/Majora-7b98e3ed-001  is OK
- 10.2200/S00893ED2V01Y201901AIM041 is OK
- 10.3150/08-BEJ172 is OK
- 10.1007/978-3-642-17517-6_36 is OK
- 10.1007/978-3-319-73235-0 is OK
- 10.1201/9780429463976-2 is OK
- 10.1111/1467-9868.00340 is OK
- 10.1017/9781108277495 is OK
- 10.1145/378886.380416 is OK

MISSING DOIs

- None

INVALID DOIs

- None
dfm commented 3 years ago

@whedon set 10.5281/zenodo.4751740 as archive

whedon commented 3 years ago

OK. 10.5281/zenodo.4751740 is the archive.

dfm commented 3 years ago

@whedon set v0.1.0 as version

whedon commented 3 years ago

OK. v0.1.0 is the version.

dfm commented 3 years ago

@whedon accept

whedon commented 3 years ago
Attempting dry run of processing paper acceptance...
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.25080/Majora-7b98e3ed-001  is OK
- 10.2200/S00893ED2V01Y201901AIM041 is OK
- 10.3150/08-BEJ172 is OK
- 10.1007/978-3-642-17517-6_36 is OK
- 10.1007/978-3-319-73235-0 is OK
- 10.1201/9780429463976-2 is OK
- 10.1111/1467-9868.00340 is OK
- 10.1017/9781108277495 is OK
- 10.1145/378886.380416 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/2302

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/2302, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
dfm commented 3 years ago

@eigenfoo, @ankurankan: Thanks again for your excellent and constructive reviews! I (and all of us at JOSS) really appreciate your contributions here.

@D-K-E: This is looking good! Thanks for your submission and your work on it throughout this process. I've handed this submission off to the Editors-in-Chief and they might have some final comments/edits before the final publication. But, in the meantime, congrats on your paper!

D-K-E commented 3 years ago

Thank you @dfm, and thanks @eigenfoo and @ankurankan for the constructive reviews. Thanks @whedon the bot, for all your hard work. It looks good to me; should I run the command with deposit=true?

dfm commented 3 years ago

@D-K-E: no, the Editor-in-chief will do that after their final checks.