openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
703 stars 36 forks source link

[REVIEW]: Renard: A Modular Pipeline for Extracting Character Networks from Narrative Texts #6574

Closed editorialbot closed 2 months ago

editorialbot commented 5 months ago

Submitting author: !--author-handle-->@Aethor<!--end-author-handle-- (Arthur Amalvy) Repository: https://github.com/CompNet/Renard Branch with paper.md (empty if default branch): joss Version: v0.4.2 Editor: !--editor-->@logological<!--end-editor-- Reviewers: @tellarin, @LCB0B Archive: 10.5281/zenodo.12167900

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/5d1e7cb6a59156ee850c74557e811d74"><img src="https://joss.theoj.org/papers/5d1e7cb6a59156ee850c74557e811d74/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/5d1e7cb6a59156ee850c74557e811d74/status.svg)](https://joss.theoj.org/papers/5d1e7cb6a59156ee850c74557e811d74)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@tellarin & @LCB0B, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @logological know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @LCB0B

📝 Checklist for @tellarin

editorialbot commented 5 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 5 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.05 s (1236.9 files/s, 146416.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          40            813            958           2738
CSV                              1              0              0           1084
reStructuredText                 7            204            152            288
Markdown                         2             46              0             94
TeX                              1              9              0             79
YAML                             2              1              4             57
TOML                             1              4              2             41
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            56           1089           1124           4416
-------------------------------------------------------------------------------

Commit count by author:

   279  Aethor
     1  Arfon Smith
     1  Arthur Amalvy
     1  Vincent Labatut
editorialbot commented 5 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3344548 is OK
- 10.4304/jcp.8.9.2442-2447 is OK
- 10.7763/IJMLC.2013.V3.282 is OK
- 10.4000/resf.1183 is OK

MISSING DOIs

- No DOI given, and none found for title: Extracting Networks of Characters and Places from ...
- No DOI given, and none found for title: Charnetto
- No DOI given, and none found for title: Character Networks and Centrality
- No DOI given, and none found for title: Character network analysis of Émile Zola’s Les Rou...
- No DOI given, and none found for title: Exploring network structure, dynamics and function...

INVALID DOIs

- None
editorialbot commented 5 months ago

Paper file info:

📄 Wordcount for paper.md is 988

✅ The paper includes a Statement of need section

editorialbot commented 5 months ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

editorialbot commented 5 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

logological commented 4 months ago

@tellarin, @LCB0B In case you haven't received the notification, this submission is ready for you to review. Could you please generate your checklists according to the instructions in this issue and start by checking off the COI and CoC items, at least to make sure the permissions work? Thanks!

logological commented 4 months ago

@tellarin, @LCB0B This review has been open for over a month now and I haven't yet heard from you. Could you please confirm that you will still be reviewing this submission?

tellarin commented 4 months ago

@logological Sorry for the delay. I'll still review this submission.

logological commented 4 months ago

@tellarin Thanks for confirming your participation. If you could please generate your checklist as described at the top of this issue, and start evaluating the criteria in it, that would be great! In the meantime I've sent @LCB0B another ping, this time by e-mail.

LCB0B commented 3 months ago

Review checklist for @LCB0B

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

LCB0B commented 3 months ago

Functionality

Installation: maybe add python version in readme (>3.8, <3.11)

Functionality: Have the functional claims of the software been confirmed?

Documentation

A statement of need: Target audience is not clearly stated Example usage: The example is not in the documentation but in /renard_tutorial.py but it does not run (issue flagged on the repo, https://github.com/CompNet/Renard/issues/8) (the example is also mentioned in the readme, could be nice to add it to the documentation)

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified? (still checking)

Software paper

Quality of writing: minor typos:

Aethor commented 3 months ago

Hi,

thank you for taking the time to review our article and software!

Installation: maybe add python version in readme (>3.8, <3.11)

This is a good point, we added it in the README (see b52d9a1c942d15f4431d752002e7bc6334c6f194).

Functionality: Have the functional claims of the software been confirmed?

There might be some author disambiguation issues, i.e. figure1 shows a character name "mr", moreover there are a 'mr bingley"and 'mrs bingley" but also just 'bingley"

Renard is (unfortunately!) bound to make some errors. We base our alias resolution algorithm on Vala et al., 2015 (https://aclanthology.org/D15-1088/), but the problem is not trivial and not solved in general. We hope to improve alias resolution as development goes further.

A statement of need: Target audience is not clearly stated

Good point. We added a statement of need in the article itself and in the documentation (https://compnet.github.io/Renard/introduction.html#target-audience-and-intended-usage).

Example usage: The example is not in the documentation but in /renard_tutorial.py (the example is also mentioned in the readme, could be nice to add it to the documentation)

We have an example in the documentation (see https://compnet.github.io/Renard/pipeline.html) but not in the README. We guess this is what you mean here? Since it is useful, we added the basic example from the documentation in the README (see https://github.com/CompNet/Renard/commit/dad509d424fbecd67fc2766cff262dc10cc6faaf).

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified? (still checking)

Yes! You can see this under the "Running Tests" section of the README, or in the Contributing section of the doc where we highlight how to use tests (https://compnet.github.io/Renard/contributing.html). All tests are located under ./tests. We use pytest and hypothesis to help testing, and we have a CI in place on the main branch to check for issues. The CI also automatically builds the documentation.

Quality of writing: minor typos:

Line 47: The word "examplifies" should be corrected to "exemplifies." Line 50 : (at the end) two characterS Line 83 : should be "Rochat, Y." for consistencies between references

Good catch, we fixed these and updated the article.

Aethor commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LCB0B commented 3 months ago

Thank you for the quick reply, I only included the points where I still have questions/remarks. I checked the other points.

Functionality: Have the functional claims of the software been confirmed?

There might be some author disambiguation issues, i.e. figure1 shows a character name "mr", moreover there are a 'mr bingley"and 'mrs bingley" but also just 'bingley"

Renard is (unfortunately!) bound to make some errors. We base our alias resolution algorithm on Vala et al., 2015 (https://aclanthology.org/D15-1088/), but the problem is not trivial and not solved in general. We hope to improve alias resolution as development goes further.

indeed author disambiguation is a hard problem, but regarding "titles" (like Mr.) , isn't it what renard/resources/titles/titles.py is suppose to do ?

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified? (still checking)

All the tests run perfectly except that the .hypothesis/ folder, that is needed for running some tests, is included in the .gitignore

Aethor commented 3 months ago

indeed author disambiguation is a hard problem, but regarding "titles" (like Mr.) , isn't it what renard/resources/titles/titles.py is suppose to do ?

Currently, titles help to determine the gender of characters, and to correctly parse their name. If a lone title (such as the token "Mister" or "Mr.") is detected as a full entity by the NER step, it will be considered as a character in its own right. Sometimes this is actually needed, because some characters may be referred by a title exclusively or almost exclusively (this happens for example with "the Director" in Huxley's "Brave New World", or "the Lady" in Glen Cook's fantasy novel "The Black Company"). However, it may be a good idea to add a parameter to the character unification module to force Renard to ignore such instances of lone titles, which makes sense for a good number of novels. What do you think of such a solution?

All the tests run perfectly except that the .hypothesis/ folder, that is needed for running some tests, is included in the .gitignore

From what I understand, Hypothesis tests should run without the .hypothesis folder. Given the hypothesis documentation, I expect that this folder contains already tested examples as a cache. Since the doc explains that the default usecase is to not track the folder under version control (https://hypothesis.readthedocs.io/en/hypothesis-python-4.57.1/database.html#sharing-your-example-database), we did not do it, but there might be good reasons to do so?

LCB0B commented 3 months ago

Thank you for the even quicker reply.

Currently, titles help to determine the gender of characters, and to correctly parse their name. If a lone title (such as the token "Mister" or "Mr.") is detected as a full entity by the NER step, it will be considered as a character in its own right. Sometimes this is actually needed, because some characters may be referred by a title exclusively or almost exclusively (this happens for example with "the Director" in Huxley's "Brave New World", or "the Lady" in Glen Cook's fantasy novel "The Black Company"). However, it may be a good idea to add a parameter to the character unification module to force Renard to ignore such instances of lone titles, which makes sense for a good number of novels. What do you think of such a solution?

Yes, it sounds like a great solution! I'll check the box.

From what I understand, Hypothesis tests should run without the .hypothesis folder. Given the hypothesis documentation, I expect that this folder contains already tested examples as a cache. Since the doc explains that the default usecase is to not track the folder under version control (https://hypothesis.readthedocs.io/en/hypothesis-python-4.57.1/database.html#sharing-your-example-database), we did not do it, but there might be good reasons to do so?

Absolutely, that was an oversight on my part. I'll check the box

And thank you for developing this python library, i already see some use case for it :)

Aethor commented 3 months ago

Thank you for reviewing!

logological commented 3 months ago

@LCB0B many thanks for your review! @tellarin it's been a while since we've heard from you; could you please give us some idea of when you expect to do your review?

tellarin commented 3 months ago

Review checklist for @tellarin

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

tellarin commented 3 months ago

Overall things look quite good. Just a couple comments...

Functionality

Installation: I see the notice in the repo that the package supports only Python 3.8 to 3.10. Is there a specific reason to not support newer versions? I think the reason for this limitation should be documented in the repo, so future users can consider contributing to address it too.

BTW, the PyPI page for the package should also link back to the GitHub repo as the package homepage.

Functionality: The previous discussion clarified some questions I had. Thanks.

But I believe it would be good to say something more about the dynamic networks. The paper basically only covers the static scenario, but supporting dynamic networks is one of the highlighted differentiators of the package. One example and sentence about how using it looks like would be enough.

Also, when discussing the definitions of requirements for a pipeline step, it would be useful to point to somewhere in the repo for an example. This could be a footnote in the paper.

Software paper

References:

The paper should cite Vala et al., 2015 as it's the base for the alias resolution currently in use.

It would also be interesting to cite the Labatut and Bost 2019 survey on character network extraction as background for the design of the system.

Lastly, capitalisation should be fixed in reference Rochat 2015. "... Émile Zola’s Les Rougon-Macquart."

Quality of writing: Below are some minor typos and other issues I've noticed:

Aethor commented 3 months ago

Hi,

thank you for reviewing Renard!

Installation: I see the notice in the repo that the package supports only Python 3.8 to 3.10. Is there a specific reason to not support newer versions? I think the reason for this limitation should be documented in the repo, so future users can consider contributing to address it too.

There is no specific reason for not supporting Python 3.11+, aside from maintenance time. We do update the supported Python versions from time to time, but we always have to fix some issues with Renard's dependencies when we do so. We hope to support Python 3.11 soon!

BTW, the PyPI page for the package should also link back to the GitHub repo as the package homepage.

Good point, we fixed this by adding the relevant links in the pyproject.toml file (see 02d4879a2eb6580265462165affbb51c46f63320). As a result, we published Renard 0.4.1 to update the project on PyPi (and ship a few bugfixs).

But I believe it would be good to say something more about the dynamic networks. The paper basically only covers the static scenario, but supporting dynamic networks is one of the highlighted differentiators of the package. One example and sentence about how using it looks like would be enough.

We added a new sentence explaining how Renard could be used to extract a dynamic network, and a footnote with a link to the section of the documentation specifically related to extracting dynamic networks. We also modified the example code in the article to show where to pass parameters to extract a dynamic network.

Also, when discussing the definitions of requirements for a pipeline step, it would be useful to point to somewhere in the repo for an example. This could be a footnote in the paper.

We added a footnote in the paper, pointing to the documentation. In the documentation itself, we added an example of what would happen in case of a missing step requirement (see 24b8f6525fe039118bde424425f492fe8748a0e1).

The paper should cite Vala et al., 2015 as it's the base for the alias resolution currently in use. It would also be interesting to cite the Labatut and Bost 2019 survey on character network extraction as background for the design of the system.

We modified the article to include these citations. We now state that "we base Renard on the generic character network extraction framework highlighted by the survey of Labatut and Bost (2019)".

Lastly, capitalisation should be fixed in reference Rochat 2015. "... Émile Zola’s Les Rougon-Macquart."

Quality of writing: Below are some minor typos and other issues I've noticed:

Thanks, we fixed these as well!

Aethor commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

tellarin commented 3 months ago

@Aethor Thanks for the changes. They look OK to me. I've updated the checklist.

logological commented 3 months ago

@editorialbot check references

editorialbot commented 3 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3344548 is OK
- 10.4304/jcp.8.9.2442-2447 is OK
- 10.7763/IJMLC.2013.V3.282 is OK
- 10.4000/resf.1183 is OK
- 10.18653/v1/D15-1088 is OK

MISSING DOIs

- No DOI given, and none found for title: Extracting Networks of Characters and Places from ...
- No DOI given, and none found for title: Charnetto
- No DOI given, and none found for title: Character Networks and Centrality
- No DOI given, and none found for title: Character network analysis of Émile Zola’s Les Rou...
- No DOI given, and none found for title: Exploring network structure, dynamics and function...

INVALID DOIs

- None
logological commented 3 months ago

@tellarin Thanks for your review! @LCB0B The authors have made some revisions to the paper in light of tellarin's review. Could you have a final quick check and confirm that you're happy with the current version of the manuscript?

logological commented 3 months ago

@Aethor Where possible, could you please update your paper to supply the missing DOIs reported by editorialbot above? Possibly the only reference that has a DOI that is not given is Marazzato & Sparavigna: 10.48550/arXiv.1402.4259. However, you might consider replacing this reference, an unreviewed preprint, with what what seems to be the later journal paper based on it:

Sparavigna, Amelia Carolina and Marazzato, Roberto (March 2015). Analysis of a Play by Means of CHAPLIN, the Characters and Places Interaction Network Software. International Journal of Sciences, 2015, 4(3):60-68. DOI: 10.18483/ijSci.662

Aethor commented 3 months ago

Good catch, Sparavigna and Marazzato (2015) also presents their CHAPLIN software so I updated the reference to cite this journal article instead of their earlier arXiv preprint. I also did another pass to confirm that the other references missing DOIs really did not have one, and I was unable to find any additional DOI.

Aethor commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Aethor commented 3 months ago

@editorialbot check references

editorialbot commented 3 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18483/ijSci.662 is OK
- 10.1145/3344548 is OK
- 10.4304/jcp.8.9.2442-2447 is OK
- 10.7763/IJMLC.2013.V3.282 is OK
- 10.4000/resf.1183 is OK
- 10.18653/v1/D15-1088 is OK

MISSING DOIs

- No DOI given, and none found for title: Charnetto
- No DOI given, and none found for title: Character Networks and Centrality
- No DOI given, and none found for title: Character network analysis of Émile Zola’s Les Rou...
- No DOI given, and none found for title: Exploring network structure, dynamics and function...

INVALID DOIs

- None
logological commented 2 months ago

@Aethor Thanks for the changes. I think the manuscript is in good shape now. I'll just give @LCB0B a while longer to confirm whether they're also happy, and if so, we can move on to the next steps.

LCB0B commented 2 months ago

@logological @Aethor The changes look good to me as well. I'm happy with the current state of the manuscript.

logological commented 2 months ago

@Aethor It looks like we're ready to proceed with the final steps of the publication workflow. Could you please now handle the author tasks listed below?

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

Aethor commented 2 months ago
logological commented 2 months ago
  • The license of the archive is the same as Renard : GPLv3

@Aethor I'm afraid this is not correct. Your GitHub repository includes the GPLv3 licence in the LICENSE file and the pyproject.toml file indicates the licence is GPL-3.0-only. However, in Zenodo you have marked the licence as "GNU General Public License v3.0 or later" (i.e., GPL-3.0-or-later). Could you please reconcile this? You should either change your pyproject.toml so that it indicates GPL-3.0-or-later (provided all contributors holding copyright agree to this) or else change the licence tag on Zenodo.

Aethor commented 2 months ago

@logological oh, you're right! I fixed the license in Zenodo, so it is now GNU General Public License v3.0 only.

logological commented 2 months ago

@Aethor Thanks for fixing the licence tag! In other news, I have identified a few grammatical errors in the paper and submitted a pull request. Could you please review it and (if you are happy with it) accept it?

Aethor commented 2 months ago

@logological thanks, I just merged your PR with your proposal

logological commented 2 months ago

@editorialbot set 10.5281/zenodo.12167900 as archive

editorialbot commented 2 months ago

Done! archive is now 10.5281/zenodo.12167900

logological commented 2 months ago

@editorialbot set <v0.4.2> as version

editorialbot commented 2 months ago

Done! version is now <v0.4.2>

logological commented 2 months ago

@editorialbot set v0.4.2 as version

editorialbot commented 2 months ago

Done! version is now v0.4.2

logological commented 2 months ago

@editorialbot generate pdf