Closed editorialbot closed 5 months ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
```
github.com/AlDanial/cloc v 1.88  T=0.07 s (291.7 files/s, 69714.4 lines/s)
-------------------------------------------------------------------------------
Language             files      blank    comment       code
-------------------------------------------------------------------------------
HTML                     7         95          0       2244
Python                   6         60        153       1056
TeX                      1         12          0        143
Markdown                 2         33          0         97
Jupyter Notebook         1          0        563         29
TOML                     1          2          0         26
YAML                     1          1          9         18
-------------------------------------------------------------------------------
SUM:                    19        203        725       3613
-------------------------------------------------------------------------------
```
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 938
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1002/bse.2195 is OK
- 10.21105/joss.05124 is OK
- 10.1016/j.enpol.2008.02.039 is OK
- 10.1007/s10668-016-9801-z is OK
- 10.3390/ECP2023-14728 is OK
- 10.5040/9781509934058.0025 is OK
- 10.1007/978-981-10-3521-0_31 is OK
- 10.3390/su14053095 is OK
MISSING DOIs
- None
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot add @luyuhao0326 to reviewers
@luyuhao0326 added to the reviewers list!
:wave: Hi @varsha2509, @luyuhao0326, thank you so much for helping out at JOSS. If you need any pointers, please feel free to look at previous reviews (which can be found by looking at published papers) and the documentation. If you need to comment on the code itself, opening an issue at the repo and then linking to it from here (to help me/others keep track) is the way to go. For comments on the paper, you can also open issues or PRs (say for typos), but those can be directly posted as replies in this issue. Thanks, and feel free to reach out if you need me. :relaxed:
Thanks for the invitation. Below are my reviews on the installation and the software paper (most of my comments relate to the paper, since I am not a proficient Python user).
Installation
It would be great to include more detailed installation instructions for users who are not so familiar with Python (e.g., me and many others who will be potentially benefiting from this work) and/or GitHub.
That being said, I am unable to install and run this package on my machine. I will be happy to do so if one of the authors can help me install the package.
Software paper
The idea is novel and I can see this work being useful in many domains. I have one question regarding the example use cases listed in the paper: the paper claims that seesus can be used to "label academic publications" and for "large-scale scans of planning documents". However, the example in README.md only shows how seesus evaluates individual sentences, which could cause misinterpretation and biased results, as the context of "academic publications" and "planning documents" will likely be missing when text is evaluated sentence by sentence. Example 3 provided here, for example, is not really a paragraph.
The statement of need is clear but a bit thin. Although I appreciate that JOSS is a more software-focused journal, it would still be great to provide some context on the current status of text mining/classification on the UN SDGs and why it is important to, for example, "quantify which dimension of sustainability receives the most attention".
Accuracy of 75.5% is decent but is not particularly high. Considering the evaluation method is from a different package, it would be great if the authors could provide a statement (or, even better, specific development plans) on how to improve the accuracy and/or usability of future text mining on the SDGs.
Hi @luyuhao0326,
Thank you very much for taking the time to review `seesus`. We appreciate your helpful feedback. Please find our point-by-point responses below.
Installation
It would be great to include more detailed installation instructions for users who are not so familiar with Python (e.g., me and many others who will be potentially benefiting from this work) and/or GitHub.
Thank you for your suggestion. `seesus` is indeed a Python-based package that requires basic knowledge of Python programming. To simplify the installation process, we chose to publish `seesus` to PyPI and use `pip install`. This way, users can install the package with a single command, without needing to manually manage dependencies or configure the package. We have made the installation instructions clearer as suggested (6f02afd).
That being said, I am unable to install and run this package on my machine. I will be happy to do so if one of the authors can help me install the package.
I am more than happy to help. Do you already have Python, pip, and Jupyter (for running `example.ipynb`) installed? If yes, typing `pip install seesus` in your terminal should do the job. If not, I would recommend installing Anaconda first. Please go to Anaconda's website and install it for your specific operating system (instructions). Then you should be able to install `seesus` by typing `pip install seesus` in your terminal. Please let me know if you encounter any problems.
Software paper
The idea is novel and I can see this work being useful in many domains. I have one question regarding the example use cases listed in the paper: the paper claims that seesus can be used to "label academic publications" and for "large-scale scans of planning documents". However, the example in README.md only shows how seesus evaluates individual sentences, which could cause misinterpretation and biased results, as the context of "academic publications" and "planning documents" will likely be missing when text is evaluated sentence by sentence. Example 3 provided here, for example, is not really a paragraph.
Glad to hear that you find our package novel and potentially useful in many domains. To achieve the best results, we recommend splitting a paragraph or a whole document into individual sentences (i.e., using individual sentences as the basic unit for `seesus` to analyze). This was why we initially only showed how `seesus` evaluates individual sentences in `README.md`. Thank you for pointing out that this might cause misinterpretation. To address this concern, first, we have copied the paragraph example (example 3) from `example.ipynb` to `README.md` (66862f8). You can tell this example is a paragraph (i.e., a chunk of text with several sentences) by scrolling to the right. The display of a Jupyter Notebook on GitHub is a bit confusing because the text is truncated, so we have added a print statement to prevent this confusion (c77e33e). Second, we have added another example in `example.ipynb` to demonstrate the package's usage in the context of academic publications (c77e33e). For both the academic publication and planning document examples, we split the paragraphs into sentences and printed the results for each sentence. Users can organize the results according to their needs.
The statement of need is clear but a bit thin. Although I appreciate that JOSS is a more software-focused journal, it would still be great to provide some context on the current status of text mining/classification on the UN SDGs and why it is important to, for example, "quantify which dimension of sustainability receives the most attention".
Thank you for your suggestion. We have incorporated additional context on text mining on the SDGs in our paper as suggested (c8162fe). Given that JOSS requires papers to be between 250 and 1,000 words (source), we hope the edits are sufficient to provide the necessary improvement to our statement of need.
Accuracy of 75.5% is decent but is not particularly high. Considering the evaluation method is from a different package, it would be great if the authors could provide a statement (or, even better, specific development plans) on how to improve the accuracy and/or usability of future text mining on the SDGs.
This is a great idea. We've added a statement on maintenance in `README.md` to address this (1eacaaf). Following the best practices of open-source software, we welcome and encourage users to report issues if they find that a matching syntax is inaccurate or can be improved.
Thanks again for your time and suggestions!
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@caimeng2 Thanks for your response. I will get back to you within a week or so. I will also try to install the package and give feedback if there is any.
Yuhao
Hello, I managed to install the package and while testing one of the provided examples, I encountered a LookupError. Please see the code here.
Hi @luyuhao0326, I'm so glad that you got the installation working :tada: The link to the LookupError points to your localhost, so I can't see the traceback. But I suspect it's a bug with `nltk` (see this). Feel free to try some of the solutions proposed there. An alternative is to use `re` instead of `nltk`. Please see if the following code works.
```python
from seesus import SeeSus
import re

text2 = "By working with communities in the floodplain and facilitating flood-resistant building design, DCP is reducing the city’s risks to sea level rise and coastal flooding. Hurricane Sandy was a stark reminder of these risks. The City, led by the Mayor’s Office of Recovery and Resiliency (ORR), has developed a multifaceted plan for recovering from Sandy and improving the city’s resiliency–the ability of its neighborhoods, buildings and infrastructure to withstand and recover quickly from flooding and climate events. As part of this effort, DCP has initiated a series of projects to identify and implement land use and zoning changes as well as other actions needed to support the short-term recovery and long-term vitality of communities affected by Hurricane Sandy and other areas at risk of coastal flooding."

# Split the paragraph into sentences with a regex, then analyze each one
for sent in re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text2):
    result = SeeSus(sent)
    print('"', sent, '"', sep="")
    print("Is the sentence related to achieving sustainability?", result.sus)
    print("Which SDGs?", result.sdg)
    print("Which SDG targets specifically?", result.target)
    print("which dimensions of sustainability?", result.see)
    print("----------------")
```
Thank you for letting me know about this issue. I'll update the examples.
Indeed, this is the bug. It is now working with `re`.
The authors have addressed all my comments and made appropriate revisions. I recommend this submission to be accepted by JOSS.
Thanks again for your suggestions, which helped to make our paper and software better!
@varsha2509 is everything going OK with your review? 😊
Hello. Thank you for giving me the opportunity to review this work. The authors have done a great job documenting the software, the installation instructions, and the functionality. Below are my comments based on the review checklist, as well as additional notes to help improve the readability and adoption of this work:
Functionality - while the functional claims of the software have been verified, could the authors provide more details on how the indirect search keys in `SDG_keys.py`, specifically from this line onwards, were determined? Can the authors confirm that all targets were included in the keywords?
Automated tests - while the existing tests cover the explained functionality, the authors should consider including more examples in tests, especially relevant sentences that have a negative connotation, to clarify the performance of this tool.
Statement of need - the existing statement of need isn't particularly strong. It's not very clear to me what the benefits of `seesus` are over existing tools, other than the functionality which classifies the expression as environmental, social, or economic sustainability. Making the statement of need stronger will help improve adoption of this tool. Could the authors provide an example of what they mean by "also the attainment of SDGs" as specified in the statement of need?
State of the field - OSDG (https://arxiv.org/abs/2211.11252, https://github.com/osdg-ai/osdg-tool) is another open source tool for text-based classification of SDG goals using NLP/ML methods. This may be worth highlighting as one of the other classifiers in the statement of need. Along with this, could the authors also include why users would consider `seesus` over existing open source tools?
Other notes:
Running through the code and scripts as examples, the current tool is not able to capture negative expressions, as regex lacks semantic understanding of text. For instance, using the sentence "One should not resolve climate change for environmental sustainability." as input, it is classified as relating to achieving SDG 13 and SDG 15, but the output should be 'None' or 'Does not match SDG goals'.
This seems to be a limitation of this tool, and it would be worth highlighting it in a separate section, including some ideas on how the authors plan to address these limitations in future releases. This will help users be fully aware of the benefits and limitations of this software.
Related to the above, could the authors talk briefly about the limitations of regex for pattern matching compared to existing semantic text search/language models?
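The limitation described here is easy to demonstrate with a minimal sketch. The pattern below is hypothetical (it is not seesus's actual matching syntax): any rule that fires on climate-related character patterns will fire whether or not the sentence is negated.

```python
import re

# Hypothetical SDG 13-style pattern (for illustration only; not the
# syntax actually shipped in seesus): match climate-related terms.
pattern = re.compile(r"\b(climate change|global warming)\b", re.IGNORECASE)

affirmed = "We must resolve climate change for environmental sustainability."
negated = "One should not resolve climate change for environmental sustainability."

# Regex operates on character patterns only, so both sentences match:
print(bool(pattern.search(affirmed)))  # True
print(bool(pattern.search(negated)))   # True
```

Distinguishing the two sentences would require modeling the scope of "not", which is outside what regular expressions can express.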
The authors mention that `seesus` achieves an accuracy rate of 75.5%, as determined by alignment with manual coding. Can the authors comment on how they plan to improve the performance of this tool in future releases, as 75% accuracy currently seems low for usability?
Hi @varsha2509,
Thank you for taking the time to review our software and for your valuable feedback. Please find our point-by-point responses below.
Functionality - while the functional claims of the software have been verified, could the authors provide more details on how the indirect search keys in SDG_keys.py specifically from this line onwards were determined? Can the authors confirm that all targets were included in the keywords?
Yes, we can confirm that all targets are included in the keywords. We created the search keys at the levels of both the 17 SDGs and the 169 SDG targets. The indirect keys were first based on Thesaurus, and we (four researchers specialized in SDGs) manually assessed and improved the accuracy of the matching syntax by using thousands of randomly-selected statements from corporate reports. We conducted three rounds of fine-tuning and finalized these keys.
Automated tests - while the existing tests cover the explained functionality, the authors should consider including more examples in tests, especially relevant sentences that have a negative connotation to clarify the performance of this tool.
Thank you for pointing this out. Indeed, matching text with a negative connotation is a limitation of `seesus`. `seesus` can identify terms related to the SDGs but cannot distinguish between achieving the SDGs and failing to do so. This limitation lies in regular expressions' limited logical capability and lack of context awareness. We have added another test of direct matching (4968c35) and edited the paper, deleting expressions regarding "attainment of SDGs," to make it clear that `seesus` is designed to classify based on relevance.
Statement of need - The existing statement of need isn't particularly strong. It's not very clear to me what the benefits of seesus are over existing tools, other than the functionality which classifies the expression as environmental, social or economic sustainability. Making the statement of need stronger will help improve adoption of this tool.
The biggest benefit of `seesus` is its finer scale: it captures not only the 17 SDGs but also the 169 SDG targets. To the best of our knowledge, no other Python tool does this. In addition, compared with tools based on machine learning, `seesus` allows users to examine and modify the matching syntax, so users always understand and have control over the results. We have edited the statement of need to make it stronger as suggested (3f35864).
Could the authors provide an example of what they mean by "also the attainment of SDGs" as specified in the statement of need?
What we meant was that `seesus` specifically looks for terms related to achieving the SDGs, not just SDG-related topics themselves. For example, it does not look for words solely related to emissions (e.g., "emissions", "carbon") but for terms such as "lowering emissions" and "reducing carbon." However, we realized that this sentence was rather confusing, as `seesus` cannot identify negative expressions, so we have deleted it to avoid further confusion.
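This distinction can be sketched with a small regex example. The pattern below is a hypothetical illustration, not the syntax actually used in seesus: an "achievement-oriented" rule requires an action verb before the topic term, so bare mentions of the topic do not match.

```python
import re

# Hypothetical sketch: require a reduction verb before the topic term,
# so plain mentions of "carbon" or "emissions" alone do not match.
attain = re.compile(
    r"\b(reduc\w*|lower\w*|cutting)\s+(carbon|emissions?)\b",
    re.IGNORECASE,
)

# A bare topic mention does not match; an action phrase does:
print(bool(attain.search("The plant's carbon emissions rose last year.")))   # False
print(bool(attain.search("The city committed to lowering emissions by 2030.")))  # True
```

Note that even this stricter rule is still pattern matching: it would also fire inside a negated sentence such as "We will not be lowering emissions," which is exactly the limitation discussed above.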
State of the field - OSDG (https://arxiv.org/abs/2211.11252, https://github.com/osdg-ai/osdg-tool) is another open source tool for text based classification of SDG goals and these use NLP/ML based methods. This may be worth highlighting as one of the other classifiers in the statement of need. Along with this, could the authors also include why users would consider seesus over existing open source tools?
Thank you for this reference. We have added it to the existing classifiers. We tested OSDG and noticed that it is not able to capture negative expressions either, and its results cover only the 17 SDGs, not the targets. We have revised our paper to highlight `seesus`'s benefits.
Other notes: Running through the code and script as examples, the current tool is not able to capture negative expressions, as Regex lacks semantic understanding of text. For instance using this sentence as an input "One should not resolve climate change for environmental sustainability." this is being classified as relating to achieving SDG13 and SDG15 but the output should be 'None' or 'Does not match SDG goals'. This seems to be a limitation of this tool and it would be worth highlighting this in a separate section and including some ideas on how the authors plan to address these limitations in future releases of this tool. This will help users be fully aware of the benefits and limitations of this software. Related to above, could the authors talk briefly about limitations of regex for pattern matching over existing semantic text search/language models?
Thank you for your suggestion. This is a very good point. Compared to language models, regex lacks the ability to understand the semantic meaning or context of text, as it operates on character patterns. As suggested, we have added a paragraph at the end of the paper to make the limitations of `seesus` clearer.
The authors mention that seesus achieves an accuracy rate of 75.5%, as determined by alignment with manual coding. Can the authors comment on how they plan to improve the performance of this tool in future releases as 75% accuracy currently seems low for usability.
Yes, we have included this in the revision of the paper. We devoted hundreds of hours to fine-tuning the matching syntax. 75.5% may seem low, but it is quite reasonable for traditional qualitative analysis: human intercoder agreement on the same text was only 83%.
Thanks again for all your comments and suggestions! We feel that our paper is much clearer and stronger than the previous version. Thank you!!! Please let us know if there's anything else.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Hello @caimeng2 - thank you for addressing my comments. Please see my responses below:
Yes, we can confirm that all targets are included in the keywords. We created the search keys at the levels of both the 17 SDGs and the 169 SDG targets. The indirect keys were first based on Thesaurus, and we (four researchers specialized in SDGs) manually assessed and improved the accuracy of the matching syntax by using thousands of randomly-selected statements from corporate reports. We conducted three rounds of fine-tuning and finalized these keys.
Thanks for confirming this. Depending on word limit, I'd recommend including a line or two about this in the paper, or in the Github Readme, under a methodology section.
Besides the suggestion above, the authors have fully addressed all of my comments and made revisions where necessary. I recommend this submission to be accepted by JOSS.
Hi @varsha2509,
Thanks for confirming this. Depending on word limit, I'd recommend including a line or two about this in the paper, or in the Github Readme, under a methodology section.
This is a great suggestion! We revised the paper accordingly (018e7e1) and added a methodology section to README.
We are glad to hear that you found our revisions satisfactory, and appreciate your recommendation for acceptance. Thank you again for your thorough review of our submission!
- `@editorialbot set <DOI here> as archive`
- `@editorialbot set <version here> as version`
- `@editorialbot generate pdf`
- `@editorialbot check references` and ask author(s) to update as needed
- `@editorialbot recommend-accept`
@caimeng2 what is left to do (other than the above)? ☺️
I believe only the above. I will have the author tasks done by the end of this week.
Hi @oliviaguest, I finished the author tasks listed above. Please let me know if there's anything else :nerd_face:
- Double check authors and affiliations (including ORCIDs)
Checked
- Make a release of the software with the latest changes from the review and post the version number here. This is the version that will be used in the JOSS paper.
- Archive the release on Zenodo/figshare/etc and post the DOI here.
- Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSS paper.
Checked
- Make sure that the license listed for the archive is the same as the software license.
Checked
@editorialbot generate pdf
@editorialbot check references
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1002/bse.2195 is OK
- 10.21105/joss.05124 is OK
- 10.1016/j.enpol.2008.02.039 is OK
- 10.48550/arXiv.2211.11252 is OK
- 10.1007/s10668-016-9801-z is OK
- 10.3390/ECP2023-14728 is OK
- 10.5040/9781509934058.0025 is OK
- 10.1007/978-981-10-3521-0_31 is OK
- 10.3390/su14053095 is OK
MISSING DOIs
- No DOI given, and none found for title: SDG Auto Labeller
- No DOI given, and none found for title: EUR-SDG-Mapper
- No DOI given, and none found for title: UN-SDG-Classifier
- No DOI given, and none found for title: SDG-Classifier
INVALID DOIs
- None
@editorialbot set 10.5281/zenodo.10854083 as archive
Done! archive is now 10.5281/zenodo.10854083
@caimeng2 why is it `Version: v1.2.0` above?
@caimeng2 see: https://github.com/caimeng2/seesus/pull/3 ☺️
Ah, my bad. That was the version number for PyPI, which I totally forgot. I should have made them consistent.
Hi @oliviaguest, I made a new release and redid the tasks above. Sorry about the inconvenience.
- Double check authors and affiliations (including ORCIDs)
Checked
- Make a release of the software with the latest changes from the review and post the version number here. This is the version that will be used in the JOSS paper.
- Archive the release on Zenodo/figshare/etc and post the DOI here.
- Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSS paper.
Checked
- Make sure that the license listed for the archive is the same as the software license.
Checked
@caimeng2 thank you!
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot set v1.2.1 as version
Done! version is now v1.2.1
@caimeng2 is that the right version?
@caimeng2 is that the right version?
Yes!
@editorialbot recommend-accept
Attempting dry run of processing paper acceptance...
Submitting author: @caimeng2 (Meng Cai) Repository: https://github.com/caimeng2/seesus Branch with paper.md (empty if default branch): Version: v1.2.1 Editor: @oliviaguest Reviewers: @varsha2509, @luyuhao0326 Archive: 10.5281/zenodo.10854083
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@varsha2509, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @oliviaguest know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @luyuhao0326
📝 Checklist for @varsha2509