Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @linuxscout it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Wordcount for paper.md is 824
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1162/jmlr.2003.3.4-5.993 is OK
- 10.1080/02664763.2021.1919063 is OK
- 10.1162/089976601750264965 is OK
- 10.3115/v1/W14-3110 is OK
- 10.1162/15324430260185574 is OK
- 10.21105/joss.02507 is OK
- 10.17875/gup2020-1338 is OK
- 10.13140/2.1.2393.1847 is OK
- 10.17875/gup2020-1338 is OK
MISSING DOIs
- 10.5260/cca.199178 may be a valid DOI for title: IEEE Xplore Digital Library
INVALID DOIs
- 10.5555/1953048.2078195 is INVALID
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=0.08 s (676.8 files/s, 52936.0 lines/s)
-------------------------------------------------------------------------------
Language            files  blank  comment  code
-------------------------------------------------------------------------------
Python                 15    462      658  1030
XML                     6      0        0   213
reStructuredText       17    123       77   211
diff                    5     42       49   141
HTML                    2     15        0   134
TeX                     1     14        0   114
PowerShell              1     49      245   104
make                    2     24        6    75
DOS Batch               3     23        2    65
Markdown                1     14        0    56
Jupyter Notebook        1      0      430    40
INI                     1      4        3    16
YAML                    2      1        2    16
-------------------------------------------------------------------------------
SUM:                   57    771     1472  2215
-------------------------------------------------------------------------------
Statistical information for the repository '702fdb33409d86c5e788a30d' was
gathered on 2021/09/11.
The following historical commit information, by author, was found:
Author            Commits  Insertions  Deletions  % of changes
AFThielmann            13         444        295          7.51
Anton Thielmann        21        1256        449         17.33
ArneTillmann          172        4166       2990         72.72
kantg                   3         198         40          2.42
tkneib                  1           1          1          0.02
Below are the number of rows from each author that have survived and are still
intact in the current revision:
Author            Rows  Stability  Age  % in comments
AFThielmann         81       18.2  4.4          16.05
Anton Thielmann    823       65.5  2.1           6.20
ArneTillmann      1218       29.2  2.5          17.08
kantg               28       14.1  0.7           3.57
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@linuxscout – This is the review thread for the paper. All of our communications will happen here from now on.
Please read the "Reviewer instructions & questions" in the first comment above.
Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/3719
so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.
Hi @linuxscout, we are very much looking forward to the review and your comments. thanks.
@whedon add @pps121 as reviewer
OK, @pps121 is now a reviewer
@pps121 - thanks for agreeing to review this submission for us! Please take a look at my instructions above and complete your checklist as you work through your review.
@arfon fantastic, thank you! @linuxscout @pps121 should you have any early ideas or comments, on how to improve the paper or make things more clear, please just let us know, and we will implement them as soon as we can. thanks a lot.
I am not assigned to this issue. To perform the review, please assign me.
@linuxscout thanks for the update! @arfon could you please help with this question: "I am not assigned to this issue. To perform the review, please assign me." Thank you!
@linuxscout – did you accept the invitation at https://github.com/openjournals/joss-reviews/invitations ?
@whedon re-invite @linuxscout as reviewer
OK, the reviewer has been re-invited.
@linuxscout please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations
ok, thanks.
Hi, I finished the review.
@linuxscout thanks a lot for being that quick with the review. We really appreciated your comments and the fast review process!
@pps121 we are already looking forward to your comments. Please let us know if anything is unclear :) thanks!
Hi, @pps121 how are things going with the review? Please let us know if anything is unclear. thanks!
FYI I just emailed @pps121 to see when they might be able to complete their review by.
Thank you @arfon! We are looking forward to your feedback @pps121.
I just heard back from @pps121 and they are committed to completing their review soon, but are currently busy with school/university commitments.
Great, thank you both @arfon and @pps121!
When I ran the code:
from nltk.corpus import reuters
it gave me a LookupError as below:
Resource reuters not found. Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('reuters')
scraped_documents = audo.get_ieee("https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=cotton&highlight=true&returnFacets=ALL&returnType=SEARCH&matchPubs=true&rowsPerPage=100&pageNumber=1\", pages=1)
Here, there should not be a backslash (\) after pageNumber=1; otherwise it throws a syntax error.
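One way to avoid this kind of stray-character problem is to assemble the search URL from its parameters instead of typing it as one long string. The following is only a sketch: the helper name `build_ieee_search_url` is hypothetical and not part of AuDoLab; the parameter names and values are copied from the URL quoted above.

```python
from urllib.parse import urlencode

# Base search endpoint, taken from the URL quoted in the thread.
BASE_URL = "https://ieeexplore.ieee.org/search/searchresult.jsp"


def build_ieee_search_url(query_text, rows_per_page=100, page_number=1):
    """Assemble the search URL from its query parameters (hypothetical helper)."""
    params = {
        "newsearch": "true",
        "queryText": query_text,
        "highlight": "true",
        "returnFacets": "ALL",
        "returnType": "SEARCH",
        "matchPubs": "true",
        "rowsPerPage": rows_per_page,
        "pageNumber": page_number,
    }
    # urlencode joins the pairs with "&" and escapes special characters,
    # so no manual line continuation or trailing backslash is needed.
    return BASE_URL + "?" + urlencode(params)


url = build_ieee_search_url("cotton")
print(url)
# The resulting string could then be passed on as in the example above,
# e.g. audo.get_ieee(url, pages=1)
```

Building the string this way also keeps the query term and page settings easy to change without touching the rest of the URL.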
For the above LookupError, the code below gives a fix:
import nltk
nltk.download('reuters')
It downloads a reuters zip into /nltk_data/corpora; after that, re-running the cell proceeds to the next step.
Before executing the line
preprocessed_target = audo.text_cleaning(data=data, column="text")
we need to load wordnet as below; otherwise there will be a "module not found" runtime error:
import nltk
nltk.download('wordnet')
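Both downloads could be guarded so that the notebook only fetches a resource when it is actually missing. This is a minimal sketch, not code from the AuDoLab repository: the helper `ensure_nltk_resources` is a hypothetical name, and the lookup paths `corpora/reuters` and `corpora/wordnet` are assumed to be where NLTK stores these two resources. `nltk.data.find` and `nltk.download` are the standard NLTK calls; they are passed in as optional arguments only so the logic can be tried without NLTK installed.

```python
def ensure_nltk_resources(resources, finder=None, downloader=None):
    """Download each NLTK resource only if it is not already installed.

    `resources` maps a lookup path (e.g. "corpora/reuters") to the
    package id passed to the downloader (e.g. "reuters").
    `finder` and `downloader` default to nltk.data.find and nltk.download.
    """
    if finder is None or downloader is None:
        import nltk  # imported lazily, only when the defaults are needed

        finder = finder or nltk.data.find
        downloader = downloader or nltk.download

    fetched = []
    for path, package in resources.items():
        try:
            finder(path)  # raises LookupError when the resource is missing
        except LookupError:
            downloader(package)
            fetched.append(package)
    return fetched


# Intended use at the top of the notebook (needs NLTK and network access):
# ensure_nltk_resources({"corpora/reuters": "reuters",
#                        "corpora/wordnet": "wordnet"})
```

The function returns the list of packages it had to download, which makes it easy to see on a fresh machine what was missing.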
Dear @pps121, thank you very much for your helpful and valuable comments. We will fix these issues as quickly as possible and get back to you soon!
In .travis.yml, the YAML exposes a severe security hole: the username and password are given in plain text, as below:
username: ArneTillmann
password: 2AQeUe5iHHEe0MrEv7
Gitignore should take care of it before merging.
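A quick check for this kind of leak can be scripted. The sketch below is an illustration only: the function name `find_plaintext_secrets` and the list of key names are assumptions, and the sample values are placeholders. It flags lines where a sensitive key is followed directly by a value, and it does not flag the form Travis CI uses for encrypted values, where the key is followed by a nested `secure:` entry.

```python
import re

# Keys whose plain-text values are worth flagging (an illustrative list).
SENSITIVE_KEY = re.compile(
    r"^\s*(password|passwd|token|api_key|secret)\s*:\s*(\S.*)$",
    re.IGNORECASE,
)


def find_plaintext_secrets(yaml_text):
    """Return (line_number, key) pairs for keys with an inline plain-text value."""
    findings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = SENSITIVE_KEY.match(line)
        if not match:
            continue
        value = match.group(2)
        # Values that only reference an environment variable are not flagged.
        if value.startswith("$"):
            continue
        findings.append((number, match.group(1)))
    return findings


plain = """deploy:
  provider: pypi
  username: example-user
  password: not-a-real-password
"""

encrypted = """deploy:
  provider: pypi
  username: example-user
  password:
    secure: "ENCRYPTED-PLACEHOLDER"
"""

print(find_plaintext_secrets(plain))
print(find_plaintext_secrets(encrypted))
```

Since .travis.yml has to be committed for the CI service to read it, a check like this complements rather than replaces encrypting the value and changing the exposed password.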
The naming convention of ipython notebook should be more meaningful rather than example.ipynb
@ChrisW09 Can you update why number of topics is mentioned as 5? Methods lda_modeling() and lda_visualize_topics() take long time to execute. Can you please share your thoughts.
@pps121 thank you very much for the comments! We will get back to you as soon as possible.
Hi @pps121, first of all, thank you for taking the time to review our work. I will reply to all your comments now!
The naming convention of ipython notebook should be more meaningful rather than example.ipynb
I changed the name to usage_example.ipynb. Is that better, or do you have something else in mind?
@ChrisW09 Can you update why number of topics is mentioned as 5? Methods lda_modeling() and lda_visualize_topics() take long time to execute. Can you please share your thoughts.
The number of topics in the LDA model needs to be fixed before fitting. This is the user's choice. We chose five here because it yields results that are easy to interpret. Regarding the slow execution time, I could not reproduce that issue. On my machine it took less than 10 seconds to fit the model and produce the visualization. Can you try again, or try in a different environment?
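To compare timings across machines, the slow calls could be wrapped in a small timer so that both sides report the same measurement. This is a sketch under assumptions: `timed` is a hypothetical helper, and `stand_in_model_fit` is only a placeholder workload, since the arguments of `lda_modeling()` and `lda_visualize_topics()` are not shown in this thread.

```python
import time


def timed(label, func, *args, **kwargs):
    """Run func, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


def stand_in_model_fit(num_topics):
    """Placeholder for the real model fit; returns one label per topic."""
    return [f"topic_{i}" for i in range(num_topics)]


# In the notebook, the stand-in would be replaced by the real call.
topics, seconds = timed("model fit", stand_in_model_fit, 5)
```

Reporting the printed numbers together with the Python version and operating system would make it easier to tell whether the slowdown depends on the environment.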
When I ran the code:
from nltk.corpus import reuters
it gave me a LookupError as below:
Resource reuters not found. Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('reuters')
Thank you for this advice. I added those lines to the usage_example.ipynb
Before executing the line
preprocessed_target = audo.text_cleaning(data=data, column="text")
we need to load wordnet as below; otherwise there will be a "module not found" runtime error:
import nltk
nltk.download('wordnet')
as well as those.
* The code level steps should be more elaborated, so that before executing it can give a high level purpose of its use.
* In .travis.yml, the yaml expose a severe security hole: the username and password are mentioned as below:
**username**: ArneTillmann
**password**: 2AQeUe5iHHEe0MrEv7
Gitignore should take care of it before merging.
Thank you especially for this hint. I didn't notice, but I changed the password and encrypted it now. However, I don't really know which files or lines of code you think should be more elaborated. Can you specify what you would prefer here?
The naming convention of ipython notebook should be more meaningful rather than example.ipynb
I changed the name to usage_example.ipynb. Is that better, or do you have something else in mind?
It is better now. Thanks.
* The code level steps should be more elaborated, so that before executing it can give a high level purpose of its use.
* In .travis.yml, the yaml expose a severe security hole: the username and password are mentioned as below:
**username**: ArneTillmann
**password**: 2AQeUe5iHHEe0MrEv7
Gitignore should take care of it before merging.
Thank you especially for this hint. I didn't notice, but I changed the password and encrypted it now. However, I don't really know which files or lines of code you think should be more elaborated. Can you specify what you would prefer here?
Thanks for the encryption step. The current state of the code modules is okay now.
Below are my thoughts and suggestions:
Should include more examples of how to use the software thinking from cross domain horizons (to solve by real-world problems)
For different OS variants, can you include steps for both bash and PowerShell so that general users with different OS can find this more useful?
Should clearly mention, how this packages are better than other baselines / commonly-used packages to decide state of the field.
Thank you.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@pps121 thank you again for your comments and advice.
I updated the paper to address your first comment "Should include more examples of how to use the software thinking from cross domain horizons (to solve by real-world problems)" by including the following explanations:
Hence, AuDoLab has a broad range of real-world applications in scientific research and business. In the following, a few potential use cases are briefly discussed to illustrate the range of applications in various domains. For example, AuDoLab could be used to identify emails with very specific topics such as fraud or money laundering that might have an extremely low prevalence. Similarly, AuDoLab could be used in the medical field to classify medical documents that are concerned with very specific topics such as heart attacks or dental problems. Furthermore, AuDoLab may be used to identify legal documents with very specific topics such as machine learning. Note that the only limiting factor for the range of use cases is the availability of out-of-domain training data, which can be generated via Web Scraping from IEEEXplore [@IEEE], arXiv or PubMed. Given that a broad range of training documents can be obtained from these websites, AuDoLab has a correspondingly broad range of applications.
@pps121 Regarding your second comment "For different OS variants, can you include steps for both bash and PowerShell so that general users with different OS can find this more useful?".
We tested our package on various OS variants, namely macOS, Unix and Windows.
We will add explanations for the installation for both bash and PowerShell. This will make the package more useful for the general user as you suggested. Thank you for these suggestions!
@pps121 Thank you for your comment "Should clearly mention, how this packages are better than other baselines / commonly-used packages to decide state of the field."
In the "Comparison with existing tools" section in our paper we point out that there are no other baselines / commonly-used packages, because AuDoLab is based on the statistical methodology that was recently developed by us in this publication:
Thielmann, A., Weisser, C., Krenz, A., & Säfken, B. (2021). Unsupervised document classification integrating web scraping, one-class SVM and LDA topic modelling. Journal of Applied Statistics, 1–18. https://doi.org/10.1080/02664763.2021.1919063
We have now updated the "Comparison with existing tools" section to elaborate more on this point:
"At the moment no Python package with functionality comparable to AuDoLab is available, since AuDoLab is based on a novel and recently published classification procedure [@Thielmann]. Thereby, AuDoLab uses and integrates in particular a combination of Web Scraping, Topic Modelling and One-class Classification, for which various individual packages are available. Details on the statistical methodology can be found in [@Thielmann]. An application of the methodology on a data set of patent data can be found in [@Thielmann2021]. For Topic Modelling, available packages are the LDA algorithm as implemented in the package Gensim [@rehurek_lrec] or the package TTLocVis [@Kant2020] for short and sparse text. Visual representations of the topics can be implemented with LDAvis [@ldavis]. The One-class SVM classification package is available in Scikit-learn [@scikit-learn]. Further research could explore Deep Learning algorithms as well [@Saefken2020; @Saefken2021]."
Please let us know if this meets your expectations or what else we should change. Thank you very much for your advice and help!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Submitting author: @ArneTillmann (Arne Matthias Tillmann)
Repository: https://github.com/ArneTillmann/AuDoLab
Version: v1.0.7
Editor: @arfon
Reviewers: @linuxscout, @pps121
Archive: 10.5281/zenodo.5575835
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@linuxscout, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @linuxscout
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @pps121
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper