openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
[PRE REVIEW]: FreeStylo: An easy-to-use stylistic device detection tool for stylometry #7443

Open editorialbot opened 1 week ago

editorialbot commented 1 week ago

Submitting author: @fschncvg (Felix Schneider)
Repository: https://github.com/cvjena/freestylo
Branch with paper.md (empty if default branch):
Version: v0.5.0
Editor: Pending
Reviewers: Pending
Managing EiC: Chris Vernon

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/eb3cc3f453aabe48306c4e81f42a4133"><img src="https://joss.theoj.org/papers/eb3cc3f453aabe48306c4e81f42a4133/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/eb3cc3f453aabe48306c4e81f42a4133/status.svg)](https://joss.theoj.org/papers/eb3cc3f453aabe48306c4e81f42a4133)

Author instructions

Thanks for submitting your paper to JOSS @fschncvg. Currently, there isn't a JOSS editor assigned to your paper.

@fschncvg if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands
editorialbot commented 1 week ago

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 week ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.04 s (1475.5 files/s, 194763.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            22            188              0           3584
Python                          19            382            814           1042
JSON                             4              0              0            349
Markdown                         2             54              0            227
YAML                             1             12              3            112
TeX                              1              9              0             73
Bourne Shell                     2              2              0              7
TOML                             1              1              0              5
-------------------------------------------------------------------------------
SUM:                            52            648            817           5399
-------------------------------------------------------------------------------

Commit count by author:

    41  schneider
     4  Felix Schneider
editorialbot commented 1 week ago

Paper file info:

📄 Wordcount for paper.md is 1853

✅ The paper includes a Statement of need section

editorialbot commented 1 week ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

editorialbot commented 1 week ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.18653/v1/2021.latechclfl-1.11 is OK
- 10.14746/amup.9788323241775 is OK
- 10.5281/zenodo.1212303 is OK
- 10.18653/v1/2021.acl-demo.3 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Metaphor Detection for Low Resource Languages: Fro...
- No DOI given, and none found for title: Twenty-first century Corpus Workbench: Updating a ...
- No DOI given, and none found for title: Empirical research on association measures: The UC...
- No DOI given, and none found for title: NLTK: The Natural Language Toolkit

āŒ MISSING DOIs

- None

āŒ INVALID DOIs

- None
editorialbot commented 1 week ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 1 week ago

Five most similar historical JOSS papers:

TRUNAJOD: A text complexity library to enhance natural language processing
- Submitting author: @dpalmasan
- Handling editor: @danielskatz (Active)
- Reviewers: @mbdemoraes, @apiad
- Similarity score: 0.6678

Arabica: A Python package for exploratory analysis of text data
- Submitting author: @PetrKorab
- Handling editor: @oliviaguest (Active)
- Reviewers: @linuxscout, @amitkumarj441
- Similarity score: 0.6554

Augmenty: A Python Library for Structured Text Augmentation
- Submitting author: @KennethEnevoldsen
- Handling editor: @arfon (Active)
- Reviewers: @sap218, @wdduncan
- Similarity score: 0.6359

Fast, Consistent Tokenization of Natural Language Text
- Submitting author: @lmullen
- Handling editor: @arfon (Active)
- Reviewers: @arfon
- Similarity score: 0.6347

textnets: A Python package for text analysis with networks
- Submitting author: @jboynyc
- Handling editor: @gkthiruvathukal (Active)
- Reviewers: @sara-02, @tresoldi
- Similarity score: 0.6316

āš ļø Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

crvernon commented 1 week ago

👋 @fschncvg - You mention the following previous publications:

- Chiasmus Detection: https://aclanthology.org/2021.latechclfl-1.11/
- Metaphor Detection: https://aclanthology.org/2022.mwe-1.11/

Additional detectors so far in this software package:

- Polysyndeton
- Epiphora
- Alliterative verse

A tool for people in stylometry, literary scholars, computational linguists/NLP researchers.

Please clearly state how / if the current submission is different from the above. Thanks.

fschncvg commented 1 week ago

The previous publications describe the base forms of the chiasmus and metaphor detectors. Both come with code to replicate the experiments, and someone proficient in programming could, with some work, probably use that code to detect those stylistic devices in other texts. This software instead provides a new implementation of the concept that can be used either as a Python library in other programs or directly by, e.g., literary scholars with no programming knowledge. The additional stylistic devices that can be detected (polysyndeton, epiphora, alliterative verse) do not rely on machine learning and have not been previously published by me. Since this software provides a growing general collection of stylistic device detectors, it makes sense to include them, especially since they are interesting for various stylometric analysis tasks.

So the main difference is: the two previously published papers provide methods that are implemented in this software package. Additionally, this package provides three more methods to find other stylistic devices. It offers a command line interface and a library for the (currently five) methods, so that they can be used both by people proficient in Python and by researchers with linguistic, stylometric, and literary expertise but no programming knowledge.
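For illustration only, a rule-based detector of the kind mentioned above (one that does not rely on machine learning) can be quite small. The sketch below looks for epiphora, i.e. consecutive phrases ending in the same word. It is not FreeStylo's implementation or API: `split_phrases` and `find_epiphora` are hypothetical names, and the naive punctuation split is an assumption standing in for proper spaCy or cltk preprocessing.

```python
import re


def split_phrases(text):
    """Naively split text into phrases on sentence and clause punctuation.

    Assumption: a real pipeline would use a proper tokenizer instead.
    """
    parts = re.split(r"[.!?;,]+", text)
    # Keep only parts that contain at least one word character.
    return [p.strip() for p in parts if re.search(r"\w", p)]


def find_epiphora(text, min_run=2):
    """Return (word, phrases) for each run of at least `min_run`
    consecutive phrases that end in the same word (case-insensitive)."""
    phrases = split_phrases(text)
    last_words = [re.findall(r"\w+", p.lower())[-1] for p in phrases]

    runs = []
    start = 0
    for i in range(1, len(last_words) + 1):
        # Close the current run at the end of the text or when the
        # final word changes.
        if i == len(last_words) or last_words[i] != last_words[start]:
            if i - start >= min_run:
                runs.append((last_words[start], phrases[start:i]))
            start = i
    return runs


example = (
    "I want the best. You deserve the best. "
    "We demand the best. Nothing else matters."
)
for word, group in find_epiphora(example):
    print(word, group)
```

A sketch like this only compares surface forms; comparing lemmas or allowing multi-word endings would need the linguistic preprocessing that the package delegates to its backends.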

About the similarity to the previously published papers the bot linked: my software package detects stylistic devices. Those linked do not.

@dpalmasan published software that analyzes other aspects of text (e.g. discourse markers, emotions...) and could be used together with my software, since each provides information the other does not address. We both use spaCy for preprocessing, which makes interoperability easy. However, my package also provides a cltk backend for Middle High German, which can easily be extended to other classical languages supported by cltk.

@PetrKorab published software that analyzes various aspects of time-series structured text (e.g. news articles, social media posts), but not stylistic devices. However, my software could be integrated into theirs to also support stylistic devices.

@KennethEnevoldsen provides software that augments text by e.g. standardising spelling and grammar. It does not compute stylometric information such as the choice of words or of stylistic devices. It could perhaps be used as a preprocessing step for my software; however, it may destroy the stylistic devices that my software searches for.

@lmullen provides an R package for various tokenization tasks, but no stylistic device detection. It could be used as a preprocessing step, but since it is written in R, interoperability with my Python package is limited.

@jboynyc uses network analysis on collections of text to find out which words are used by different authors and how this connects and groups the authors. While it would be interesting to apply network analysis to how different authors use stylistic devices, and to how that connects and groups them, their software does not use or find stylistic devices and provides a different service than mine.

crvernon commented 6 days ago

@editorialbot query scope

Thank you @fschncvg, I am going to run this through scope review with our larger editorial board as well. I'll get back to you ASAP. Thanks!

editorialbot commented 6 days ago

Submission flagged for editorial review.