[REVIEW]: niacin: A Python package for data enrichment

whedon commented 4 years ago

Submitting author: @deniederhut (Dillon Niederhut) Repository: https://github.com/deniederhut/niacin Version: 0.2.0 Editor: @oliviaguest Reviewer: @BayesforDays, @sara-02 Archive: 10.5281/zenodo.3875715

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/386b739daf6c23ab4cdc9599ab3761aa"><img src="https://joss.theoj.org/papers/386b739daf6c23ab4cdc9599ab3761aa/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/386b739daf6c23ab4cdc9599ab3761aa/status.svg)](https://joss.theoj.org/papers/386b739daf6c23ab4cdc9599ab3761aa)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@BayesforDays & @sara-02, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @oliviaguest know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @BayesforDays

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@deniederhut) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @sara-02

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@deniederhut) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @BayesforDays, @sara-02 it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 4 years ago

Reference check summary:

OK DOIs

- None

MISSING DOIs

- https://doi.org/10.18653/v1/d19-1670 may be missing for title: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- https://doi.org/10.3390/info11020108 may be missing for title: fastai: A Layered API for Deep Learning
- https://doi.org/10.3115/v1/d14-1162 may be missing for title: GloVe: Global Vectors for Word Representation

INVALID DOIs

- None

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

oliviaguest commented 4 years ago

👋 @BayesForDays and @sara-02 — feel free to leave any comments here you have about the manuscript and if you have code-related comments please use the repository's issues (https://github.com/deniederhut/niacin) and link to them from here. The main bulk of your review involves the list above. 😺

👉 If you can both give a rough ETA for when you can complete your reviews that would be very helpful.

sara-02 commented 4 years ago

wave @BayesForDays and @sara-02 — feel free to leave any comments here you have about the manuscript and if you have code-related comments please use the repository's issues (https://github.com/deniederhut/niacin) and link to them from here. The main bulk of your review involves the list above. smiley_cat

point_right If you can both give a rough ETA for when you can complete your reviews that would be very helpful.

I should have my first set of reviews by March 15th.

oliviaguest commented 4 years ago

@sara-02 great!

oliviaguest commented 4 years ago

@whedon remind @sara-02 in 2 weeks

whedon commented 4 years ago

Reminder set for @sara-02 in 2 weeks

sara-02 commented 4 years ago

@deniederhut Will it be possible to include some sort of feature/architecture comparison between niachin, EDA and noisemix? This is w.r.t to the following parameter required regarding the paper State of the field: Do the authors describe how this software compares to other commonly-used packages? In the current paper, we are only mentioning the following here are a small number of libraries that contain text augmentation functions, for example noisemix and EDA (Wei & Zou, 2019), but these tend to be associated with conference papers and have little documentation, or only implement a small subset of common transforms. I think we can expand on this WDYT?

deniederhut commented 4 years ago

Will it be possible to include some sort of feature/architecture comparison

Sure thing 😄

deniederhut commented 4 years ago

Okay, I have added some more detail about the transformations, and included a table with the two libraries mentioned. Let me know what you think!

arfon commented 4 years ago

Dear authors and reviewers

We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.

Thanks in advance for your understanding.

Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team.

sara-02 commented 4 years ago

Okay, I have added some more detail about the transformations, and included a table with the two libraries mentioned. Let me know what you think!

Hey, where can I view the updated paper? I was earlier looking at the link provided here https://github.com/openjournals/joss-reviews/issues/2136#issuecomment-594364581

arfon commented 4 years ago

@sara-02 - you can ask @whedon for a preview at any time like this 👇

arfon commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sara-02 commented 4 years ago

@oliviaguest I think I am done with the first set of reviews, and @deniederhut has already PRs w.r.t changes requested. Once his work gets merged, I can do a final review then. The rest looks good to me, and the project seems to be working as expected.

oliviaguest commented 4 years ago

@sara-02, fantastic and thank you. Stay safe during these troubling times. 🌸

BayesForDays commented 4 years ago

Sorry everyone for the delays in my review. I can wait until PRs are merged and do the review then, or provide my thoughts now?

oliviaguest commented 4 years ago

@BayesForDays same for you — stay safe! Take your time. 🌼

oliviaguest commented 4 years ago

@deniederhut if you can, and no pressure, feel free to give the reviewers some idea of what you plan to do. 🌷

deniederhut commented 4 years ago

Hey! Sorry for holding things up -- things have been a bit hectic. I've merged the PRs with the requested changes, and am presently addressing some tickets that @sara-02 created. Thanks so much for your help Sarah!

@BayesForDays this is ready for review whenever you are. No rush 😄

BayesForDays commented 4 years ago

Hi @deniederhut, I'm checking out the paper right now and I had a couple of questions/comments. One concern is about the wordnet functionality -- and in particular your motivations for using nltk to access wordnet. Do you think it makes sense to specify a specific version of nltk?

On a conceptual level, I think it would be helpful for readers if you could elaborate a bit on the wordnet functions (e.g. add_hypernyms), as well as include a citation about whether these specific types of negative samples are useful for modeling. I personally think the wordform transformation is the most useful one and requires the least explanation, partly because it's also the most analogous to the computer vision work that I'm aware of.

In the paper, it would be helpful to somehow graphically mark the changes that take place when transforming the "original" text into the "augmented" text as well, especially for the hyper/hyponym example.

Finally, I think it makes sense (unless the package is more general) to specify that the language this package assumes is English. If the package can be made to be more general (e.g. different language versions of wordnet) specifying this somehow or allowing that to be passed in as a parameter would be a nice "to-do."

BayesForDays commented 4 years ago

I have a handful of questions about the API, nothing major I think.

Some additional documentation explaining the main modules (e.g. text) would be very helpful. For example, it wasn't clear to me why text has to be imported rather than it being easy to import a function like add_hypernyms directly from niacin. The text import could also be clarified in the documentation and the paper (which seems to assume that the reader has done something like from niacin.text import *).
Perhaps you could add some docstrings that return some information about what functions are contained in the modules as well?

sara-02 commented 4 years ago

Hey @deniederhut I recently came across some more references that you can add to your literature review and introduction if they not already spoken about. https://github.com/makcedward/nlpaug https://github.com/QData/TextAttack Also, an interesting blog giving an overview of the work you aim to achieve: https://amitness.com/2020/05/data-augmentation-for-nlp/

deniederhut commented 4 years ago

Thanks for all the feedback @BayesForDays ! I've been working on adding some new transformations to flesh out the capabilities of the library a bit and those PRs should get merged in soon. I'll turn my attention to the issues you raised next (good catch on the nltk version btw -- it's in requirements.txt but not the setup file 😬). To address this one briefly here:

I think it makes sense (unless the package is more general) to specify that the language this package assumes is English

Totally. Cross-lingual support is hard, because some of these transformations don't really make sense for other languages, or don't have similar resources available. Maybe the best approach here is to have something like from niacin.text import en -- which will have the added benefit of not loading in resources during runtime for a language that's not being used.

And thanks for the references @sara-02 ! That nlpaug library has some interesting stuff in it. I'm not sure I'll be able add similar functionality here in a timely fashion 😅

deniederhut commented 4 years ago

Okay, I think I have addressed the comments here. Let me know what you think!

@sara-02 I noticed that you closed the issue you raised about support for other languages. Do you think it would make sense to leave it open in case someone else has a similar question?

sara-02 commented 4 years ago

@deniederhut This is understandable, I just shared those packages as they seemed good reference point for future work. Oh, I thought we need to close any pending issues from our side for your package publication(not sure about this, but it looks clean that way). I will be more than happy to reopen it once we publish :)

sara-02 commented 4 years ago

@oliviaguest I think I are good with the final reviews from my side. I cloned and ran the tests on the updated package and they passed. @BayesForDays do let us know if you have anything else to add.

oliviaguest commented 4 years ago

@BayesForDays are the last few unticked items above now covered?

@sara-02 thanks — this is great. 😊

BayesForDays commented 4 years ago

Yes, I think everything looks good!

oliviaguest commented 4 years ago

@whedon check references

oliviaguest commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

Reference check summary:

OK DOIs

- None

MISSING DOIs

- https://doi.org/10.18653/v1/d19-1670 may be missing for title: EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
- https://doi.org/10.3390/info11020108 may be missing for title: fastai: A Layered API for Deep Learning
- https://doi.org/10.3115/v1/d14-1162 may be missing for title: GloVe: Global Vectors for Word Representation

INVALID DOIs

- None

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

oliviaguest commented 4 years ago

@deniederhut can you please create an archive for your code (on Zenodo, figshare, or other) and post the archive DOI here as well as check the PDF proof and add the missing DOIs, please? 😊

deniederhut commented 4 years ago

Sure thing! The DOIs should be in there now, and here's the zenodo DOI: http://doi.org/10.5281/zenodo.3875715.

oliviaguest commented 4 years ago

@whedon check references

whedon commented 4 years ago

Reference check summary:

OK DOIs

- 10.18653/v1/D19-1670 is OK
- 10.3390/info11020108 is OK
- 10.3115/v1/D14-1162 is OK

MISSING DOIs

- None

INVALID DOIs

- None

oliviaguest commented 4 years ago

@whedon set 10.5281/zenodo.3875715 as archive

whedon commented 4 years ago

OK. 10.5281/zenodo.3875715 is the archive.

oliviaguest commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

oliviaguest commented 4 years ago

@deniederhut let me know if the final proof looks good and we can seal the deal and publish it. 😊

deniederhut commented 4 years ago

Looks good to me, thanks!

oliviaguest commented 4 years ago

@whedon accept

whedon commented 4 years ago

Attempting dry run of processing paper acceptance...

whedon commented 4 years ago

Reference check summary:

OK DOIs

- 10.18653/v1/D19-1670 is OK
- 10.3390/info11020108 is OK
- 10.3115/v1/D14-1162 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon commented 4 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1468

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1468, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

oliviaguest commented 4 years ago

@openjournals/joss-eics please accept this! 😄

openjournals / joss-reviews