openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
700 stars 36 forks source link

[REVIEW]: Jabberwocky: an ontology-aware toolkit for manipulating text #2168

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @sap218 (Samantha Pendleton) Repository: https://github.com/sap218/jabberwocky Version: v1.0.0.0 Editor: @majensen Reviewer: @wdduncan, @balhoff Archive: 10.5281/zenodo.3922261

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/10502055165be490feaa389d51fe99d6"><img src="https://joss.theoj.org/papers/10502055165be490feaa389d51fe99d6/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/10502055165be490feaa389d51fe99d6/status.svg)](https://joss.theoj.org/papers/10502055165be490feaa389d51fe99d6)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@wdduncan & @balhoff, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @majensen know.

Please try and complete your review in the next two weeks

Review checklist for @wdduncan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @balhoff

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @wdduncan, @balhoff it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

whedon commented 4 years ago
Reference check summary:

OK DOIs

- None

MISSING DOIs

- https://doi.org/10.1093/database/bau033 may be missing for title: tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
- https://doi.org/10.3115/v1/p14-5010 may be missing for title: The Stanford CoreNLP natural language processing toolkit
- https://doi.org/10.1093/nar/gkr972 may be missing for title: Disease Ontology: a backbone for disease semantic integration
- https://doi.org/10.1016/j.ajhg.2008.09.017 may be missing for title: The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease
- https://doi.org/10.1093/bib/bbv011 may be missing for title: The role of ontologies in biological and biomedical research: a functional perspective

INVALID DOIs

- None
majensen commented 4 years ago

Quick thing if you can @sap218 - can you add the URLs that whedon-the-bot found above to your refs?

sap218 commented 4 years ago

Currently I have (as an example for tagtog):

@ARTICLE{Cejuela2014-lv,
  title    = "tagtog: interactive and text-mining-assisted annotation of gene
              mentions in {PLOS} full-text articles",
  author = ..........,
  url      = {https://doi.org/10.1093/database/bau033}
}

and it prints as "Retrieved from: https://doi.org/10.1093/database/bau033"

is "url" the correct bibtex key?

@majensen

arfon commented 4 years ago

@sap218 - please change them to keys like this:

doi  = "10.1093/database/bau033"

i.e. change the url to doi and strip off the https://doi.org/ preamble.

majensen commented 4 years ago

Thanks @arfon

sap218 commented 4 years ago

Thanks @arfon - @majensen I believe I have fixed the DOI issue now

arfon commented 4 years ago

@whedon check references

whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1093/database/bau033 is OK
- 10.3115/v1/p14-5010 is OK
- 10.1093/nar/gkr972 is OK
- 10.1016/j.ajhg.2008.09.017 is OK
- 10.1093/bib/bbv011 is OK

MISSING DOIs

- None

INVALID DOIs

- None
wdduncan commented 4 years ago

I finished the checklist.

Comments:

  1. I could not install the repo as per the instructions. When running the install script, I received error:
    ERROR: TEST FAILED: /Users/wdduncan/.local/lib/python3.7/site-packages/ does NOT support .pth files 
    error: bad install directory or PYTHONPATH

    This could be b/c I am using OS X, or perhaps b/c I installed using venv.

2.. The examples reference a file named /ontology/pocketmonsters.owl. But this file is not in the repo.

The github documentation would benefit from having examples of expected output and descriptions of what the output means.

The paper refers to the jabberwocky-tests as a way to see the software at work. But that directory contains no instructions.

How long is the software supposed paper to be?

majensen commented 4 years ago

@sap218 please have a look at above issues of @wdduncan @wdduncan - paper can be long or short, as long as it covers the requirements in the "Software paper" section of the review thanks!

danielskatz commented 4 years ago

Well, actually, a goal of JOSS is that papers should be reasonably short. In https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain, we say, "the paper should be between 250-1000 words" though some papers are somewhat longer.

majensen commented 4 years ago

Thanks for the clarification @danielskatz!

sap218 commented 4 years ago

@wdduncan @majensen will get on looking at the issues stated as soon as I can! Thanks.

arfon commented 4 years ago

:wave: @balhoff - just a friendly check-in to see how things are going with your review?

balhoff commented 4 years ago

Thanks for the reminder, @arfon! I can work on this this week.

sap218 commented 4 years ago

@wdduncan @majensen @balhoff I have made some fixes/changes, particularly changes to fix @wdduncan comments - paper.md has been updated too for a little more clarification. Let me know if you need any help!

edit: I can also shorten the paper if need be :-)

sap218 commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

majensen commented 4 years ago

@wdduncan @balhoff Can I get an update from you guys here? I'm hoping we're closing in on it. Thanks!

balhoff commented 4 years ago

Hi @majensen I ran into a blocker with https://github.com/sap218/jabberwocky/issues/10. I reported it in the tool repo, but probably should have clarified here that I was waiting on that before being able to proceed further.

wdduncan commented 4 years ago

Hi @majensen I was waiting to see if the issues referenced by @balhoff and myself had been fixed. I can go ahead and retest, though.

sap218 commented 4 years ago

I'll take a look at those issues @balhoff @wdduncan again - (sorry for delay). Will update here when I can! :-)

sap218 commented 4 years ago

Just an update - I fixed the issues on the repo if you wanted to retry :-) (turns out i did indeed have a missing dependency - really sorry about that)

majensen commented 4 years ago

@balhoff @wdduncan - any movement on checking Sam's work? thanks

balhoff commented 4 years ago

Yep it's working for me now; I'll continue the review.

wdduncan commented 4 years ago

@sap218 I tried installing the software, but I am having issues. Steps so far:

  1. Create venv using python3 -m venv .env, and active environment (works fine).
  2. Run pip install -r requirements.txt (works fine).

But, when I run python3 setup.py install --user, I get message/error:

You are attempting to install a package to a directory that is not
on PYTHONPATH and which Python does not read ".pth" files from.  The
installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

    /Users/wdduncan/.local/lib/python3.7/site-packages/

and your PYTHONPATH environment variable currently contains:

    ''

I fixed this by removing --user. Perhaps it is related to my using a virtual environment? if so, it would be good to document this.

I ran the jabberwocky-tests.

For the catch test, I am unsure what the output in catch_output.txt means. Did catch find the lines in the blog that contained terms from the ontology?
Some documentation on what the output means would be helpful.

The arise test ran successfully. But, like the catch test, documentation would be helpful for describing the output.

The arise test:

arise -ontology ../ontology/pocketmonsters.owl -tfidf new_synonyms_tfidf.csv $ arise -o ../ontology/pocketmonsters.owl -f new_synonyms_tfidf.csv

failed with error:

Usage: arise [OPTIONS]
Try 'arise --help' for help.

Error: no such option: -t

Hope this helps!

cc @majensen

sap218 commented 4 years ago

@wdduncan Thank you so much! I will improve the documentation for users to understand, I see the problem with the arise failure (I made a minor typo in the documentation, I will fix that also). Thank you again!

majensen commented 4 years ago

Just a qn @wdduncan - is it weird to use --user when installing into a virtual environment? I wouldn't think to do it, since the environment is isolated from everything to begin with. (I only have toddler python chops, however.)

wdduncan commented 4 years ago

@majensen I'm not sure, I usually don't use the --user option with pip, unless I run into permission issues. Since, I was testing the software, I didn't want to affect the library I have. My guess is that in the venv, there isn't a "user" specified. So, --user doesn't make sense. But, I may be wrong about this ... I'm not a virtual environment guru :).

In any case, it is probably best document the behavior.

majensen commented 4 years ago

Ah! Well I believe the virtual environment is truly isolated from both your system install and your user packages. Everything is stored in the venv, including the python binary, pip and everything else. But - better safe than sorry. With python 3.7.6 (in an active venv) I get a more straightforward message: jensenma$ source venv/bin/activate (venv) jensenma$ pip install --user requests ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.

So I think it's a python thing, not a jabberwocky thing.

wdduncan commented 4 years ago

Yes, you right @majensen!
I didn't mean to imply it was a jabberwocky thing. Only that that the installation directions might need to be updated with info about using venv in case someone tries to install in a virtual environment, gets an error and quits w/o further investigating the error.

sap218 commented 4 years ago

@wdduncan I fixed that arise issue so you should be able to run that command now :-) I will be improving the documentation today

sap218 commented 4 years ago

@wdduncan i have also improved the documentation to include explanations of the outputs, see the SCENARIO.md

majensen commented 4 years ago

@balhoff @wdduncan - how is it coming? @sap218 has put in the work so maybe we can get across the finish line.

wdduncan commented 4 years ago

Sorry ... did get a chance to review the changes this weekend. I will circle back around soon.

balhoff commented 4 years ago

Same here!

wdduncan commented 4 years ago

I did another run through the documentation on github. It is looking much better! I really like the figure at the bottom :)

However, I still have issues/comments:

I looked at the documentation for rdfs:schema But I did not find the annotation properties: rdfs:hasBroadSynonym, rdfs:hasNarrowSynonym, etc.

I also looked at the base IRI for rdf-schema:hasBroadSynonym (i.e., https://www.w3.org/2000/01/rdf-schema) and still not find the aforementioned annotation properties.

Have you considered using SKOS which already has terms for broad match, close match, etc.

sap218 commented 4 years ago

@wdduncan I looked over your comments:

I hope this helps!

wdduncan commented 4 years ago

@sap218

"we are running these command in the jabberwocky-test/process directory".

Sorry. I missed this :)

"-p, --parameter TEXT parameter for the JSON file text."

Yes, I saw that. Sorry, I wasn't clear. What does the post mean? Is it different that post_text? How is it different than -p user-text? E.g.: bite --textfile social_media_posts.json --parameter user-text. Is there a controlled set of terms you use with -p flag?

sap218 commented 4 years ago

@wdduncan

-p refers to the field inside the JSON file for the text data, "-p post_text" and "-p user-text" are just two different examples - sorry about this confusion, i will change the README so all examples only refer to one example.

edit: just fixed this! all example codes now relate back to the scenario.md so users understand - plus I have explained -p better, "with the .json input you need specify the field inside the JSON that contains the textual data to process."

sap218 commented 4 years ago

@wdduncan I have just fixed the annotation issue regarding the synonyms, I fixed the documentation to explain this:

make sure annotations are defined with the oboInOWL: schema, e.g. hasExactSynonym should have the IRI http://www.geneontology.org/formats/oboInOWL#hasExactSynonym.

wdduncan commented 4 years ago

@sap218 That's helpful. I think letting the users define custom relations is fine too. You just have to be clear about what is going on.

Here's a mapping project you might be interested: https://github.com/OBOFoundry/SSSOM

sap218 commented 4 years ago

@wdduncan Great! Thank you! Glad I was able to address your concerns! :-)

majensen commented 4 years ago

@wdduncan i have also improved the documentation to include explanations of the outputs, see the SCENARIO.md

(This is super cool.)

wdduncan commented 4 years ago

The documentation is looking good!

majensen commented 4 years ago

As I indicated to @wdduncan on another channel - just wanted to let all know that, as productive as it may be, the review process can be completed: When the reviewers finish their work, please check all the boxes in the review sheets and give us a heads-up with a comment here in the thread.

majensen commented 4 years ago

@balhoff - I see only one box unchecked in your review, Functionality. Are you close to being satisfied with that aspect?

@wdduncan - You have installion, functionality, and several documentation issue remaining. How are these coming?

This review began on March 19, and while there has been a lot of extra time provided due to covid, I think Sam has done a lot of good work improving the code and the MS. Time to draw a line under it. Thanks --@majensen

wdduncan commented 4 years ago

@majensen I checked boxes for installation and functionality. There aren't automated tests, but there is a tutorial. Are automated tests required?

@sap218 A number of changes have been made to the github pages. Does the manuscript reflect these changes? I haven't had a chance to re-read it yet.