openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
694 stars 36 forks source link

[REVIEW]: PDF Entity Annotation Tool (PEAT) #5336

Open editorialbot opened 1 year ago

editorialbot commented 1 year ago

Submitting author: @chrstahl (Christopher Stahl)
Repository: https://github.com/USEPA/peat
Branch with paper.md (empty if default branch): paper
Version: v1.0.1
Editor: @atrisovic
Reviewers: @flor14, @RuneBlaze
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/e9f377be673455715a5986a5c4fb7204"><img src="https://joss.theoj.org/papers/e9f377be673455715a5986a5c4fb7204/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e9f377be673455715a5986a5c4fb7204/status.svg)](https://joss.theoj.org/papers/e9f377be673455715a5986a5c4fb7204)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@bioedd & @puruckertom, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @atrisovic know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks, at the very latest.

Checklists

📝 Checklist for @RuneBlaze

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.25 s (331.8 files/s, 264565.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSS                             30           8087            173          34723
JavaScript                      37           3936           1820          14807
JSON                             5              0              0            496
XML                              1              0              0            494
Markdown                         3            150              0            270
TeX                              1             12              0             99
Jupyter Notebook                 3              0            207             66
HTML                             1              3             20             20
SVG                              1              0              0              7
-------------------------------------------------------------------------------
SUM:                            82          12188           2220          50982
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 1560

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/nar/gkaa333 is OK
- 10.1117/12.2005608 is OK
- 10.1016/j.envint.2021.107025 is OK

MISSING DOIs

- None

INVALID DOIs

- None
atrisovic commented 1 year ago

👋🏼 @chrstahl, @bioedd, @puruckertom: this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread.

These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#5336 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@atrisovic) if you have any questions/concerns.

bioedd commented 1 year ago

Review checklist for @bioedd

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

atrisovic commented 9 months ago

Hi @bioedd & @puruckertom, I hope you had a nice summer. I'm writing to see how the review is going and whether you need help with anything. I suggest running @editorialbot generate my checklist directly in the issue on GitHub, as the formatting for the markdown checklist is not supported from email. Thank you!

arfon commented 9 months ago

:wave: @atrisovic – I think the authors might need a follow-up over email, as things look to have gone rather stale here.

atrisovic commented 8 months ago

Hi @arfon I agree, things are going pretty slow.

atrisovic commented 7 months ago

Hi All, I've sent out several reminders and emails, but there hasn't been any progress. I'll now reach out to potential new reviewers to move things forward.

atrisovic commented 7 months ago

Hi @flor14 and @gasparl,
 I am writing to invite you to review a paper titled "PDF Entity Annotation Tool (PEAT)", which is currently in the review process at the Journal of Open Source Software (JOSS). Your insights and feedback on this work would be valuable and much appreciated. To help with the review, we have a reviewers' checklist with useful guidelines. 
Thank you! 
Ana Trisovic, JOSS Topic Editor

gasparl commented 7 months ago

Hi, thanks for the invitation! I'd accept it (though it's not really my topic), but I only have Linux with me, and the current PEAT doesn't seem to have prebuilt Linux binaries. If you can prepare those, let me know; otherwise I'll pass, sorry.

flor14 commented 7 months ago

Hello! I can be a reviewer, but I am currently an ARM Mac user. Even if it is possible to run the software on my computer, it would take me extra time to make it work. I wonder whether I would be a more useful reviewer for another project if this one is Windows-only.

chrstahl commented 7 months ago

It works on Linux and Mac (ARM or non-ARM); you just have to build it with node/yarn. I can't put the Mac executable on GitHub without paying Apple to sign the application, so sharing a prebuilt binary won't work.

flor14 commented 7 months ago

Thank you for your clarification @chrstahl. I will have a look before Dec 1st then.

atrisovic commented 7 months ago

Fantastic, thank you @flor14

atrisovic commented 7 months ago

@editorialbot add @flor14 as reviewer

editorialbot commented 7 months ago

@flor14 added to the reviewers list!

flor14 commented 7 months ago

@chrstahl, I have been checking your software, PEAT. Thank you for your submission. I have written notes below some of the checklist points; I hope you find my review useful.

Review checklist - @flor14

Conflict of interest

Code of Conduct

General checks

Comment: Yes, there is an MIT License, but I wonder if there is a reason why the year and author are not included in it. The first line is missing: https://choosealicense.com/licenses/mit/

- [ ] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item. (This item doesn't apply.)
- [ ] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item. (This item doesn't apply.)
- [ ] Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item. (This item doesn't apply.)

Functionality

Notes: I had to make some changes to install and run the software. 1 - When I tried to run it, I received some messages about breaking changes. As the app was not working, I ran npm audit fix. These packages were updated:

    "foreman": "^3.0.1",
    "jest-puppeteer": "^9.0.1",
    "electron": "^27.1.0",

There have been multiple updates to Electron since 2021. The version you used (11.0.1) was the first one that supported ARM Mac computers, so that was not an issue, as you said before.

  2. I had to add contextIsolation: false in public/electron.js to be able to run it. This could be linked to updates in the version of Electron, but I would check whether this is the appropriate fix before leaving the app publicly available. This was the error: https://stackoverflow.com/questions/56091343/typeerror-window-require-is-not-a-function
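For reference, the workaround amounts to something like this (a sketch only; the option names are Electron's, but the surrounding code is illustrative rather than copied from PEAT):

```javascript
// Sketch of the workaround in public/electron.js. Electron 12+ defaults
// contextIsolation to true, which breaks renderer code that calls
// window.require; flipping it back restores the pre-12 behavior.
const webPreferences = {
  nodeIntegration: true,   // expose Node APIs (window.require) to the renderer
  contextIsolation: false, // restore the pre-Electron-12 behavior
};

// In the real main process these options would be passed to the window, e.g.:
// new BrowserWindow({ width: 900, height: 680, webPreferences });
module.exports = { webPreferences };
```

Note that disabling context isolation trades away a security boundary, which is why it is worth double-checking before shipping the app this way.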

After 1 and 2, the app worked on my Mac M2 computer, although I later hit the following error:

ERROR
Cannot read properties of undefined (reading 'canvas')
TypeError: Cannot read properties of undefined (reading 'canvas')
    at findAnnotate (http://localhost:3000/static/js/bundle.js:114:23)
    at onClick (http://localhost:3000/static/js/bundle.js:1209:25)
    at HTMLUnknownElement.callCallback (http://localhost:3000/static/js/bundle.js:120521:18)
    at Object.invokeGuardedCallbackDev (http://localhost:3000/static/js/bundle.js:120565:20)
    at invokeGuardedCallback (http://localhost:3000/static/js/bundle.js:120613:35)
    at invokeGuardedCallbackAndCatchFirstError (http://localhost:3000/static/js/bundle.js:120627:29)
    at executeDispatch (http://localhost:3000/static/js/bundle.js:120703:7)
    at executeDispatchesInOrder (http://localhost:3000/static/js/bundle.js:120725:9)
    at executeDispatchesAndRelease (http://localhost:3000/static/js/bundle.js:123257:9)
    at executeDispatchesAndReleaseTopLevel (http://localhost:3000/static/js/bundle.js:123264:14)

Documentation

Note: The repo states that the app can be used on Linux and Mac, but it only exemplifies its use on Windows. If it is not possible to provide PEAT executables for Linux and Mac, it would be helpful to give users more guidance about what to do on Windows versus on Mac/Linux. The README.md currently lists the commands without further context.

My way of thinking about the level of detail of the documentation is to keep in mind who your users are and the level of knowledge you expect them to have. To use an annotation tool such as PEAT, should they know node and yarn? If the answer is no, it would be beneficial to be a bit more verbose about what they should do to make PEAT run. In general, describing the steps can help, for example: 1 - clone this repository, 2 - download Node and Yarn (links), 3 - ...

Also, a comment about the repository: there is no title in the README.md. I would add a title and place the instructions for installing the app under a heading such as "Installation instructions". I think that would help with the clarity of the documentation.

Software paper

chrstahl commented 7 months ago

@flor14 Thank you for the feedback! I should be able to address these pretty quickly. It looks like the bug you encountered also happens on my machine, so it wasn't a breaking change from updating all the libraries. I'll update the packages and shore up the installation documentation. As for the license, I had it auto-added; I'm not sure why it missed the date/name, but I can fix that.

I'll also see if I can attach a prebuilt Linux version like there is for Windows. Unfortunately, for Mac, if I attach the unsigned prebuilt binary on GitHub it won't work on other people's machines.

Will try to get through this before Christmas and I'll ping you when there are updates.

atrisovic commented 5 months ago

Hello @chrstahl and @flor14, I hope all is well. It seems we are getting close regarding the first review. Please share your updates and let us know if you need any additional support. Thanks for your efforts!

atrisovic commented 4 months ago

@editorialbot commands

editorialbot commented 4 months ago

Hello @atrisovic, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Add to this issue's reviewers list
@editorialbot add @username as reviewer

# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers

# Get a list of all editors's GitHub handles
@editorialbot list editors

# Assign a user as the editor of this submission
@editorialbot assign @username as editor

# Remove the editor assigned to this submission
@editorialbot remove editor

# Remind an author, a reviewer or the editor to return to a review after a 
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks

# Check the references of the paper for missing DOIs
@editorialbot check references

# Perform checks on the repository
@editorialbot check repository

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for version
@editorialbot set v1.0.0 as version

# Set a value for branch
@editorialbot set joss-paper as branch

# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository

# Set a value for the archive DOI
@editorialbot set 10.5281/zenodo.6861996 as archive

# Mention the EiCs for the correct track
@editorialbot ping track-eic

# Generates the pdf paper
@editorialbot generate pdf

# Recommends the submission for acceptance
@editorialbot recommend-accept

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Flag submission with questionable scope
@editorialbot query scope

# Get a link to the complete list of reviewers
@editorialbot list reviewers

# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist

# Open the review issue
@editorialbot start review
atrisovic commented 4 months ago

@editorialbot remove @bioedd from reviewers

editorialbot commented 4 months ago

@bioedd removed from the reviewers list!

atrisovic commented 4 months ago

@editorialbot remove @puruckertom from reviewers

editorialbot commented 4 months ago

@puruckertom removed from the reviewers list!

atrisovic commented 4 months ago

By the looks of it, we have only one active reviewer here. I will look for a second reviewer. @chrstahl, it would be helpful to have an update on your last comment so we can finalize the first review.

chrstahl commented 4 months ago

@atrisovic Understood. I got delayed more by the holidays and the recent snowstorm than I had intended. I'm still working through the list but plan on pushing changes sometime next week.

atrisovic commented 4 months ago

Hi @RuneBlaze, I am writing to kindly invite you to review a paper titled "PDF Entity Annotation Tool (PEAT)", which is currently in the review process at the Journal of Open Source Software (JOSS). Your insights and feedback on this work would be valuable and much appreciated. To help with the review, we have a reviewers' checklist with useful guidelines. Thank you! Ana Trisovic, JOSS Topic Editor

chrstahl commented 4 months ago

@atrisovic Since you just posted: I am finishing up the first round of review fixes at this moment. I will ping when they accept the updates in a merge request.

RuneBlaze commented 4 months ago

Hi @atrisovic. I will happily do the review over the next two weeks (I aim to finish before Feb 26). If that sounds good, I can start with my part of the review process.

atrisovic commented 4 months ago

@RuneBlaze sounds wonderful!

atrisovic commented 4 months ago

@editorialbot add @RuneBlaze as reviewer

editorialbot commented 4 months ago

@RuneBlaze added to the reviewers list!

atrisovic commented 4 months ago

@chrstahl sounds great! In any case, we will need two reviewers, which is why I looped in @RuneBlaze.

RuneBlaze commented 4 months ago

Review checklist for @RuneBlaze

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

RuneBlaze commented 4 months ago

@chrstahl I just completed my first round of review. First of all, writing software that interacts with PDF files can be a pain at times, so congratulations on hitting this technical mark. I created two issues in the repo that I think will be great for inviting usage and contributions as open-source software:

  1. https://github.com/USEPA/peat/issues/3
  2. https://github.com/USEPA/peat/issues/2

One minor writing suggestion:

While text mining approaches – including Deep Learning (DL), Artificial Intelligence (AI), and Machine Learning (ML) - continue to expand at a rapid pace, the tools used by researchers to create the labeled datasets required for training, modeling, and evaluation remain rudimentary.

DL is usually a specific kind of ML, so rephrasing will be appreciated.

Other issues:

  1. https://github.com/USEPA/peat/issues/4 (yarn start did not work on my setup: pointers would be appreciated)
  2. Adding instructions for running the tests would be appreciated (I think yarn test might be it, but I was not sure)
  3. Also, it might be helpful to add a description (or even a formal spec) of the exported JSON -- third-party tools can benefit from a well-defined schema.
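For point 3, here is a hypothetical illustration of what a documented export record plus a third-party validity check could look like (every field name below is an assumption for illustration, not PEAT's actual format):

```javascript
// Hypothetical annotation record: these field names are illustrative
// assumptions, not PEAT's actual export schema.
const exampleAnnotation = {
  page: 3,           // 1-based PDF page number
  label: "chemical", // entity type assigned by the annotator
  text: "benzene",   // the annotated span as extracted text
  rect: { x: 72, y: 540, width: 64, height: 12 }, // position on the page
};

// With a documented schema, a third-party consumer could validate records:
function isValidAnnotation(a) {
  return Number.isInteger(a.page) && a.page >= 1 &&
    typeof a.label === "string" &&
    typeof a.text === "string" &&
    ["x", "y", "width", "height"].every((k) => typeof a.rect?.[k] === "number");
}
```

Even an informal description at this level of detail would let downstream ML tooling consume the exports without reverse-engineering them.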
chrstahl commented 4 months ago

@RuneBlaze I'm still looking through your list, but I solved the biggest issue that stuck out to me: yarn start failing on Windows.

I have a pull request open, but currently I can't personally push to the EPA's GitHub, so I have to wait for approvals. To work through these issues more quickly, it might be worthwhile to just look at my personal fork at https://github.com/chrstahl/peat. I have already made some changes to the Readme in my fork and will continue going through your comments. I just pushed a fix for the Windows issue and tested it on my Win 11 machine.

RuneBlaze commented 4 months ago

@chrstahl Thanks for the quick response! Let me test the personal fork tonight (US Central time) and update the review :) . Obviously it would be great to have things merged into the version submitted for review, but I also want to do my part in keeping the review process moving.

atrisovic commented 3 months ago

Hi @flor14, @RuneBlaze, @chrstahl thank you for all the activity in the GitHub repository! Let me know how things are going and if I can help with anything :-)

chrstahl commented 3 months ago

@RuneBlaze I changed around the Readme; it's not exactly a TOC, but I think it fixes the problem of not being able to easily see the different sections. I also cleaned up all the comments/prettified everything based on your feedback. It looks like they approved the pull request but didn't actually merge it, so I'm reaching back out.

For testing, there is currently no automated testing. The user guide explains step by step how to test the functionality, though.

I'm going to add a definition for the output JSON tomorrow. I agree with your comment that showing some publications using the tool would be useful. Unfortunately, I am not able to publish from the current space I have been using it in, so I don't currently have any. The general concept of what I have been doing is discussed in the paper, though: labeling document sections to create training/test sets for ML tools that automatically bin documents by subject.

RuneBlaze commented 3 months ago

Hey @chrstahl. Thanks for the reply! Let me check back tonight, update my checklist, and look over the issues -- I appreciate the effort.

My suggestions might be specific (and heavily informed by my background, where we try to publish all the work we do and use automated testing), but they are intended to be generic and to hopefully improve the work, and I do understand that research software works differently across institutions/disciplines.

RuneBlaze commented 3 months ago

@chrstahl Sorry for the delay (I said I would check two days ago... well, at least I got back to it over the weekend). Thanks for the README update -- I updated my checklist to reflect your personal fork. I am still leaving the issues in USEPA/peat open because they reflect the status of the submitted repo -- I will happily close them after the fixes get merged.

For me at least, the only two blockers are:

  1. Merging everything into the submitted repo
  2. The only writing suggestion I had for the paper (copied from my comment above):

One minor writing suggestion:

"While text mining approaches – including Deep Learning (DL), Artificial Intelligence (AI), and Machine Learning (ML) - continue to expand at a rapid pace, the tools used by researchers to create the labeled datasets required for training, modeling, and evaluation remain rudimentary."

DL is usually a specific kind of ML, so rephrasing will be appreciated.

cc: @atrisovic

chrstahl commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

chrstahl commented 3 months ago

@RuneBlaze No problem. Everything has been merged, I reworded that section, and the MD update is in. I generated a new PDF above. I think that covers all of your blockers.

@flor14 All of your code fixes and suggestions should now also be fully incorporated in the repo linked to the project.

RuneBlaze commented 3 months ago

Thanks @chrstahl! To me everything looks great now, and the checklist has been fully ticked.

Congrats on the submission again, and thanks for listening to my suggestions.

chrstahl commented 3 months ago

@RuneBlaze Thank you very much! Appreciate it.

@atrisovic Let me know if you need anything else; it looks like we are pretty far along now.