openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: SHED: Streaming Heterogeneous Event Data Tracking with Provenance #3119

Closed whedon closed 2 years ago

whedon commented 3 years ago

Submitting author: !--author-handle-->@CJ-Wright<!--end-author-handle-- (Christopher Wright) Repository: https://github.com/xpdAcq/SHED Branch with paper.md (empty if default branch): joss-paper Version: 0.7.5 Editor: !--editor-->@arfon<!--end-editor-- Reviewers: @remram44, @Midnighter Archive: Pending

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/89f9042e51bad3e80b14e88d0fdadc73"><img src="https://joss.theoj.org/papers/89f9042e51bad3e80b14e88d0fdadc73/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/89f9042e51bad3e80b14e88d0fdadc73/status.svg)](https://joss.theoj.org/papers/89f9042e51bad3e80b14e88d0fdadc73)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@remram44, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @remram44

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @remram44 it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.18 s (219.0 files/s, 57135.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          25            771            827           3276
reStructuredText                 4            212             28            330
DOS Batch                        1             34              2            239
make                             1             32              5            204
YAML                             4             11              7            154
Jupyter Notebook                 2              0           3928             79
Markdown                         1              4              0             30
INI                              1              0              0              2
-------------------------------------------------------------------------------
SUM:                            39           1064           4797           4314
-------------------------------------------------------------------------------

Statistical information for the repository 'b7d42d849242488e7a4a546b' was
gathered on 2021/03/17.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Christopher J. Wrigh           100          2633           1171            6.18
Julien L                         6            63             12            0.12
Julien Lhermitte                13            94             35            0.21
XPD Operator                     2            27              6            0.05
christopher                    388         30281          26985           93.06
st3107                          11           107            122            0.37

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Julien Lhermitte              4            4.3         35.8               25.00
christopher                4775           15.8         21.9                8.67
st3107                       95           88.8          1.0                0.00
whedon commented 3 years ago

PDF failed to compile for issue #3119 with the following error:

Can't find any papers to compile :-(

arfon commented 3 years ago

@remram44 – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above.

Both reviewers have checklists at the top of this thread (in that first comment) with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/3119 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

arfon commented 3 years ago

@whedon generate pdf from branch joss-paper

whedon commented 3 years ago
Attempting PDF compilation from custom branch joss-paper. Reticulating splines etc...
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

remram44 commented 3 years ago

(@arfon: I see the pre-review issue has been closed, but we are missing another reviewer; is that protocol?)

remram44 commented 3 years ago

I have taken a first look at the repositories. Some notes:

I will look into the code next.

remram44 commented 3 years ago

I also note that the main contributors of the code are not watching the GitHub repository, which will probably cause problems in the case of external contributions or reports.

whedon commented 3 years ago

:wave: @remram44, please update us on how your review is going (this is an automated reminder).

remram44 commented 3 years ago

I did :thinking:

sbillinge commented 3 years ago

Thanks for your hard work on this @remram44 . You are right, that somehow I wasn't watching the repo and I didn't get a notification so I missed all your hard work. I am now watching. Sorry, this is the first time I did this process at JOSS. We will work on the suggestions. @CJ-Wright Looping you in. I will do what I can. There are some things we may want to chat about.

arfon commented 3 years ago

:wave: @alesarrett – you be willing to review this submission for JOSS? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

We already have one reviewer (@remram44) and are looking to find a second.

The package under review is SHED: Streaming Heterogeneous Event Data Tracking with Provenance, a Python tool for streaming data from event models, while tracking data provenance and workflow execution.

sbillinge commented 3 years ago

Dear Arfon,

I would be willing to help with reviewing, but this is my paper, so it is not so appropriate for me to review it!

S

On Sun, Jun 27, 2021 at 4:59 AM Arfon Smith @.***> wrote:

👋 @alesarrett https://github.com/alesarrett – you be willing to review this submission for JOSS? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

We already have one reviewer @.*** https://github.com/remram44) and are looking to find a second.

The package under review is SHED: Streaming Heterogeneous Event Data Tracking with Provenance https://github.com/xpdAcq/SHED, a Python tool for streaming data from event models, while tracking data provenance and workflow execution.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/3119#issuecomment-869127342, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAOWUPAYVWEAZZVSCHQVR3TU3SALANCNFSM4ZKDLD5A .

-- Simon Billinge Professor, Columbia University Physicist, Brookhaven National Laboratory

arfon commented 3 years ago

I would be willing to help with reviewing, but this is my paper, so it is not so appropriate for me to review it!

Of course! Unless I'm mistaken somehow, I was actually inviting @alesarrett to review here.

arfon commented 3 years ago

I will look into the code next.

@remram44 – just checking in to see how you're getting on completing your review?

remram44 commented 3 years ago

Thanks @arfon, I must say I haven't been prioritizing this since we are missing the second reviewer. I was also hoping some of my points in https://github.com/openjournals/joss-reviews/issues/3119#issuecomment-809781527 would be addressed, but that doesn't block me from running the software so I will do that ASAP.

danielskatz commented 3 years ago

👋 @arfon - Can you provide an update on this submission and review?

arfon commented 3 years ago

@CJ-Wright @sbillinge – I'm struggling to find a second reviewer for this submission. Could you take a look at this spreadsheet and identify some possible candidates?

Also, if there are people in your field who you think would be suitable, please mention them here too.

alesarrett commented 3 years ago

wave @alesarrett – you be willing to review this submission for JOSS? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

We already have one reviewer (@remram44) and are looking to find a second.

The package under review is SHED: Streaming Heterogeneous Event Data Tracking with Provenance, a Python tool for streaming data from event models, while tracking data provenance and workflow execution.

Sorry to be so late in replying. Anyway, I'm not able to hep with this review, sorry.

arfon commented 2 years ago

Documenting some additional activity on my part here: I've just emailed @CJ-Wright & @sbillinge to ask them:

  1. To respond to @remram44's initial review feedback above
  2. To assist with finding a second reviewer.
arfon commented 2 years ago

@CJ-Wright & @sbillinge – it's close to 8 months since we've heard back from you about this submission. At this point I'm assuming you're no longer interested in pursuing this submission in JOSS.

If we don't hear back from you in the next week, I'll reject this submission due to lack of activity.

sbillinge commented 2 years ago

Sorry about this Arfon. My student CJ left the group with things in this state. I will check with him how we wants to proceed but he will have to commit to spending some time on it for it to work out.

S

On Fri, Feb 18, 2022 at 3:34 PM Arfon Smith @.***> wrote:

@CJ-Wright https://github.com/CJ-Wright & @sbillinge https://github.com/sbillinge – it's close to 8 months since we've heard back from you about this submission. At this point I'm assuming you're no longer interested in pursuing this submission in JOSS.

If we don't hear back from you in the next week, I'll reject this submission due to lack of activity.

— Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/3119#issuecomment-1045144122, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAOWUKACEBL4DYIA7PHFP3U32UNDANCNFSM4ZKDLD5A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

-- Simon Billinge Professor, Columbia University Physicist, Brookhaven National Laboratory

sbillinge commented 2 years ago

Dear Arfon,

OK, so my current student Songsheng @st3107 who is also a contributor and maintainer of the code, and a coauthor on the paper, has agreed to take over from CJ to shepherd this home. We took a look at where everything is and from our perspective the things requested in @remram44's review so far are excellent suggestions and all quite do-able.

For a second reviewer, if it is not a confilct of interest, I think a person from the NSLS-II facility at Brookhaven National Lab where this code is deployed and in use at the XPD and PDF beamlines would be best. The best person there would be Dan Olds (dolds@bnl.gov) as he is in charge of the software at PDF and very familiar with the code (though of course we work with him which may be a conflict issue). Other possibilities would be people who are familiar with the bluesky/event-model (https://github.com/bluesky/event-model) that SHED is built on, or bluesky in general (https://github.com/bluesky/) but not affiliated with BNL.

sbillinge commented 2 years ago

Dear @arfon and @remram44 . Thanks for your review and your patience. We have now responded to all the issues that were raised in @remram44 's initial review. I will post our responses below.

The second issue @arfon raised was to help finding a second referee. Do you want suggestions of names or do you want me to reach out directly to people? Do you have input on the conflict of issue aspects I raised above? The best balance would presumably be someone who is familiar with the event document-model but is not at Brookhaven National Lab. I am now collecting good names from the BNL developers and I can paste them below.

sbillinge commented 2 years ago

Here are responses to the @remram44 review points from above:

  • [x] General: shed is taken on PyPI, by https://github.com/Zac-HD/shed. It might be beneficial to use a unique name, such as shed-data, shed-streaming, etc to avoid conflicting in people's environments. Should the other shed decide to publish on conda-forge, things will get weird (I was surprised this was not discussed there

Shed conda-forge/staged-recipes#4804).

_We changed the project name to shed-streaming and the module name to shed_streaming. The new package is released in the conda-forge channel._

  • [x] Installation instructions: Installing from the repo is not documented, only using the package from conda-forge. I think this should be added if people are to contribute or fork the project. I was able to install from the requirements/ files, but noted that pip.txt is empty, which is probably not intended?

We deleted the pip.txt since it is not used in the installation. All dependencies are installed using conda.

We add a bash script for users to install the package using the source repo and a section about how to use it in the installaion instructions.

  • [x] Automated tests: the tests don't run for me, possibly because pytest-tornado is the wrong version. I note that you use an additional Conda channel on Travis, nsls2forge, which seems to be your own.

The issue is probably related to the dask version. We updated the tests so that the tests used asyncio together with the latest dask. Now, tests could be run without errors. Clone the repo and run the following commands to run the tests locally.

bash install.sh
conda activate <name of the environment>
python run_tests.py

The nsls2forge is a channel that used to be maintained by the NSLS-II DSSI group. Now, the shed (renamed as shed-streaming) with all its dependencies are released on the conda-forge channel.

We included the link to the travis build. Our group runs out of the credits and thus the CI cannot be run now. We have applied for more credits

  • [x] Functionality documentation: The documentation page https://xpdacq.github.io/SHED/shed.html shows up as empty except for headers. It looks like documentation exists and there is just something wrong with autodoc.

We fixed the sphinx autodoc and published the new webpage.

  • [x] Documentation: The rest of the documentation is pretty sparse. There is little explanation of the purpose of the software; maybe some text can be copied from the paper..

We copied the content in the paper into the introduction section of the documentation and it showed on the webpage.

  • [x] Tutorials: They build on NSLS-II data standards and the bluesky framework, which might not be widely known/used outside of the Brookhaven National Laboratory. It also uses rapidz, which is a fork of streamz from the same group. The paper mentions streamz but rapidz is actually what's used throughout the package, except for one example file (average.py).

We changed the streamz to rapidz in the examples.

Grammar fixes have been merged.

  • [x] References / State of the field: Maybe more references could be added to modern streaming dataflow system/models such as Timely dataflows which are popular recently? In addition, I think more details on what SHED adds compared to dask and streamz (that it builds upon) or ray might be interesting?

We added a reference to Naiad: A Timely Dataflow System.

What functionalities that SHED adds to the rapidz (forked from streamz) and dask are:

sbillinge commented 2 years ago

Here are names of potential second reviewers:

arfon commented 2 years ago

:wave: @dylanmcreynolds @untzag @HarinarayanKrishnan – you be willing to review this submission for JOSS? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

We already have one reviewer (@remram44) and are looking to find a second.

The package under review is SHED: Streaming Heterogeneous Event Data Tracking with Provenance, a Python tool for streaming data from event models, while tracking data provenance and workflow execution.

Midnighter commented 2 years ago

Randomly saw the call on Twitter. I'm not familiar with Bluesky but I'm working on a similar project myself so I'm happy to review this. My project is for internal use only, no intention to publish or sell so I don't think there's a conflict of interest. I will be very busy in the next two weeks but after that I can start.

arfon commented 2 years ago

:zap: thanks @Midnighter!

arfon commented 2 years ago

@editorialbot add @Midnighter as reviewer

editorialbot commented 2 years ago

@Midnighter added to the reviewers list!

arfon commented 2 years ago

@Midnighter – many thanks for agreeing to review this submission! The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let me know. Once you're ready to start your review, please type the following into a new comment in this thread which will generate a reviewer checklist for you to work through:

@editorialbot generate my checklist
arfon commented 2 years ago

:wave: @Midnighter – just checking in here. I think you said you wouldn't be able to start your review for a couple of weeks?

Midnighter commented 2 years ago

Yeah, I will start in the coming week.

arfon commented 2 years ago

:wave: @Midnighter – just checking in to see how you're getting long here?

arfon commented 2 years ago

And update for this thread, I've heard over email from @Midnighter that they plan to compete their review in the next week.

Midnighter commented 2 years ago

Review checklist for @Midnighter

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Midnighter commented 2 years ago

@whedon generate pdf

editorialbot commented 2 years ago

My name is now @editorialbot

Midnighter commented 2 years ago

@whedon generate pdf from branch joss-paper

editorialbot commented 2 years ago

My name is now @editorialbot

Midnighter commented 2 years ago

@editorialbot generate pdf from branch joss-paper

editorialbot commented 2 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@editorialbot commands

Midnighter commented 2 years ago

@editorialbot set joss-paper as branch

editorialbot commented 2 years ago

Done! branch is now joss-paper

Midnighter commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Midnighter commented 2 years ago

Contribution and authorship: I asked for quick feedback on a potential fourth author.

Example usage: While there are tutorials that help showcase the software's purpose, the examples seem rather contrived. The documentation could be improved by a real-world case, even though I won't be able to repeat that, since I can't deploy all the required hardware and software systems.

Automated tests: Tests do exist but I was currently unable to run them. I also was unable to find a build on Travis CI that would show me typical test output. So I'm actually in doubt if the package is continuously tested.

Community guidelines: Seem to be missing entirely, as also shown on the community standards page.

State of the field: In my opinion, the authors could provide a better overview here.

References: Although I'm not an expert on streaming data, the very minimalist representation of the state of the field makes me think that probably other important work needs to be cited.