[REVIEW]: ReinforcementLearning: A Package to Perform Model-Free Reinforcement Learning in R

whedon commented 6 years ago

Submitting author: @nproellochs (Nicolas Proellochs) Repository: https://github.com/nproellochs/ReinforcementLearning Version: v1.0.2 Editor: @arfon Reviewer: @terrytangyuan, @kbenoit Archive: 10.5281/zenodo.3244170

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/b63a22db024a9f3fea13caed66569912"><img src="http://joss.theoj.org/papers/b63a22db024a9f3fea13caed66569912/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/b63a22db024a9f3fea13caed66569912/status.svg)](http://joss.theoj.org/papers/b63a22db024a9f3fea13caed66569912)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@terrytangyuan & @kbenoit, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @arfon know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @terrytangyuan

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.2)?
[x] Authorship: Has the submitting author (@nproellochs) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @kbenoit

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.2)?
[x] Authorship: Has the submitting author (@nproellochs) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented 6 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @terrytangyuan, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented 6 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 6 years ago

:point_right: Check article proof :page_facing_up: :point_left:

terrytangyuan commented 5 years ago

@nproellochs Could you clarify the differences between your package and reinforcelearn? Also, have you done any benchmarking or used the package in certain applications yet as I doubt that this package is suitable and performant enough for practical use cases given it's implemented purely in R?

nproellochs commented 5 years ago

@terrytangyuan The key differences between our package and the reinforcelearn package are as follows: 1) Different from the reinforcelearn package, our package performs batch reinforcement learning (it allows to learn the optimal behavior of an agent based on sample sequences consisting of states, actions and rewards). Batch reinforcement learning drastically reduces the computation time in most settings. 2) Our package is shipped with the built-in capability to sample experience from a function that defines the dynamics of the environment. 3) Our package incorporates a variety of learning algorithms to improve the performance of (batch) reinforcement learning.

We also tested the suitability of the package in multiple application scenarios. For example, our paper "Negation scope detection in sentiment analysis: Decision support for news-driven trading" makes use of the package to learn negations based on reinforcement learning (the paper is available from https://www.sciencedirect.com/science/article/pii/S0167923616300823).

terrytangyuan commented 5 years ago

@nproellochs By "performance", you meant the statistical performance, right? What about computational performance and benchmarks? RL is mostly popular in large scale applications but your package doesn't seem to support that.

nproellochs commented 5 years ago

@terrytangyuan In typical application scenarios, batch reinforcement learning also speeds up computational performance as it mitigates the ‘exploration overhead’ problem in pure online learning. In combination with experience replay, it speeds up convergence by collecting and replaying observed state transitions repeatedly to the agent as if they were new observations collected while interacting with the system. An overview of the method can be found in our paper (https://arxiv.org/pdf/1810.00240.pdf). Nonetheless, due to the fact that the package is written purely in R, I agree that its applicability to very large scale problems (such as applications in computer vision) is limited.

arfon commented 5 years ago

:wave: @terrytangyuan @kbenoit - happy new year! How are you getting along with your reviews?

terrytangyuan commented 5 years ago

@nproellochs Thanks for the clarification. I understand your point but I don't see how useful and practical this software can be if it cannot be applied to the most popular RL scenarios (mostly due to computational performance overhead). I am uncertain how valuable this is to publish in JOSS. @arfon How do we go from here? Any thoughts @kbenoit?

kbenoit commented 5 years ago

Hi - Sorry for the delay. I'm currently on holidays but back to work on Jan 23 and happy to complete my review then.

In the meantime: On the performance issue, this is really a question for the package and its authors to demonstrate. If there is a vignette or demonstration in the JOSS paper showing that despite being all R, that it can be useful for some applications, then I consider this a satisfactory answer to the question. Alternatively, the case could be made that it's a demonstration or learning package for RL for R users. Only if it's impractical for nearly any application because of performance issues would it be inappropriate as an R package or for JOSS.

arfon commented 5 years ago

@nproellochs Thanks for the clarification. I understand your point but I don't see how useful and practical this software can be if it cannot be applied to the most popular RL scenarios (mostly due to computational performance overhead). I am uncertain how valuable this is to publish in JOSS. @arfon How do we go from here? Any thoughts @kbenoit?

@terrytangyuan - this is a fair point. We have a sister journal JOSE which may be a better fit for this package if @kbenoit is of the same opinion as @terrytangyuan here.

Hi - Sorry for the delay. I'm currently on holidays but back to work on Jan 23 and happy to complete my review then.

That would be great. Thanks!

arfon commented 5 years ago

Friendly reminder to pick up this review sometime soon please @kbenoit

kbenoit commented 5 years ago

Sorry for the delay guys, here are my comments. All of my suggestions are below but on the JOSE option above, I think it has to do below with the purpose of the package and the need it addresses, which I think needs additional clarification before we can answer this definitively. Do I think the package should be rejected from JOSS because it cannot solve large-scale computer vision problems? No, not unless that is what it claims to solve. So for me, a key unanswered question is exactly what need it claims to meet. See more below.

Checklist issues

Does the release version given match the GitHub release (v1.0.2)?

Not exactly, as the GitHub repo has not tagged releases. Suggest you use this very useful facility. https://help.github.com/articles/creating-releases/. I can verify that the current master branch on the GitHub repo matches the submitted and the CRAN version.

Functionality: Have the functional claims of the software been confirmed?

I find this uncertain, since in the vignette, there is no actual execution of the code, and no interpretation or discussion of the results or the format in which they are delivered. How I am to understand the output of policy(model), which is a named character vector such as

> head(policy(model))
.XXBB..XB XXBB.B.X. .XBB..BXX BXX...B.. ..XB..... XBXBXB... 
     "c1"      "c5"      "c5"      "c4"      "c5"      "c9"

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

None exist in the repo or in the README.md, but these would be easy to add.

Software paper

None of the three checklist items are met in the current version of the paper. See below for my comments on the statement of need and the purpose of the package.

Core issues relating to package's purpose

It's not entirely clear what the core purpose of the package is, given the performance limitations acknowledged above. Either this is a practical solution for applying reinforcement learning to problems that people need to solve, in which case there needs to be a discussion o of its performance limitations and some comparison to other possible solutions, in the paper and in my opinion, in the package vignette and/or repository README. The current statement of purpose (in the paper) suggests that the package exists to provide an R package for model-free learning, implying that its purpose is to provide a practical tool for applying this model to a broad class of problems. If so, then some discussion of performance is needed.

Or, this is a purely R-based implementation of reinforcement learning that offers a way to test or take apart this class of machine learning model for primarily educational purposes. If this is the case, then JOSE is more appropriate.

Either way, but especially in the second scenario, more complete documenation is needed, since any user is going to need to:

understand the required input format for supplying their own data; and
understand the output format and how to interpret it.

The man pages cite Sutton and Barto (1998). Why not implement one of the examples from that text using the package, as a vignette? Especially if this is primarily a learning tool, this would be really useful, but even if designed as a problem-solving tool, it will greatly help users connect the tool to a textbook that more fully explains parameter choices and provides further examples.

Some options are never explained at all, such as iter, which defaults to 1. When would higher values be useful?

Minor issues

Not every manual entry has examples, and often has very little discussion. Why not flesh this out, especially if the purpose is educational?

Admittedly, I am a bit obsessed with naming, but I find the function names unintuitive and inconsistent, either non-descriptive (e.g. policy) or inconsistent: compounded nouns (e.g. epsilonGreedyActionSelection) or verb-nouns (e.g. lookupLearningRule). Not clear either which are data objects and which are functions.

The R code is nicely written and very readable - so if the purpose is educational then it will serve that well, for those looking at source code. Although: please add a space after if.

R CMD CHECK issues

On the a full check, I am seeing:

N checking top-level files Non-standard files/directories found at top level: ‘JOSS’ (add to .Rbuildignore)
Warning: roxygen2 requires Encoding: UTF-8 (add this to DESCRIPTION)

arfon commented 5 years ago

:wave: @nproellochs - please let me know when you've incorporated @kbenoit's feedback.

arfon commented 5 years ago

@nproellochs - could you give us an update on your progress?

kbenoit commented 5 years ago

Sorry to be slow on this, review by early next week at the latest. Been snowed under with something else but it's done now.

nproellochs commented 5 years ago

@arfon Sorry for the delay. I will give an update on this ASAP. Thank you @kbenoit & @terrytangyuan for the review and the valuable feedback.

arfon commented 5 years ago

@arfon Sorry for the delay. I will give an update on this ASAP. Thank you @kbenoit & @terrytangyuan for the review and the valuable feedback.

:wave: @nproellochs - friendly reminder to get back to this soon.

terrytangyuan commented 5 years ago

@arfon My feedback and opinion still remain unchanged for the reasons I mentioned earlier. I'd recommend @nproellochs submitting this article to other journals, e.g. JOSE mentioned above.

arfon commented 5 years ago

@arfon My feedback and opinion still remain unchanged for the reasons I mentioned earlier. I'd recommend @nproellochs submitting this article to other journals, e.g. JOSE mentioned above.

Understood. Thanks @terrytangyuan.

danielskatz commented 5 years ago

👋 @arfon - this review seems to have stalled...

nproellochs commented 5 years ago

@arfon I am planning to incorporate the feedback next weekend. Is there a deadline for the revision?

arfon commented 5 years ago

@arfon I am planning to incorporate the feedback next weekend. Is there a deadline for the revision?

No, the timing is up to you.

Please make sure though in your revisions you directly address the feedback from @kbenoit. Specifically:

Or, this is a purely R-based implementation of reinforcement learning that offers a way to test or take apart this class of machine learning model for primarily educational purposes. If this is the case, then JOSE is more appropriate.

In your response here I think you need to make a strong case for this being a JOSS submission rather than a JOSE one, especially given both reviewers have the same concern here.

nproellochs commented 5 years ago

Thank you again @kbenoit & @terrytangyuan for reviewing the package. As part of the revision, I have addressed the checklist issues as follows:

Version: Does the release version given match the GitHub release (v1.0.2)?

I added a tagged GitHub release matching the CRAN version (v1.0.4).

Functionality: Have the functional claims of the software been confirmed?

I have significantly improved the package documentation. In addition, the vignette now contains two working examples (adapted from Sutton (1998)) with code executions and detailed explanations of the data format and model outcomes.

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

I added a statement of need section to the paper. In addition, the paper (and the package vignette) now contains a dedicated section addressing the performance of the package. Specifically, I added examples of things the package can do and the things the package cannot do (where users may consider implementations written in "faster" programming languages)

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

As aforementioned, I implemented two working examples in the package vignette. Specifically, I demonstrate the use of the package by drawing upon common examples from the literature (e.g. finding optimal game strategies).

Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

See points above. In addition, I added further documentation and additional examples to the manual entries.

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

I added community guidelines in the README.

Software paper

I added a statement of need highlighting our contributions to the existing landscape of R packages. In addition, the paper now includes the package authors' affiliations and DOIs.

Minor issues

I added further documentation and additional examples to the manual entries. In addition, I improved the consistency of function naming and added spaces after if.

R CMD CHECK issues

Fixed.

nproellochs commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

PDF failed to compile for issue #1087 with the following error:

/app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/lib/whedon.rb:115:in check_fields': Paper YAML header is missing expected fields: authors, affiliations (RuntimeError) from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/lib/whedon.rb:80:ininitialize' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/lib/whedon/processor.rb:36:in new' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/lib/whedon/processor.rb:36:inset_paper' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/bin/whedon:55:in prepare' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/command.rb:27:inrun' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor.rb:387:indispatch' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-379fcd917139/bin/whedon:116:in<top (required)>' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in

'

nproellochs commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 5 years ago

Thanks for the update @nproellochs. @terrytangyuan @kbenoit - could you take a look at the updated paper ☝️and let me know if this satisfies your concerns about the usefulness of this software given both of your concerns about performance?

As a reminder, our submission requirements state that:

The software should be a significant contribution to the available open source software that either enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)

From my perspective, if this is the first reinforcement learning package in R, then that could be a 'substantial contribution' almost by definition, however, if the performance is so poor that its not useful for all but a very small number of applications, then this may be a moot point.

labarba commented 5 years ago

👋 @terrytangyuan, @kbenoit — We are waiting for your assessment on this submission. Thanks!

terrytangyuan commented 5 years ago

Please see my previous comment.

labarba commented 5 years ago

I see that both reviewers have concerns about the purpose and need of this software… https://github.com/openjournals/joss-reviews/issues/1087#issuecomment-453893878 https://github.com/openjournals/joss-reviews/issues/1087#issuecomment-461296889

... and @arfon's assessment is:

if this is the first reinforcement learning package in R, then that could be a 'substantial contribution' almost by definition, however, if the performance is so poor that its not useful for all but a very small number of applications, then this may be a moot point.

The author says they have expanded in the paper the sections on Statement of Need, and Notes on Performance.

Does this new text in the paper satisfy the reviewers?

terrytangyuan commented 5 years ago

Then see my very first comment https://github.com/openjournals/joss-reviews/issues/1087#issuecomment-443436417, it is not the first reinforcement learning package in R.

nproellochs commented 5 years ago

@terrytangyuan Apart from the abovementioned differences regarding package features, I want to point out that the reinforcelearn R package was released approximately 11 months after the ReinforcementLearning package. Therefore, in my opinion, it is fair to claim that the ReinforcementLearning package is the first package that performs model-free RL in R.

terrytangyuan commented 5 years ago

That is not my main concern... @arfon @labarba Please un-assign me from this and assign new reviewers if needed. I don't have additional time dedicated to this review anymore. My apologies for any inconvenience this may cause.

nproellochs commented 5 years ago

Thank you very much for your feedback up to this point @terrytangyuan

arfon commented 5 years ago

@terrytangyuan @kbenoit - many thanks for all of your input to date and your excellent reviews. While there isn't clear agreement here between us, I'm satisfied with the responses of the author and am making the editorial decision to accept this submission into JOSS.

arfon commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 5 years ago

@nproellochs - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:

The title of the archive is the same as the JOSS paper title
That the authors of the archive are the same as the JOSS paper authors

I can then move forward with accepting the submission.

nproellochs commented 5 years ago

Thank you @arfon and both reviewers! The archive is now available on Zenodo 10.5281/zenodo.3244170.

arfon commented 5 years ago

@whedon set 10.5281/zenodo.3244170 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.3244170 is the archive.

arfon commented 5 years ago

@whedon accept

whedon commented 5 years ago

Attempting dry run of processing paper acceptance...

whedon commented 5 years ago


OK DOIs

- 10.1111/ecog.00888 is OK
- 10.1038/nature14236 is OK
- 10.1109/TNN.1998.712192 is OK

MISSING DOIs

- https://doi.org/10.1109/21.364859 may be missing for title: A Sensor-Based Navigation for a Mobile Robot using Fuzzy Logic and Reinforcement Learning
- https://doi.org/10.1006/game.1997.0619 may be missing for title: Coordination, “Magic,” and Reinforcement Learning in a Market Entry Game
- https://doi.org/10.1007/978-3-642-27645-3_2 may be missing for title: Batch Reinforcement Learning
- https://doi.org/10.1007/978-1-4615-3618-5_5 may be missing for title: Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
- https://doi.org/10.1007/978-1-4757-6451-2_4 may be missing for title: Reinforcement Learning in the Multi-Robot Domain
- https://doi.org/10.1145/1143844.1143929 may be missing for title: Reinforcement Learning for Optimized Trade Execution
- https://doi.org/10.2139/ssrn.2602003 may be missing for title: Negation Scope Detection in Sentiment Analysis: Decision Support for News-Driven Trading
- https://doi.org/10.3115/1289189.1289246 may be missing for title: Automatic Learning of Dialogue Strategy using Dialogue Simulation and Reinforcement Learning

INVALID DOIs

- None

whedon commented 5 years ago

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/749

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/749, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

openjournals / joss-reviews