smilesun / rlR

Deep Reinforcement Learning in R (Deep Q Learning, Policy Gradient, Actor-Critic Method, etc)
https://smilesun.github.io/rlR

about the first version of the software #5

Open smilesun opened 6 years ago

smilesun commented 6 years ago

@berndbischl @markdumke I don't think we have that much time to wait for a perfect design, so I would suggest we start now.

What do you two think?

We could start next week and have some initial form soon.

smilesun commented 6 years ago

Hi, not sure why there is no feedback. I think we have the following aims:

  1. Markus needs a CRAN package
  2. Markus needs a publication
  3. We want a maintainable package to use for future projects

Based on those goals

  1. We should push the CRAN package as soon as possible, since it always makes sense to be the first, even if we cannot make it perfect. I think it would be a big loss for us if ours is not the first R package that can do RL. And the previous version Markus wrote for his Master's thesis is already good enough to push to CRAN.

  2. We need to use at least the design Markus presented last week for the publication; that way I am confident there would not be too much criticism.

  3. For future use, I personally prefer something where I could affect the design of the software, but this has nothing to do with our current goals, and it should not slow down the first two, right? At some point I could merge my branch into Markus's branch, or we just keep it as it is.

I need some feedback from you two.

smilesun commented 6 years ago

@berndbischl @markdumke

markusdumke commented 6 years ago

Good points! I thought about merging the stuff from the thesis into the new design I presented to you and then pushing to CRAN. I probably need about two weeks for this (because I cannot work full-time on it). But of course we can also push the master thesis version to CRAN right now.

berndbischl commented 6 years ago

Waiting for a few weeks is OK, but we should really target January for a first upload.

smilesun commented 6 years ago

Thanks to you both for the replies :) I am not sure if the following is a good idea:

  1. Upload the master thesis version to CRAN so we are sure it is the first R package to do complex reinforcement learning, not just MDPs. This will be saved in the CRAN history anyway. CRAN does not care about the design of the software at all, but I think being first matters (for the R community, the master thesis version is already fantastic for most people). We could do this if it is not too much extra work for Markus?
  2. In January we release another version of the software, and since nobody works during the Christmas break, nobody would be surprised if a lot of things changed, right? Also, the user API could stay mostly the same even though the internals turn object-oriented.
  3. For the publication, we need to
    • change the design of the software, because we want to reduce criticism from the reviewers
    • benchmark over a batch of problems and a batch of algorithms, which is needed for the paper.

Basically my point is that maybe we could generate the outputs mentioned above (CRAN release, OO design, benchmark, paper) stage-wise, so we are safe to deliver them in time. Otherwise the project might become too complex for us to manage, and we might be disappointed at the end if it drags on too long.

@markdumke @berndbischl, what do you think? This way we could join forces and not waste time on repetitive work.

markusdumke commented 6 years ago

I think it's a good plan. The master thesis version works, so I can push it to CRAN anytime, that is no problem.

> and also the user API could be mostly the same although the inside turns into an object-oriented way.

I think there will also be a lot of changes to the user API. But that doesn't matter too much if it's an improvement in usability.

markusdumke commented 6 years ago

So should I push to CRAN now?

smilesun commented 6 years ago

@berndbischl , what do you think ?

markusdumke commented 6 years ago

@smilesun @berndbischl I have an improved draft for the user interface of the first version online in the reinforcelearn work branch. I would suggest that I finish it this week with at least some basic functionality (Q-learning with or without experience replay, table and keras nn) and then push this to CRAN as version 0.1.0.

For the second version we can then include a better internal oo structure based on Xudong's ideas, and add more algorithms etc., but probably don't have to change too much of the user interface.

```r
# Q-learning with eligibility traces
env = makeEnvironment("windy.gridworld")
val = makeValueFunction("table", n.states = env$n.states, n.actions = env$n.actions)
policy = makePolicy("epsilon.greedy", epsilon = 0.1)
alg = makeAlgorithm("qlearning", lambda = 0.9, traces = "accumulate")
agent = makeAgent(policy, val, alg)
interact(env, agent, n.episodes = 100L)

# Shortcut: character arguments instead of constructor calls
env = makeEnvironment("windy.gridworld")
agent = makeAgent("softmax", "table", "qlearning")
interact(env, agent, n.steps = 10L)
```
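As a side note, the experience-replay variant mentioned above could plausibly follow the same constructor pattern. This is only a sketch of how it might look; `makeReplayMemory` and the `replay.memory` argument are my assumptions about the interface, not a confirmed part of the draft:

```r
# Hypothetical: Q-learning with experience replay in the draft API.
# makeReplayMemory() and replay.memory are assumed names, following
# the constructor pattern of the example above.
env = makeEnvironment("windy.gridworld")
val = makeValueFunction("table", n.states = env$n.states, n.actions = env$n.actions)
policy = makePolicy("epsilon.greedy", epsilon = 0.1)
alg = makeAlgorithm("qlearning")
memory = makeReplayMemory(size = 1000L, batch.size = 32L)
agent = makeAgent(policy, val, alg, replay.memory = memory)
interact(env, agent, n.episodes = 100L)
```

The appeal of this pattern is that replay becomes an optional component plugged into the agent, so the tabular and neural-network versions share the same user-facing calls.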
smilesun commented 6 years ago

@markdumke, there is nothing in the work branch now except the Rxd folder I pushed. Did you push to the right branch?

markusdumke commented 6 years ago

Ah, yes, I pushed to my own repo: https://github.com/markdumke/reinforcelearn/tree/work

smilesun commented 6 years ago

I think that is a good plan

markusdumke commented 6 years ago

Very good, I will upload tomorrow then

markusdumke commented 6 years ago

Finally done: https://cran.r-project.org/web/packages/reinforcelearn/index.html

smilesun commented 6 years ago

@markusdumke Great work! So what is your plan about the paper now?

markusdumke commented 6 years ago

Thanks :)

I've just started working in a new job, so sadly there's not too much time I can spend on this project right now.

But I think it would be great to merge our code together at some point, so that we have a maintainable and extendable package. Maybe you can make a list of changes you'd like to make to the current code at https://github.com/markusdumke/reinforcelearn, so that I can review them, perhaps on the weekend? Then we can merge and submit to JOSS?