karpathy / reinforcejs

Reinforcement Learning Agents in Javascript (Dynamic Programming, Temporal Difference, Deep Q-Learning, Stochastic/Deterministic Policy Gradients)

Reinforcejs VS ConvNetjs #8

Open functionsoft opened 8 years ago

functionsoft commented 8 years ago

Hi,

I'm looking at http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

and comparing the agent there with the one at

http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html

They act in very similar environments but have different AI implementations.

My question is: which of the two versions is the more advanced and complete AI agent?

What are the differences in the neural network implementations, and which is the more intelligent agent?

Thanks,

Mike

karpathy commented 8 years ago

Hi, both of those agents use the same algorithm, DQN, but yes, the implementations differ in the details. I'd use the REINFORCEjs one; it's more recent and complete.

functionsoft commented 8 years ago

Hi,

Thanks for getting back to me. I’m glad you said that, because that’s the library I chose out of the two to work with and understand.

In the learn function of the DQNAgent there is a comment about priority sweeps for the replay memory. How could this be implemented simply with the current code? I assume it involves marking each stored experience with a value that distinguishes good experience from bad experience, so that the best memories are played back?

Also, what type of neural network does this agent implement? Is it a simple multilayer perceptron? Would the agent benefit from more hidden layers?

Any ideas or suggestions greatly appreciated.

Thanks and Regards,

Mike

mryellow commented 8 years ago

"priority sweeps, how could this be simply implemented with the current code? I assume it involves marking the experience memory with some value that represents good experience vs bad experience? So that the best memories are played back?"

I've seen it done that way in a paper somewhere (can't find it). They added an extra property to the experience objects, holding a value that was then used to prune experiences.
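For anyone who wants to try the idea, here is a minimal sketch of a replay memory whose entries carry an extra property used for pruning. Everything in it is an assumption for illustration: `PrioritizedMemory` and `priority` are hypothetical names that do not exist in REINFORCEjs, and scoring by the absolute reward is just one simple choice (the magnitude of the TD error is another common one).

```javascript
// Hypothetical replay memory; not part of REINFORCEjs.
// Each experience carries an extra `priority` property (here |reward|,
// an assumed scoring rule) that decides what gets pruned first.
function PrioritizedMemory(capacity) {
  this.capacity = capacity;
  this.items = [];
}

PrioritizedMemory.prototype.add = function (s0, a0, r0, s1, a1) {
  this.items.push({ s0: s0, a0: a0, r0: r0, s1: s1, a1: a1, priority: Math.abs(r0) });
  if (this.items.length > this.capacity) {
    // find and drop the lowest-priority experience; ties drop the oldest
    var worst = 0;
    for (var i = 1; i < this.items.length; i++) {
      if (this.items[i].priority < this.items[worst].priority) worst = i;
    }
    this.items.splice(worst, 1);
  }
};

var mem = new PrioritizedMemory(3);
mem.add([0], 0, 0.0, [1], 1);
mem.add([1], 1, 1.0, [2], 0);
mem.add([2], 0, -0.5, [3], 1);
mem.add([3], 1, 0.2, [4], 0); // memory is over capacity, so the zero-reward entry is pruned
var keptRewards = mem.items.map(function (e) { return e.r0; });
```

The linear scan keeps the sketch short; for a large memory a heap would probably be a better fit.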

nosyndicate commented 8 years ago

Hi mryellow, I am very interested in the paper on prioritized sweeping with experience replay that you mentioned. Can you recall anything related to it that I could use to google it?

mryellow commented 8 years ago

Not sure I have it saved here, think it may have been an incomplete draft, and not that interesting otherwise.

They were using ReinforceJS, had modified this bit https://github.com/karpathy/reinforcejs/blob/0b9315a69c55f7d66a9d3839a0a90dd067be45db/lib/rl.js#L1091 to include some kind of crude threshold on a score. Believe it was effectively only really looking for actions with a non-zero reward.

One bit that sticks in my head is that they were using a Greek letter, Rho or Psi or something, and had in-line comments showing it properly encoded rather than as LaTeX or a simple substitute character.
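A rough sketch of what such a crude threshold could look like, as far as I can reconstruct it. This is not the paper's code or the library's: `replayWithThreshold` is a hypothetical helper, the `[s0, a0, r0, s1, a1]` array layout is assumed, and `learnFromTuple` is passed in as a stand-in for the agent's update function.

```javascript
// Hypothetical replay step with a crude reward threshold.
// `experiences` holds [s0, a0, r0, s1, a1] arrays (layout assumed here);
// `learnFromTuple` stands in for the agent's update function;
// `rand` returns a number in [0, 1), like Math.random.
function replayWithThreshold(experiences, steps, threshold, learnFromTuple, rand) {
  var learned = 0;
  for (var k = 0; k < steps; k++) {
    var ri = Math.floor(rand() * experiences.length); // uniform sample
    var e = experiences[ri];
    if (Math.abs(e[2]) <= threshold) continue; // skip rewards at or below the threshold
    learnFromTuple(e[0], e[1], e[2], e[3], e[4]);
    learned++;
  }
  return learned;
}

var exps = [
  [[0], 0, 0.0, [1], 1],
  [[1], 1, 1.0, [2], 0]
];
var seen = [];
var record = function (s0, a0, r0, s1, a1) { seen.push(r0); };
// deterministic stand-in for Math.random that alternates between both entries
var flip = 0;
var alternating = function () { flip = 1 - flip; return flip === 1 ? 0.75 : 0.25; };
var count = replayWithThreshold(exps, 4, 0, record, alternating);
```

With a threshold of 0 this only ever learns from experiences with a non-zero reward, which is not real prioritized sweeping, just a filter in front of the update.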

mryellow commented 8 years ago

google: "this.learnFromTuple(e[0], e[1], e[2], e[3], e[4], e[5])"

On Learning Coordination Among Soccer Agents

http://robocup.csu.edu.cn/web/wp-content/uploads/2012/12/data/pdfs/robio12-116.pdf

mryellow commented 8 years ago

Hang on, that's the only result, but it's not the one, although I've seen this paper before... and I don't think it passed in the score, but checked it before firing learnFromTuple... So that's a wild goose chase, sorry for the noise.

nosyndicate commented 8 years ago

Thanks, mryellow

andrewcz commented 8 years ago

There is a new paper from DeepMind on deep reinforcement learning in continuous spaces: "Continuous control with deep reinforcement learning". Are there plans to add this in code form? Many thanks, Andrew.

NullVoxPopuli commented 4 years ago

I'm also curious about DeepMind's learnings :D