Question: Is there a qlearn or similar capability?

ml5js / ml5-library

Friendly machine learning for the web! 🤖

https://ml5js.org


Question: Is there a qlearn or similar capability? #1022

Closed NullVoxPopuli closed 4 years ago

NullVoxPopuli commented 4 years ago

I've been using reinforce.js. It only allows one hidden layer of neurons, but it does have qlearn, which is a reinforcement learning algorithm, afaict.

Does ml5 have something similar (any reinforcement learning algo will do)? I've been wanting to use something modern and kept up to date. :)

Context: I'm trying to train an ANN by having it play a game and defining a reward for each move it makes.
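For readers new to the idea, here is a minimal sketch of tabular Q-learning in plain JavaScript. It does not use ml5 or reinforce.js. The corridor environment and the names `makeRng`, `step`, and `trainQTable` are made up for illustration, and a real game would likely replace the table with a neural network.

```javascript
// Hypothetical toy environment: a corridor of 5 cells. The agent starts in
// cell 0 and is only rewarded (+1) when it reaches the last cell.
const N_STATES = 5;
const ACTIONS = [-1, +1]; // action 0 = step left, action 1 = step right

// Small seeded PRNG (mulberry32) so that runs are repeatable.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Apply an action; walls clamp the agent inside the corridor.
function step(state, action) {
  const next = Math.min(N_STATES - 1, Math.max(0, state + ACTIONS[action]));
  const done = next === N_STATES - 1;
  return { next, reward: done ? 1 : 0, done };
}

function trainQTable({ episodes = 500, alpha = 0.1, gamma = 0.9, epsilon = 0.2, seed = 1 } = {}) {
  const rng = makeRng(seed);
  // q[state][action] is the estimated long-term value of that move.
  const q = Array.from({ length: N_STATES }, () => [0, 0]);

  for (let ep = 0; ep < episodes; ep++) {
    let state = 0;
    for (let t = 0; t < 100; t++) {
      // Epsilon-greedy: sometimes move at random, otherwise take the best
      // known move (ties are also broken at random).
      const tie = q[state][0] === q[state][1];
      const action =
        rng() < epsilon || tie ? Math.floor(rng() * 2) : q[state][1] > q[state][0] ? 1 : 0;

      const { next, reward, done } = step(state, action);

      // Q-learning update: move the estimate towards
      // reward + gamma * (best value available from the next state).
      const target = reward + (done ? 0 : gamma * Math.max(...q[next]));
      q[state][action] += alpha * (target - q[state][action]);

      state = next;
      if (done) break;
    }
  }
  return q;
}
```

Calling `trainQTable()` returns the learned table. Under these settings the value of "step right" should approach roughly 0.73, 0.81, 0.9, and 1 for cells 0 to 3, even though only the final move is ever rewarded directly.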

joeyklee commented 4 years ago

@NullVoxPopuli - Hi! Super cool question. This is a very challenging part of ML that we've not really tackled with ml5.

We have an example of a genetic algorithm -- this has not been released yet, but is rather in our development branch.

And you might also have a look at @AidanNelson 's reinforcement learning example in tensorflow/unity: https://www.aidanjnelson.com/projects/unity-to-tensorflow-js/

Additional examples in the QLearning space using tensorflowjs can be seen here:

NullVoxPopuli commented 4 years ago

Thanks for the resources! I'll read them soon / today :D

Something I've been struggling to understand as I research this is delayed gratification. Either it's hard to grasp from existing code examples, or many reinforcement learning examples don't deal with a game space complex enough that the ending reward matters more than the individual rewards (or sometimes the deliberate lack of an immediate reward). Or q-learning and its derivative algorithms are supposed to explore that space via occasional randomness in their moves / outputs. That part is tricky: with too many random moves in a game like 2048, you hit a score ceiling of ~3100 (a winning score is, I think, over 200k).
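As a rough illustration of the two ideas in that paragraph, the sketch below shows how a discount factor spreads a late reward back over earlier moves, and how the exploration rate is often shrunk over time so that random moves stop capping the score. The function names `discountedReturns` and `epsilonAt` and all default values are assumptions chosen for this example, not part of any library.

```javascript
// Discounted return for each step of one episode:
//   G_t = r_t + gamma * G_{t+1}
// A gamma close to 1 makes a late reward count almost fully for early moves;
// a small gamma makes the agent short-sighted.
function discountedReturns(rewards, gamma) {
  const returns = new Array(rewards.length).fill(0);
  let running = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    running = rewards[t] + gamma * running;
    returns[t] = running;
  }
  return returns;
}

// Exploration rate that shrinks geometrically from `start` and is floored
// at `end`, so early episodes are mostly random and later ones mostly greedy.
function epsilonAt(episode, { start = 1.0, end = 0.01, decay = 0.995 } = {}) {
  return Math.max(end, start * Math.pow(decay, episode));
}

// Example: three unrewarded moves followed by one big payoff.
const farSighted = discountedReturns([0, 0, 0, 10], 0.9);
const shortSighted = discountedReturns([0, 0, 0, 10], 0.5);
console.log(farSighted[0], shortSighted[0]);
```

With `gamma = 0.9` the first move is credited with about 7.29 of the final 10 points, while with `gamma = 0.5` it only sees 1.25, which is one way an agent ends up ignoring a delayed payoff.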

So, idk who's an expert at this sort of thing (I'm not, just a hobby atm), but it would be good to find a clear way to explain how delayed gratification works, how it's found, and how you know it won't be found and that you need to start over after tweaking some parameters <3

joeyklee commented 4 years ago

@NullVoxPopuli - Interesting. I don't personally know a lot about this area of ML, but I'm sure there are people out there in the tensorflowjs universe who know more. I suppose that most examples in this space are designed to be digestible rather than super deep dives. Though I'm sure it is a matter of time before more work on this emerges!