yr4000 / Slither-ML-bot


Parameters #1

Open Dimensionic opened 7 years ago

Dimensionic commented 7 years ago

Hi, the project is missing the "parameters" folder. Could you add it, please? :)

yr4000 commented 7 years ago

Hi :) In a day or two we will upload code that generates the default parameters, but in the meantime you can create the files manually by looking at the code to see what's missing. May I ask why you are interested in our project?
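For anyone else hitting this before the generator script is uploaded, here is a minimal sketch of the idea; the folder name, file names, and layer shapes below are hypothetical guesses, not the project's actual ones:

```python
# Hypothetical sketch: write randomly-initialized default weights into a
# "parameters" folder so the bot has something to load on first run.
# The file names and shapes are placeholders, not the repo's real layout.
import os
import numpy as np

PARAMS_DIR = "parameters"
LAYER_SHAPES = {              # hypothetical network layout
    "W1": (84 * 84, 256),
    "b1": (256,),
    "W2": (256, 4),           # e.g. 4 discrete turning actions
    "b2": (4,),
}

os.makedirs(PARAMS_DIR, exist_ok=True)
for name, shape in LAYER_SHAPES.items():
    np.save(os.path.join(PARAMS_DIR, name + ".npy"),
            np.random.normal(0.0, 0.01, size=shape))
print("wrote default parameters to", PARAMS_DIR)
```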

Dimensionic commented 7 years ago

Nice :) I got it working by guesstimating the missing variables, but I don't have a lot of time to dedicate to it, only enough to get it half-decent. It would be nice to run it knowing it's running as the original author(s) intended, in its current state anyway.

Recent developments in machine learning have caught my attention and I'd just like to play around with it a little. I've used and worked on some bots before, and I enjoy observing and tweaking things, then (hopefully) watching them work better. Tweaking something to tweak itself is... another awesome layer on top of that.

Your project just happens to be what caught my interest, and it seems like a good starting point to familiarize myself with the field, I guess.

yr4000 commented 7 years ago

Well then, thank you for your interest. It is very flattering. Our due date is this Sunday, so by Monday the code will be more organized and we will also upload a manual explaining how to work with it. Only recently did we manage to get significant results suggesting that the bot is starting to learn (after a day or two of training, depending on the learning mode).

Anyway, I will upload code today that auto-generates the parameters.

Good luck with your studies and explorations :)

Dimensionic commented 7 years ago

Thank you for the parameter-generating script 👍

I ran it overnight and didn't really see any real progress just yet. I thought it would be faster, to be honest; I expected it would at least grasp the concept of chasing the 'food'. But that's okay :) knowing that it took 1-2 days for you, and that I ran it with parameters pulled out of thin air that just happened to work.

May I ask which model learned fastest for you? And, if it's not too much to ask, could you share your trained weights file?

Also, on a side note, the score div is the 16th child element, not the 17th, at least for me. Since this might have gone unnoticed in the repo, you might want to take a glance at it :)

yr4000 commented 7 years ago

Thanks for the feedback. About the 16/17 div: we know about that issue but we don't know what causes it. For most of us it worked with 17, which is why it's like that in the repository. We will mention it in the manual.
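One possible workaround, until the cause is found, is to avoid hard-coding the child index altogether. Below is a rough sketch of that idea; the element id and the Selenium usage are assumptions for illustration, not the repo's actual scraping code:

```python
# Sketch of an index-agnostic lookup: instead of taking the 16th or 17th
# child, scan the children and return the first one whose text contains a
# digit. The element id "lastscore" is a placeholder guess.
import re
from selenium.webdriver.common.by import By

def find_score_div(driver):
    container = driver.find_element(By.ID, "lastscore")  # hypothetical id
    for child in container.find_elements(By.XPATH, "./*"):
        if re.search(r"\d", child.text or ""):
            return child
    return None
```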

In our implementation it should take a long time for the bot to learn; we are not sure why yet. In the Atari paper (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), as far as we checked, it took millions of observations until the model converged to the optimal results. If I am not mistaken, they could emulate the games to run faster, while in this case we have to actually let the bot play the game at human speed, which might be one of the reasons it takes so long. Another reason is that this game is very "noisy" and random, i.e. if you are in a state and take some action, there is no guarantee the next state can be predicted: the food is not arranged on the map in any obvious order, an enemy snake might appear unexpectedly, etc. And of course, it is possible we are not using the right models or that our implementation is not good.
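For reference, the quantity a DQN fits on every observation is the one-step Bellman target; a minimal numpy sketch of it is below (the shapes and function are placeholders for illustration, not our code):

```python
# Sketch of the DQN regression target y = r + gamma * max_a' Q(s', a').
# Because the next state s' in slither.io is noisy (random food layout,
# snakes appearing unexpectedly), this target has high variance, which is
# one reason convergence can take very many observations.
import numpy as np

def dqn_targets(rewards, next_q_values, terminal, gamma=0.99):
    """rewards: (batch,), next_q_values: (batch, n_actions), terminal: (batch,) bool."""
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * np.where(terminal, 0.0, max_next_q)
```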

However, not everything is dark: lately we got an encouraging result from our imitation learning experiment (which can be run using the IL mode and then testing on the obtained weights), but we still need to verify it. We will be smarter by Monday. If it did manage to learn, that might be evidence that the model we wrote can learn from zero (since theoretically a DQN should converge to the optimum given enough observations; this statement is probably not fully accurate, so be cautious...).
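Conceptually, the imitation learning mode boils down to supervised learning on recorded play; a rough sketch of that objective is below (an illustration of the idea, not the repo's actual IL code):

```python
# Sketch of an imitation-learning loss: treat recorded (state, action) pairs
# as labels and minimize cross-entropy between the policy's action
# probabilities and the demonstrated actions.
import numpy as np

def imitation_loss(action_probs, demo_actions):
    """action_probs: (batch, n_actions) softmax outputs, demo_actions: (batch,) ints."""
    picked = action_probs[np.arange(len(demo_actions)), demo_actions]
    return -np.log(picked + 1e-8).mean()
```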

So right now the best model to try is the DQN model, but we are still refining it. I don't see a fast way to learn this game unless we can emulate it somehow, and since it is played against humans that is not possible. It would be possible to first train the model against other bots and then send it to play against humans, but that is beyond the scope of our project.

Here is an excellent guide to DQN, on which we based our code: http://www.danielslater.net/2016/03/deep-q-learning-pong-with-tensorflow.html

Another model we looked into but didn't have time to implement is the A3C model. You can read about it here: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2

Dimensionic commented 7 years ago

Thanks for all the information :) One more question though: is this something you will continue working on after the deadline at all?

yr4000 commented 7 years ago

Hi, unfortunately probably not. I love this project, but I need to move on. Maybe I will let the code we managed to write run for a week or so, to see whether it really does manage to learn the optimal parameters for the game. In any case, we will make sure to leave the code organized and with a manual for anyone who wants to use it.