mpSchrader / gym-sokoban

Sokoban environment for OpenAI Gym
MIT License

Success Rate? #11

Open dikke opened 6 years ago

dikke commented 6 years ago

Hello

I am amazed by your work. I am wondering whether you have tested the Sokoban game with standard RL methods (Q-learning, A2C, etc.), and whether you have success rates for this kind of game?

mpSchrader commented 6 years ago

Hey,

Currently I do not have any reliable success rates myself. I would recommend reading DeepMind's paper on Imagination-Augmented Agents (I2A). In it they presented results and compared them to a baseline RL algorithm. The I2A architecture solved over 80% of levels; in a very computationally expensive configuration they were able to solve over 90%.

During a class project we implemented the architecture, but we were not able to replicate the results due to limited computational power.

mpSchrader commented 6 years ago

This weekend I will have a look at OpenAI's baselines repo and check how to make gym-sokoban usable with it. After that I will upload some baseline results to the documentation.
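
An untested sketch of the minimal glue: gym_sokoban registers its environment ids when imported, so a plain gym loop (with a random policy standing in for a real baseline agent) should already run:

```python
# Untested sketch: importing gym_sokoban registers the Sokoban-* env ids,
# so a standard gym rollout loop works out of the box. A random policy
# stands in here for whatever baseline algorithm gets used later.
import gym
import gym_sokoban  # noqa: F401  # side effect: registers the environments

env = gym.make('Sokoban-v0')
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```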

Olloxan commented 6 years ago

Hey, I am currently trying to use your Sokoban environment with the I2A agent for my master thesis, using this repo as a starting point. So far, parallelizing the Sokoban environments seems to work. If you need any hints for your baseline implementation, I suggest having a look at the parallel implementation of the Pacman environment in the I2A implementation.
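
For reference, the worker-per-process pattern looks roughly like this when done with the SubprocVecEnv helper from the baselines repo (just a sketch; the I2A implementation uses its own similar helpers):

```python
# Sketch of worker-per-process parallelization via OpenAI baselines'
# SubprocVecEnv: each environment lives in its own process, and reset/step
# calls are batched across all of them.
import gym
import gym_sokoban  # noqa: F401
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

envs = SubprocVecEnv([lambda: gym.make('Sokoban-v0') for _ in range(16)])
obs = envs.reset()  # one observation per worker process
```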

mpSchrader commented 6 years ago

@Olloxan Thanks for the hint. I was planning to run the simple subpackages from https://github.com/openai/baselines#subpackages-1 as a very first baseline and later on try more advanced techniques. ;-)

Maybe you could add the results of your I2A implementation. By the way, it would be great if you could share the results of your thesis in the end. ;-)

wrongbattery commented 5 years ago

I found that env.reset() takes at least 14 seconds to create a new game. If we play 100k games, that alone is 100,000 × 14 s ≈ 1.4 million seconds, i.e. roughly 400 hours of training time.
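
A minimal snippet to reproduce the measurement (assuming the default Sokoban-v0 environment id):

```python
# Times a single reset, which includes the full level generation.
import time
import gym
import gym_sokoban  # noqa: F401

env = gym.make('Sokoban-v0')
t0 = time.perf_counter()
env.reset()
print(f"reset took {time.perf_counter() - t0:.1f} s")
```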

mpSchrader commented 5 years ago

Hi @wrongbattery,

This is currently due to the level generation, which always generates solvable rooms. The generation algorithm uses a depth-first search to generate the room by reverse-playing it, starting from the solved state. This algorithm is based on the DeepMind paper linked in the readme file. If you have an idea of how to improve the algorithm, please let me know and I will implement it. ;-)
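
Schematically the idea is something like this (not the actual code; legal_pulls and apply_pull are hypothetical helpers standing in for the real room logic):

```python
# Schematic of reverse-play generation (hypothetical helpers, not the repo's
# real implementation): a "pull" is the inverse of a push, so any state
# reached by pulling boxes away from their targets is solvable by replaying
# the pull sequence backwards as pushes.
import random

def generate_solvable_room(room, steps=30):
    # room starts in a solved state: all boxes already on their targets.
    for _ in range(steps):
        pulls = room.legal_pulls()   # hypothetical: enumerate reverse moves
        if not pulls:
            break
        room.apply_pull(random.choice(pulls))  # hypothetical: move player, drag box
    return room  # solvable by construction
```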

Best, Max

Olloxan commented 5 years ago

Hi, in order to use this environment for the Imagination-Augmented Agent, I had to scale the size of the tiles down to 8x8 pixels, which is what they used in their paper. I experienced the same problem with the level generation: the fastest levels were generated in about two seconds, the slowest one took over two minutes. They already stated in their paper that they used an A3C agent for their Sokoban task, which at least solves the problem of synchronized level generation. I think the level generation algorithm is OK as it is; an A2C algorithm is just not suitable for this task. I still implemented a solution where an A2C can be used: I started 16 processes playing 16 different Sokoban games to generate training data, plus an additional 16 processes that generated Sokoban levels and stored them in a multiprocessing buffer. This buffer served as an asynchronous source of new levels, and that number of generating processes was just enough to satisfy the demand for new levels.
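
A stripped-down sketch of that scheme, with generate_level() as a stand-in for gym-sokoban's room generator:

```python
# 16 generator processes keep a bounded multiprocessing queue filled with
# fresh levels; the game processes draw from it instead of generating
# inside env.reset().
import multiprocessing as mp

def generate_level():
    return "level"  # stand-in for the expensive room generation call

def generator_worker(buffer):
    while True:
        buffer.put(generate_level())  # put() blocks while the buffer is full

def start_level_buffer(n_generators=16, capacity=64):
    buffer = mp.Queue(maxsize=capacity)
    for _ in range(n_generators):
        mp.Process(target=generator_worker, args=(buffer,), daemon=True).start()
    return buffer

if __name__ == "__main__":
    levels = start_level_buffer()
    level = levels.get()  # a game process fetches a pre-generated level
```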

The problem in general is the lack of computing power. For this environment to be learned by a model-free actor-critic network, or even the I2A, you need at least 32 parallel agents/environments, otherwise your network will just overfit as a result of the very sparse rewards. For the reasons I stated above, I could not use this Sokoban environment for my master thesis, as it was just too computationally expensive. I developed a different environment that is computationally very lightweight and still offers sparse rewards. But just for comparison: I train the I2A agent for 1e6 training epochs, which takes around 20 days. I hope the network converges faster so that I can stop the process earlier. In the paper you can see a significant increase in the learning curve after about 5e8 epochs. They train their network for 1e9 epochs. That is not possible unless you have a datacenter.

So my tip if you want to try the I2A approach with Sokoban: build a working A3C and train it on a system that has at least 28 to 32 cores, otherwise your network will overfit. You need asynchronous level generation, otherwise the training takes several months. And get a Snickers, because the training will still take very long...

mpSchrader commented 5 years ago

Hi @Olloxan,

Thanks for your insights. Regarding the pixel size: did you scale it down yourself, or did you use the tiny_world rendering modes? Do you have some results to share? If so, you could add a subpage with the current high scores. ;-)

This spring I had the chance to attend a lecture by a DeepMind employee. After the lecture we talked with him about the implementation and the training process of I2A. During the conversation the guest lecturer, who was not part of the I2A team, said that we as students probably wouldn't have the computing resources to train that architecture in a reasonable time. Just FYI ;-)

Olloxan commented 5 years ago

In order to use the proposed network structure and make use of the kernels, I had to use the pixel version. I just scaled your 16x16 images down to 8x8; that was no problem. I use a 16-core and a 28-core system, which is not bad for a start. And I got pretty good results with the model-free actor-critics. Unfortunately, as I said, I could not use your Sokoban environment because I didn't implement an A3C. But I will post some results when I have finished my master thesis.
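
For reference, the downscaling amounts to something like this (a sketch, assuming the standard 16x16-pixel tiles of the rgb_array rendering):

```python
# Averages each 2x2 pixel block, turning 16x16-px tiles into 8x8-px tiles.
import numpy as np

def downscale_half(frame):
    h, w, c = frame.shape  # e.g. (160, 160, 3) for a 10x10 room at 16 px/tile
    return frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3)).astype(np.uint8)
```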

mpSchrader commented 5 years ago

Awesome! I am looking forward to reading your thesis.

wrongbattery commented 5 years ago

Actually, I am trying to implement I2A with your env. However, your env fails many times with "Runtime Error/Warning: Generated Model with score == 0. Retry". So have you measured the generation success rate for your env yet?
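
For now I just catch the failure and retry, roughly like this (in the version I use it surfaces as a raised RuntimeWarning, so it can be caught like an exception):

```python
# Workaround sketch: retry reset() when room generation fails.
def safe_reset(env, max_retries=10):
    for _ in range(max_retries):
        try:
            return env.reset()
        except (RuntimeError, RuntimeWarning):
            continue  # "Generated Model with score == 0" -- try a fresh room
    raise RuntimeError("room generation failed %d times in a row" % max_retries)
```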

mpSchrader commented 5 years ago

Hey,

Thanks for that input. Could you open a new ticket for the issue of failing room generation? I already have an idea of how to fix it. By the way, which environments are you using?

Best, Max



Olloxan commented 5 years ago

Hi, I wrote you an email, just so you are not wondering where it might have come from ^^ Best regards

yangzhao-666 commented 3 years ago

Thanks to @mpSchrader for the amazing work.

I also really appreciate @Olloxan's hints; they helped me a lot. I'm also trying to implement I2A on Sokoban as the start of my PhD work. I was wondering: did you get any results? Are they similar to the results shown in the original paper?

Looking forward to your reply and have a nice day.

Best regards