microsoft / IBAC-SNI

Code to reproduce the NeurIPS 2019 paper "Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck" by Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann.
https://arxiv.org/abs/1910.12911

Not really an issue, more a question #10

Open ghost opened 3 years ago

ghost commented 3 years ago

Hi, I want to reuse your MiniGrid experiment as a benchmark for my paper on RL generalisation ... it fits nicely, but I am not clear how to replicate the experiment that generates the orange line in your paper. Can you provide some insight? Are you running the training on 2,000,000 environments to generate the chart? Thanks a lot in advance.

ghost commented 3 years ago

Just to be more precise, I would like to train your agent on 1000 random environments and test it on 1000 other environments to get the generalisation percentage on these test environments ... not sure how I can do that with the code provided ... thanks

maximilianigl commented 3 years ago

Hi, thanks for your interest! We only have an explicit train/test split for the CoinRun environment. For MiniGrid, we randomly sample from all possible layouts during training. This doesn't allow us to measure the generalisation gap explicitly, but the agents' performance (and learning speed) still correlates with how well they generalise, because the number of possible layouts is so large that they rarely see the same layout twice. So Figure 2 just shows the normal training performance we usually report in RL. Note that there's a lot of variation in the results, which is why ours are averaged over 30 random seeds.
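
If an explicit split is needed for MiniGrid, one possible approach (not something this repository provides) is to fix two disjoint sets of layout seeds and evaluate on each separately. The sketch below is only an illustration under assumptions: `make_env` is a hypothetical factory that returns an environment whose layout is fully determined by the seed, the environment follows the classic Gym-style `reset()` / `step()` interface returning `(obs, reward, done, info)`, and `split_seeds` and `evaluate` are illustrative names.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_seeds(n_train: int, n_test: int, rng_seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw two disjoint lists of layout seeds (train, test)."""
    rng = random.Random(rng_seed)
    # random.sample draws without replacement, so the two lists cannot overlap.
    seeds = rng.sample(range(2**31 - 1), n_train + n_test)
    return seeds[:n_train], seeds[n_train:]


def evaluate(policy: Callable, make_env: Callable, seeds: Sequence[int],
             max_steps: int = 1000) -> float:
    """Mean episode return of `policy`, running one episode per seed."""
    returns = []
    for seed in seeds:
        env = make_env(seed)  # hypothetical factory: the seed fixes the layout
        obs = env.reset()
        total, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            # Assumes the classic Gym 4-tuple step API.
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            steps += 1
        returns.append(total)
    return sum(returns) / len(returns)
```

With `train_seeds, test_seeds = split_seeds(1000, 1000)`, training would sample layouts only from `train_seeds`, and the difference between `evaluate(policy, make_env, train_seeds)` and `evaluate(policy, make_env, test_seeds)` would give one estimate of the generalisation gap.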

ghost commented 3 years ago

Sure, so if I understood correctly, you run iterations where you train on 3 randomly chosen environments and then test on another randomly chosen one, right? The results are computed every 30 tests as an average of the reward over these 30 test environments ...

maximilianigl commented 3 years ago

For MiniGrid we're using the usual PPO setup (see here for hyperparameters:

Not sure if that helps, please let me know if not - I feel like we might be talking past each other :).