microsoft / IBAC-SNI

Code to reproduce the NeurIPS 2019 paper "Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck" by Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann.
https://arxiv.org/abs/1910.12911

How to draw figure for coinrun #6

Closed KaiyangZhou closed 4 years ago

KaiyangZhou commented 4 years ago

Hello,

I'm wondering how to draw a figure like Fig. 3 in the paper with your code (without rewriting the enjoy.py script)? Or does the original code already support this and I somehow missed it? If so, could you please point it out?

Thanks

maximilianigl commented 4 years ago

This should be possible with the plots.py script in the coinrun folder. You'll have to adapt the path variable to point to the folder in which the results for the different runs are saved. Then you can use the experiments dictionary to specify which runs (identified by their run-id) correspond to which algorithm, i.e. each entry in the experiments dictionary results in one plotted line, with the key used in the legend. The values and std for each line are the result of averaging over all the run-ids listed as the value for the corresponding key in experiments (hopefully that becomes clearer from the examples in plots.py).
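A minimal sketch of what such a dictionary might look like; the path and run-ids below are hypothetical placeholders, not actual runs:

```python
# Hypothetical sketch of the experiments dictionary used by plots.py.
# Each key becomes one line in the plot (and its legend entry); each value
# is a list of run-ids whose curves are averaged into that line (mean + std).
path = "/path/to/results"  # placeholder: folder containing all run results

experiments = {
    "PPO (baseline)": ["run-a1", "run-a2", "run-a3"],  # hypothetical run-ids
    "IBAC-SNI": ["run-b1", "run-b2", "run-b3"],
}
```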

KaiyangZhou commented 4 years ago

Thanks! Your explanation on plots.py is very clear.

I have another question. Does Fig. 3 (left) in the paper report performance on the training or the test (unseen) environments? From this line https://github.com/microsoft/IBAC-SNI/blob/master/coinrun/coinrun/ppo2.py#L395 it seems that rew_mean, obtained here https://github.com/microsoft/IBAC-SNI/blob/master/coinrun/plots.py#L194, refers to performance on the training environments?

I'm a bit confused now. I thought we had to run enjoy.py, loading the model saved every 10M timesteps, to get the test-performance curve on unseen environments. Could you clarify this?

maximilianigl commented 4 years ago

Ah, good question. The code is based on the OpenAI Baselines implementation, which is somewhat different from many other frameworks. By using RCALL_NUM_GPU=4 mpiexec -n 4 python3 -m coinrun.train_agent ... to start the experiments (see the README.MD), we're actually starting 4 different processes on 4 different GPUs. Processes 0, 2 and 3 use the training environments and update the policy parameters, while process 1 runs the test environments. That means rew_mean is the training performance if it's extracted from a file ending in _0 (i.e. coming from process 0), and the test performance if it's extracted from a file ending in _1.

As you might have noticed, in plots.py the specified run-ids actually end with _{} (which I forgot to mention before). That is because in this line I'm replacing the {} with either 0 or 1, depending on whether I want the training or test performance.
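The suffix convention above can be sketched as follows; the run-id and helper name are hypothetical, but the rank-to-suffix mapping is the one described:

```python
# Sketch of the file-suffix convention: each MPI process writes results to a
# file whose name ends in its rank. Rank 0 wrote training curves, rank 1 test.
def result_file(run_id_template: str, test: bool) -> str:
    """Fill the trailing '_{}' of a run-id with the relevant process rank."""
    rank = 1 if test else 0
    return run_id_template.format(rank)

print(result_file("myrun_{}", test=False))  # → myrun_0 (training performance)
print(result_file("myrun_{}", test=True))   # → myrun_1 (test performance)
```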

enjoy.py is not used at all; instead, we evaluate the test performance concurrently with training and save it in files ending in _1.

KaiyangZhou commented 4 years ago

Cool, now it's clear.

In my case, I'm using 1 GPU per job, so I have to run enjoy.py for every saved checkpoint in order to get the test performance (am I doing it wrong?).

One more thing: what values do you use/suggest for -num-eval N and -rep K?

P.s. I just recalled that we had a conversation back in https://github.com/openai/coinrun/issues/7 (I was wondering why your account looked familiar).

maximilianigl commented 4 years ago

I haven't really worked with enjoy.py so I'm not sure. I'd just try out a few values and see what the variance is when you run it multiple times.

Just to warn you: I'm not sure the results will be the same when you run on only one GPU with the same hyperparameters. Usually, the gradients from the 3 training processes are averaged, effectively tripling the batch size. On the other hand, simply tripling the batch size on one GPU might be infeasible due to memory constraints (although I haven't tried it).

Btw, if you're looking for an implementation of IBAC-SNI on the whole ProcGen suite, see here. The new ProcGen suite also includes "easy" versions of the environments, which might be more feasible to run on just one GPU.

KaiyangZhou commented 4 years ago

Got it. Thanks again!

KaiyangZhou commented 4 years ago

plots.py is great and I've successfully used it to produce Fig. 3 (left) in the paper.

Just wondering if you have code to produce Fig. 3 (middle), for the generalization gap?

Another question: when drawing score curves (for RL tasks), is using a moving average to smooth out the scores a common practice in visualization? (I haven't read many RL papers, so I'm curious about this.)

maximilianigl commented 4 years ago

Great, glad it worked! I've looked, but I don't think I have the code anymore; I'm not sure what happened to it, maybe it got deleted when I cleaned up the repo for publication. However, it should be fairly straightforward to re-implement based on plots.py, since it already reads in the required data and one only needs to subtract the test from the training performance. Unfortunately, I don't have the time at the moment, but if you decide to implement it, please consider submitting a PR; it would be great to have that functionality in the codebase.
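A minimal sketch of the subtraction step, assuming the training (_0) and test (_1) curves have already been loaded and aligned as NumPy arrays (the variable names and values are placeholders):

```python
import numpy as np

# Hypothetical sketch: given aligned training and test reward curves
# (e.g. the averaged curves plots.py builds from the _0 and _1 files),
# the generalization gap is simply their pointwise difference.
train_rew = np.array([5.0, 6.0, 7.0, 8.0])  # placeholder training curve
test_rew = np.array([4.0, 4.5, 5.0, 5.2])   # placeholder test curve

gap = train_rew - test_rew  # generalization gap at each evaluation point
print(gap)
```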

KaiyangZhou commented 4 years ago

thanks!

KaiyangZhou commented 4 years ago

@maximilianigl how about the moving average issue?

maximilianigl commented 4 years ago

Ah, sorry about that, I completely overlooked that question. Yes, I think so. RL results are typically quite noisy (stochastic policy, potentially stochastic environment) and evaluation is costly (it requires running entire episodes), so smoothing is a way to keep the plots readable. The std shown in RL plots is usually computed over random seeds, as there is a lot of variation between seeds as well.
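For reference, a common way to do this smoothing is a simple moving average; this is a generic sketch (the window size is an arbitrary choice, and the function name is hypothetical):

```python
import numpy as np

# Smooth a noisy 1-D score curve with a sliding mean before plotting.
def moving_average(x, window=5):
    kernel = np.ones(window) / window
    # mode="valid" avoids edge artifacts; the smoothed curve is
    # (window - 1) points shorter than the input.
    return np.convolve(x, kernel, mode="valid")

scores = np.array([1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0])  # placeholder rewards
print(moving_average(scores, window=3))
```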

KaiyangZhou commented 3 years ago

@maximilianigl Hi, I just want to thank you for your help with the code.

My work has been accepted to ICLR'21. The RL code is based on yours :).

https://github.com/KaiyangZhou/mixstyle-release/tree/master/rl.

maximilianigl commented 3 years ago

Congratulations! Just had a look and it's a cool idea! Looking forward to reading the paper.